@chenghuichen
Contributor

@chenghuichen chenghuichen commented Jan 19, 2026

Purpose

  • Align Python's Avro schema generation with the Java implementation. Currently the two are inconsistent.
  • Full support for TIMESTAMP_LTZ in pypaimon.

Tests

Reuses run_mixed_tests.sh; the changes are in:

  • JavaPyE2ETest.java
  • java_py_read_write_test.py

API and Format

Documentation

@chenghuichen chenghuichen changed the title avro bugfix Avro schema inconsistents between Java and PyPaimon Jan 19, 2026
@chenghuichen chenghuichen changed the title Avro schema inconsistents between Java and PyPaimon [WIP] [BugFix] Avro schema inconsistents between Java and PyPaimon Jan 19, 2026
@chenghuichen chenghuichen changed the title [WIP] [BugFix] Avro schema inconsistents between Java and PyPaimon [WIP] [BugFix] Avro schema inconsistents between Java and Python Jan 19, 2026
@chenghuichen chenghuichen changed the title [WIP] [BugFix] Avro schema inconsistents between Java and Python [BugFix] Avro schema inconsistents between Java and Python Jan 19, 2026
@JingsongLi
Contributor

JingsongLi commented Jan 22, 2026

I checked the current implementation and it seems to be correct. Our Avro format has a very strange mapping: the mapping on the Java side is reversed compared to standard...
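
For context, the two Avro timestamp families store the same physical `long` but differ in meaning, which is why a writer and a reader that disagree on the mapping can silently reinterpret values. A minimal sketch using only the Python standard library; the helper names are hypothetical and are not pypaimon APIs:

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_LOCAL = datetime(1970, 1, 1)  # naive: no time zone attached


def decode_timestamp_millis(value: int) -> datetime:
    """Avro `timestamp-millis`: an instant on the global timeline (UTC)."""
    return EPOCH_UTC + timedelta(milliseconds=value)


def decode_local_timestamp_millis(value: int) -> datetime:
    """Avro `local-timestamp-millis`: a wall-clock reading with no zone."""
    return EPOCH_LOCAL + timedelta(milliseconds=value)


raw = 1_768_780_800_000  # the same stored long in both cases
instant = decode_timestamp_millis(raw)           # 2026-01-19 00:00:00+00:00
wall_clock = decode_local_timestamp_millis(raw)  # 2026-01-19 00:00:00 (naive)
```

The digits match, but `instant` is zone-aware and `wall_clock` is not, so a reader that applies the other logical type gets a value with different semantics once a session time zone is involved.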

@chenghuichen
Contributor Author

If that’s indeed the case, here’s my understanding:

  • I fully agree with your earlier point: "Whatever Java does, Python should align with. We need to ensure compatibility." This is a principle we must follow for ecosystem consistency.
  • We can address the non-standard mapping in a separate future PR, similar to how Flink handled it: introduce a new config like avro.timestamp_mapping.legacy (defaulting to true) to preserve backward compatibility, while allowing users to opt into the standard-compliant behavior when ready.
    What do you think?
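
A rough sketch of how such a switch could select the Avro logical type. Everything here is an assumption for illustration: the function name, the option lookup, the precision cutoffs, and the exact legacy mapping (taken from "reversed compared to standard" in this thread) are not confirmed by the PR.

```python
from typing import Mapping, Optional

# Option name as proposed in this thread; not an existing Paimon option.
LEGACY_OPTION = "avro.timestamp_mapping.legacy"


def avro_timestamp_logical_type(
    paimon_type: str,
    precision: int,
    options: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick an Avro logical type name for a Paimon timestamp type (hypothetical helper)."""
    if paimon_type not in ("TIMESTAMP", "TIMESTAMP_LTZ"):
        raise ValueError(f"not a timestamp type: {paimon_type}")

    # Assumed cutoffs: up to 3 digits fits millis, up to 6 fits micros.
    if precision <= 3:
        unit = "millis"
    elif precision <= 6:
        unit = "micros"
    else:
        raise ValueError(f"unsupported precision in this sketch: {precision}")

    options = options or {}
    legacy = str(options.get(LEGACY_OPTION, "true")).lower() == "true"

    is_ltz = paimon_type == "TIMESTAMP_LTZ"
    # Standard Avro: TIMESTAMP_LTZ (an instant) -> timestamp-*,
    #                TIMESTAMP (wall clock)     -> local-timestamp-*.
    # Legacy (assumed): the reverse of the standard mapping.
    use_local = is_ltz if legacy else not is_ltz
    prefix = "local-timestamp" if use_local else "timestamp"
    return f"{prefix}-{unit}"


# Default (legacy) behavior keeps existing files readable.
legacy_ltz = avro_timestamp_logical_type("TIMESTAMP_LTZ", 3)
# Opting out selects the standard-compliant mapping.
standard_ltz = avro_timestamp_logical_type(
    "TIMESTAMP_LTZ", 3, {LEGACY_OPTION: "false"}
)
```

Defaulting the flag to true means existing tables behave the same unless a user explicitly opts in to the standard mapping.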

@JingsongLi
Contributor

I agree with you about "avro.timestamp_mapping.legacy"; we can introduce it in the next PR for both Java and Python.

@JingsongLi
Contributor

@chenghuichen Will this PR add more test coverage?

@chenghuichen
Contributor Author

chenghuichen commented Jan 22, 2026

No, the changes only affect Avro and Timestamp, which are already covered by tests. It’s ready to be merged if it looks good to you.

@JingsongLi
Contributor

+1

@JingsongLi JingsongLi merged commit 88332de into apache:master Jan 22, 2026
18 checks passed