Conversation
This PR modifies all `INSERT` logic so virtual (a.k.a generated) MySQL columns are not attempted to insert into, which otherwise breaks the ferrying process. See also #338.
|
Hey @plisandro ! There was a bunch more things to tackle here, everything should be in place now, feel free to review and test, as well as sign the CLA mentioned in the checks for the contributions 👍 |
This is much appreciated, thank you! Been testing it today and your PR seems to work well. CLA is now signed as well 😄 |
|
Hey @plisandro I'm not super familiar with the CLA stuff, but the error says:
leave a comment and lets see, also I think you may have committed under your original account, so you might need to leave/sign under it as well 🤔 |
|
I have signed the CLA! |
Done, and apologies for the confusion - i'm in the process of merging accounts now, and this was actually the last contribution left with the old one 🤦 |
|
@plisandro no problem! |
ref: #400 & @proton-lisandro-pin
The back and forth is taking a bit of time, lets speed this up, I've cherry-picked your commits so the attribution is there, CLA was signed on original PR, if you could sign the one here as well it would be 👍
Handle MySQL Generated Columns (STORED and VIRTUAL) in Data Replication
This PR adds support for MySQL generated columns (both
VIRTUALandSTORED) to Ghostferry, enabling proper handling of computed columns during selective data replication.Problem Statement
MySQL 8.0.23 introduced significant changes to how generated columns are handled in binary log ROW events:
Without special handling, Ghostferry would fail or produce incorrect results when replicating tables with generated columns, as it would attempt to insert values into columns that cannot be modified or would have incorrect column positions.
Solution
This PR implements a comprehensive solution with four key components:
Row Expansion - Detects when MySQL omits virtual columns from binlog events and re-inserts nil sentinels to maintain consistent full-schema column indexing throughout the pipeline.
Insert Value Filtering - Filters out generated column values before constructing
INSERTstatements, allowing only modifiable columns to be inserted while using proper column metadata for value escaping.Unsigned Integer Normalization - Fixed the order of operations: row expansion happens before unsigned integer normalization, ensuring consistent full-schema column indexing throughout.
Verification with Generated Columns - Includes all columns (including generated) in fingerprint queries to detect divergence when computed values differ between source and target databases.
Changes
Core Implementation
INSERTevent handling to filter out generated columns and improved binlog event processing for MySQL 8.0.23+ compatibilityIsColumnGenerated()- Identifies virtual and stored columnsNonGeneratedColumnNames()- Returns only insertable columnsFilterGeneratedColumnsOnRowData()- Removes generated column values from rowsTest Coverage
VIRTUALandSTOREDcolumnsEdge Cases Handled
VIRTUALandSTOREDcolumns in same tableTesting
Related Issue
Closes #338