-
Notifications
You must be signed in to change notification settings - Fork 113
Fix: incorrect child vector size in duckdb list array import #5740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
38989db to
6fe3109
Compare
|
decided to add specialization on the validity |
36af150 to
896dc20
Compare
Signed-off-by: Connor Tsui <[email protected]>
896dc20 to
85a75c3
Compare
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
vortex-duckdb/src/convert/vector.rs
Outdated
|
|
||
| let (offsets, sizes, child_max_length) = process_duckdb_lists(entries, &validity); | ||
| let child_data = | ||
| flat_vector_to_vortex(&mut vector.list_vector_get_child(), child_max_length)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this have to be the sum of the lengths not the max? You're exporting this many elements of children
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I guess the name is a bit confusing? This is the maximum over all ends of the list views into the child vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the correct name here is child_min_length which is the max over all view ends
Signed-off-by: Connor Tsui <[email protected]>
Fixes #5597
Since list views can be stored out of order and because there can be nulls at the end of the vector, we have to calculate the length of the child vector by looking at all views in the vector.
Note that this does not take validity into account, as I have no idea if DuckDB allows for complete garbage in their list views if they are null. At least in Arrow, that is not allowed.
Also adds 3 regression tests.