fix: stack overflow when loading large equality deletes #1915

dojiong · 2025-12-09T09:40:01Z

Which issue does this PR close?

Closes #.

What changes are included in this PR?

A stack overflow occurs when processing data files containing a large number of equality deletes (e.g., > 6000 rows).
This happens because parse_equality_deletes_record_batch_stream previously constructed the final predicate by linearly calling .and() in a loop:

result_predicate = result_predicate.and(row_predicate.not());

This produced a deeply nested, left-skewed tree whose depth equals the number of rows (N). Both rewrite_not() (which uses a recursive visitor pattern)
and the recursive drop of the structure descend one level of the tree per stack frame, so for large N either one exceeded the call stack limit.
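
To illustrate the depth problem described above, here is a minimal sketch using a toy `Pred` enum. The enum, `fold_linear`, and `depth` are hypothetical stand-ins for illustration and are not the iceberg `Predicate` API. It shows that accumulating with one `And` per row yields a tree whose depth grows linearly with the number of rows.

```rust
// Toy predicate type; a hypothetical stand-in, not the iceberg `Predicate`.
#[allow(dead_code)]
#[derive(Debug)]
enum Pred {
    Leaf(u32),
    And(Box<Pred>, Box<Pred>),
}

/// Recursive depth computation. A recursive visitor or a recursive drop
/// descends the tree in the same way, one stack frame per level.
fn depth(p: &Pred) -> usize {
    match p {
        Pred::Leaf(_) => 1,
        Pred::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}

/// Linear accumulation: every iteration wraps the previous result as the
/// left child of a new `And`, so the tree leans entirely to the left.
fn fold_linear(n: u32) -> Pred {
    let mut acc = Pred::Leaf(0);
    for i in 1..n {
        acc = Pred::And(Box::new(acc), Box::new(Pred::Leaf(i)));
    }
    acc
}
```

With this sketch, `depth(&fold_linear(n))` equals `n` for any `n >= 1`, which is why a few thousand rows can already exhaust the stack in recursive code.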

Changes

Balanced Tree Construction: Refactored the predicate combination logic. Instead of linear accumulation, row predicates are collected and combined using a
pairwise combination approach to build a balanced tree. This reduces the tree depth from O(N) to O(log N).
Early Rewrite: rewrite_not() is now called immediately on each individual row predicate before they are combined. This ensures we are combining simplified
predicates and avoids traversing a massive unoptimized tree later.
Regression Test: Added test_large_equality_delete_batch_stack_overflow, which processes 20,000 equality delete rows to verify the fix.
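
The pairwise combination described above can be sketched as follows. This is an illustrative version on a toy `Pred` enum, not the code merged in this PR; `combine_balanced`, `depth`, and the enum itself are hypothetical names. Each round pairs up neighbouring predicates, halving the list until one predicate remains.

```rust
// Toy predicate type; a hypothetical stand-in, not the iceberg `Predicate`.
#[allow(dead_code)]
#[derive(Debug)]
enum Pred {
    AlwaysTrue,
    Leaf(u32),
    And(Box<Pred>, Box<Pred>),
}

/// Recursive depth of the tree (number of nodes on the longest path).
fn depth(p: &Pred) -> usize {
    match p {
        Pred::AlwaysTrue | Pred::Leaf(_) => 1,
        Pred::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}

/// Combine predicates pairwise, round by round, until one remains.
/// Each round halves the list, so the depth grows as O(log N).
fn combine_balanced(mut preds: Vec<Pred>) -> Pred {
    if preds.is_empty() {
        // Assumption for this sketch: no rows means nothing is filtered out.
        return Pred::AlwaysTrue;
    }
    while preds.len() > 1 {
        let mut next = Vec::with_capacity((preds.len() + 1) / 2);
        let mut iter = preds.into_iter();
        while let Some(left) = iter.next() {
            match iter.next() {
                Some(right) => next.push(Pred::And(Box::new(left), Box::new(right))),
                // Odd element out: carry it unchanged into the next round.
                None => next.push(left),
            }
        }
        preds = next;
    }
    preds.pop().expect("non-empty by the check above")
}
```

For 20,000 leaves (the size used by the regression test) this sketch needs 15 rounds, so the resulting tree has a depth of 16 instead of 20,000.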

Are these changes tested?

New regression test test_large_equality_delete_batch_stack_overflow passed.
All existing tests in arrow::caching_delete_file_loader passed.

liurenjie1024

Thanks @dojiong for this PR, LGTM! It would be nice to add a comment explaining why we need this optimization.

crates/iceberg/src/arrow/caching_delete_file_loader.rs

liurenjie1024

Thanks @dojiong for this fix.

dojiong force-pushed the main branch from 71a193c to 920eeae Compare December 10, 2025 02:11

liurenjie1024 reviewed Dec 10, 2025

crates/iceberg/src/arrow/caching_delete_file_loader.rs

fix: stack overflow when loading large equality deletes

5fe7d71

dojiong force-pushed the main branch from 920eeae to 5fe7d71 Compare December 10, 2025 14:40

liurenjie1024 approved these changes Dec 11, 2025

Merge branch 'main' into main

02c5f60

liurenjie1024 merged commit 16906c1 into apache:main Dec 11, 2025
17 checks passed
