Skip to content

fix: surface BufferExec input panics instead of silently truncating output#23243

Open
Tristan1900 wants to merge 2 commits into
apache:mainfrom
Tristan1900:bufferexec-propagate-input-panic
Open

fix: surface BufferExec input panics instead of silently truncating output#23243
Tristan1900 wants to merge 2 commits into
apache:mainfrom
Tristan1900:bufferexec-propagate-input-panic

Conversation

@Tristan1900

Copy link
Copy Markdown

Which issue does this close?

Closes #23242.

Rationale for this change

When the input to a BufferExec panics, the panic unwinds the background producer task (MemoryBufferedStream), tokio catches it at the task boundary, and the dropped sender looks like a clean end-of-stream to the consumer. A panic isn't a Result::Err, so it also skips the batch_tx.send(Err(..)) path — the partition gets silently truncated and the query returns partial results instead of failing. More detail in #23242.

What changes are included in this PR?

Catch the panic at the input poll inside MemoryBufferedStream and forward it as a DataFusionError over the channel the consumer already drains, so it propagates instead of being swallowed.

I went with surfacing it as an error rather than re-raising via join_unwind (the way RecordBatchReceiverStream does), since the consumer already handles Some(Err(..)) and this fails only the query. Happy to switch to a re-raise if that's preferred for consistency.

Are these changes tested?

Yes — added panic_in_input_is_propagated, which feeds a stream that panics partway through and asserts the buffered stream yields an error instead of finishing cleanly.

Are there any user-facing changes?

A query whose input panics under a BufferExec now fails with an error instead of silently returning truncated results. No API changes.

MemoryBufferedStream runs the input on a spawned producer task that feeds
a channel. A panic while polling the input was caught by the tokio task
harness, which dropped the sender; the consumer then read the closed
channel as a clean EOF and silently truncated the rest of the partition's
output, so the query completes with partial results instead of failing.

Wrap the input poll in catch_unwind and forward the panic as a
DataFusionError over the existing error channel, so it propagates and
fails the stream instead of being swallowed.
@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label Jun 29, 2026
@Tristan1900 Tristan1900 marked this pull request as ready for review June 29, 2026 15:34

@gabotechs gabotechs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checked that the test indeed reproduces the issue in main. Good catch @Tristan1900!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BufferExec silently truncates output when its input panics

2 participants