Skip to content

fix(rust/sedona-query-planner): recognize wrapped RS_EnsureLoaded in idempotency guard#976

Merged
paleolimbot merged 1 commit into
apache:mainfrom
james-willis:jw/fix-ensureloaded-async-idempotency
Jun 18, 2026
Merged

fix(rust/sedona-query-planner): recognize wrapped RS_EnsureLoaded in idempotency guard#976
paleolimbot merged 1 commit into
apache:mainfrom
james-willis:jw/fix-ensureloaded-async-idempotency

Conversation

@james-willis

@james-willis james-willis commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Theres 2 bugs here:

  1. the new refactor from fix(rust/sedona-query-planner): Ensure user provided RS_EnsureLoaded call preserves metadata #969 causes the two rules to retrigger each other, stacking up 3 of each (datafusion runs the optimizer up to 3 times if things keep changing)
  2. Datafusion has some bug with nested Async UDFs. will dig into this tomorrow to create a bug report for them.

This solves 1 so 2 doesnt trigger.

🤖 AI-generated description

Summary

EnsureLoadedOptimizerRule wraps the raster arguments of needs_pixels UDFs (e.g. RS_Slice, RS_DimBand) with RS_EnsureLoaded, and WrapAsyncUdfRule re-stamps that async call as sd_restore_metadata(RS_EnsureLoaded(...)) to preserve field metadata. Both rules run in the same logical-optimizer fixpoint loop. The idempotency guard is_loaded_wrap only recognized a bare RS_EnsureLoaded, not one already wrapped in sd_restore_metadata, so on every subsequent pass it failed to see the argument as already-loaded and injected another wrapper. DataFusion runs the rule list repeatedly until the plan stops changing or it hits max_passes (default 3), so the steady state is three nested RS_EnsureLoaded calls.

This regressed in #969, which split metadata handling into the separate sd_restore_metadata wrapper but did not teach the guard to look through it. It went unnoticed because there is no end-to-end execution test for needs_pixels functions — the Rust tests invoke kernels directly and never run a real query plan, and #969's own idempotency test runs only the single rule, never with WrapAsyncUdfRule in between.

Why the nesting is fatal (a DataFusion limitation)

A redundant tower of an idempotent function would normally be harmless waste. It is fatal here because RS_EnsureLoaded is an async UDF, and DataFusion cannot evaluate an async UDF inside a synchronous expression. Instead it hoists each async call out into a dedicated AsyncFuncExec operator, computes it as a precomputed column (__async_fn_N), and rewrites the surrounding expression to reference that column.

That hoist does not correctly handle an async UDF nested as an argument to another async UDF call. AsyncMapper::find_references walks the expression pre-order and registers every nested async call as a separate async_expr (so it clearly intends to support nesting), but AsyncMapper::map_expr substitutes by whole-subtree structural equality and is applied bottom-up via transform_up. Substituting the innermost call mutates every enclosing subtree, so they no longer structurally match the async_expr registered for them, and only the innermost call is replaced with a column reference. The outer RS_EnsureLoaded calls remain literal inside the synchronous ProjectionExec and are invoked directly at runtime, failing with Internal error: async functions should not be called directly. The two halves of DataFusion's async extraction are thus mutually inconsistent under nesting — but this only surfaces when something generates nested async calls, which normal usage does not and the fixpoint regression above did.

Fix

Teach is_loaded_wrap to see through the sd_restore_metadata wrapper so an already-loaded argument is recognized and left alone. The rule then reaches a fixpoint after the first pass and produces a single RS_EnsureLoaded, which the async hoister handles correctly. Adds a regression test exercising the cross-rule fixpoint that the existing single-rule idempotency test could not reproduce.

…idempotency guard

The EnsureLoaded optimizer rule and the async-UDF metadata wrapper run in
the same fixpoint loop: the rule injects RS_EnsureLoaded around a raster
argument, then the wrapper re-stamps that async call as
sd_restore_metadata(RS_EnsureLoaded(...)). On the next pass the idempotency
guard only matched a bare RS_EnsureLoaded, so it re-injected another wrapper,
and so on each pass. The result is a tower of nested async calls where only
the innermost is extracted into AsyncFuncExec; the outer ones remain inline
and are invoked synchronously, failing with "async functions should not be
called directly".

Teach the guard to see through the sd_restore_metadata wrapper so an
already-loaded argument is left alone. Add a regression test covering the
cross-rule fixpoint, which the existing single-rule idempotency test could
not exercise.
@github-actions github-actions Bot requested a review from zhangfengcdt June 18, 2026 04:54
@james-willis james-willis marked this pull request as ready for review June 18, 2026 05:10

@paleolimbot paleolimbot left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬

Thank you for fixing!

@paleolimbot paleolimbot merged commit 2942c19 into apache:main Jun 18, 2026
18 checks passed
@paleolimbot paleolimbot added this to the 0.4.0 milestone Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants