Skip to content

fix(rust/sedona-query-planner): Ensure user provided RS_EnsureLoaded call preserves metadata#969

Merged
paleolimbot merged 12 commits into
apache:mainfrom
paleolimbot:ensure-loaded-metadata
Jun 18, 2026
Merged

fix(rust/sedona-query-planner): Ensure user provided RS_EnsureLoaded call preserves metadata#969
paleolimbot merged 12 commits into
apache:mainfrom
paleolimbot:ensure-loaded-metadata

Conversation

@paleolimbot

@paleolimbot paleolimbot commented Jun 17, 2026

Copy link
Copy Markdown
Member

We can't show off the fact that we can load pixels if we can't call it!

In #886 we added an optimizer rule that wraps pixel-requiring functions in RS_EnsureLoaded AND an extra function that works around async UDFs dropping metadata at the physical plan level. This doesn't handle normal calls to RS_EnsureLoaded, which are useful for testing the loaders or generally moving back into Python land.

This PR separates separates out the rule that wraps async udfs in the metadata-fixing function and tests it separately.

import sedona.db

sd = sedona.db.connect()
sd.options.interactive = True

path = "submodules/sedona-testing/data/raster/sentinel2.tif"
t = sd.sql("SELECT RS_FromPath($1) AS raster", params=(path, ))
t
# ┌───────────────────────────────────────────────────────────────┐
# │                             raster                            │
# │                             raster                            │
# ╞═══════════════════════════════════════════════════════════════╡
# │ [512x512/1] @ [330720 4138560 335840 4143680] / {...} <outdb> │
# └───────────────────────────────────────────────────────────────┘

t.select(t.raster.funcs.rs_ensureloaded())
# ┌───────────────────────────────────────────────────────┐
# │                rs_ensureloaded(raster)                │
# │                         raster                        │
# ╞═══════════════════════════════════════════════════════╡
# │ [512x512/1] @ [330720 4138560 335840 4143680] / {...} │
# └───────────────────────────────────────────────────────┘

Comment on lines +40 to +44
/// Logical optimizer rule that wraps async scalar UDF calls with
/// `sd_restore_metadata` to preserve field metadata stripped at the
/// physical layer (DF-22662).
#[derive(Default, Debug)]
pub struct WrapAsyncUdfRule;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the real addition...we wrap AsyncUdfs with a UDF that applies extra metadata to the output (based on expr.to_field(), which is correct for an async udf at the logical level.

Comment on lines +47 to +56
/// Registry/UDF name of the metadata restoration helper.
pub const RESTORE_METADATA_NAME: &str = "sd_restore_metadata";

/// Build the sync metadata-restore UDF with the given metadata map. The rule
/// constructs one per rewrite, embedding the metadata that should be stamped
/// onto the output field. `Stable` volatility so CSE can deduplicate identical
/// `sd_restore_metadata(...)` subtrees.
pub fn restore_metadata_udf(metadata: HashMap<String, String>) -> Arc<ScalarUDF> {
Arc::new(ScalarUDF::new_from_impl(RestoreMetadata::new(metadata)))
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was extracted out of the previous "reraster" (since it could in theory do things that aren't raster)

Comment on lines +44 to +58
def test_rs_ensureloaded(con, sedona_testing):
path = sedona_testing / "data/raster/sentinel2.tif"
t = con.sql("SELECT RS_FromPath($1) AS raster", params=(str(path),))
tab = t.select(raster=t.raster.funcs.rs_ensureloaded()).to_arrow_table()
r = tab["raster"][0].as_py()
assert r.height == 512
assert r.width == 512

assert len(r.bands) == 1
b = r.bands[0]
assert b.shape == (512, 512)
arr = b.to_numpy()
assert arr.shape == (512, 512)
assert arr.dtype == "uint16"
assert arr[0, 0] == 2324

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works!

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes raster metadata loss when users explicitly call RS_EnsureLoaded(...) by moving the DataFusion async-UDF metadata workaround into its own logical optimizer rule (WrapAsyncUdfRule) and adding targeted tests for both the rule and the Python-facing behavior.

Changes:

  • Simplifies RS_EnsureLoaded async invocation argument handling to preserve metadata for user-provided calls.
  • Introduces sd_restore_metadata + WrapAsyncUdfRule to wrap async scalar UDFs and re-stamp return_field metadata (DF-22662 workaround), and wires it into the optimizer pipeline.
  • Updates Python raster function tests and raster band buffer access to reflect/validate the restored metadata behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
rust/sedona-raster-functions/src/rs_ensure_loaded.rs Adjusts async invocation to use to_array(...) to better preserve metadata.
rust/sedona-query-planner/src/wrap_async_udf.rs New optimizer rule to wrap async scalar UDFs with sd_restore_metadata (plus unit tests).
rust/sedona-query-planner/src/restore_metadata.rs New sync identity UDF that stamps stored metadata onto its output field (plus unit tests).
rust/sedona-query-planner/src/optimizer.rs Registers WrapAsyncUdfRule alongside ensure-loaded rule ahead of CSE.
rust/sedona-query-planner/src/lib.rs Exposes new modules; removes old reraster workaround module.
rust/sedona-query-planner/src/ensure_loaded.rs Removes the old reraster wrapper; now emits rs_ensureloaded(arg) directly; updates tests and rule ordering assertions.
rust/sedona-query-planner/src/ensure_loaded_reraster.rs Deletes the previous DF-22662 workaround implementation.
python/sedonadb/tests/functions/test_raster_functions.py Refactors raster function tests and adds a regression test for rs_ensureloaded() behavior.
python/sedonadb/tests/functions/conftest.py Removes the now-unneeded raster_con fixture.
python/sedonadb/python/sedonadb/raster.py Allows band .data access after loading when outdb_uri exists but bytes are present.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rust/sedona-query-planner/src/ensure_loaded.rs Outdated
Comment thread rust/sedona-raster-functions/src/rs_ensure_loaded.rs
Comment thread rust/sedona-query-planner/src/lib.rs Outdated
@paleolimbot paleolimbot requested a review from Copilot June 17, 2026 15:37
@paleolimbot paleolimbot marked this pull request as ready for review June 17, 2026 15:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Comment thread rust/sedona-raster-functions/src/rs_ensure_loaded.rs Outdated
Comment thread rust/sedona-query-planner/src/restore_metadata.rs
Comment thread rust/sedona-query-planner/src/wrap_async_udf.rs
Comment thread python/sedonadb/tests/functions/test_raster_functions.py
"""The band data as a typed, shaped memoryview."""
if self.outdb_uri is not None:
source_data = self.source_data
if self.outdb_uri is not None and len(source_data) == 0:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should also make sure that this is not a 0 byte raster. if hte product of the shape is 0 then it is always indb and [] is the byte array

/// `sd_restore_metadata` to preserve field metadata stripped at the
/// physical layer (DF-22662).
#[derive(Default, Debug)]
pub struct WrapAsyncUdfRule;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add an idempotency test for this rule? we want to make sure we dont rewrap

Comment on lines +113 to +116
let Ok((_, return_field)) = expr.to_field(schema) else {
// Can't determine return field; skip wrapping.
return Ok(Transformed::no(expr));
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we throw an internal error in this case? I dont think it should happen but if it does, its probably an internal sedona db bug

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually maybe this will tend to be caused by the user, like an ambiguous column name in the join result

@paleolimbot paleolimbot merged commit 3578b4d into apache:main Jun 18, 2026
17 checks passed
@paleolimbot paleolimbot deleted the ensure-loaded-metadata branch June 18, 2026 02:48
@paleolimbot paleolimbot added this to the 0.4.0 milestone Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants