feat: add N-D raster dimension query and manipulation functions #750
Draft
james-willis wants to merge 3 commits into
james-willis added a commit to james-willis/sedona-db that referenced this pull request on May 6, 2026:
`raster_ref_to_gdal_mem` previously returned a `Result<Dataset>` and
guarded against `BandRef::contiguous_data()` returning `Cow::Owned`
with a runtime tripwire ("Internal: contiguous_data must be borrowed
for is_2d bands; got owned"). The check was correct — handing GDAL a
pointer into a `Vec<u8>` that drops at the end of the iteration would
dangle — but it ties an internal invariant ("`is_2d` ⇒ Borrowed") to
incidental properties of today's reader. Any future copy path in the
reader (compression, BinaryView block-boundary stitching, alignment
fix-up, sliced/broadcast/transposed views from apache#813 / apache#750) would
detonate the tripwire on perfectly valid 2-D rasters.
Change: return `Result<(Dataset, Vec<Vec<u8>>)>`. On `Cow::Borrowed`
the GDAL band still points directly at the StructArray buffer
(zero-copy). On `Cow::Owned` we move the `Vec<u8>` out of the Cow
without copying — the reader's existing materialization is the only
allocation — and stash it in the returned vector. The caller (the
provider in `gdal_dataset_provider.rs`) parks it in a new
`RasterDataset::_owned_band_bytes` field that lives as long as the
MEM dataset that holds the pointers.
`raster_ref_to_gdal_empty` discards the always-empty vector.
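The ownership pattern in this commit message can be sketched without GDAL. This is a minimal illustration, not the code from the commit: `collect_band_pointers` is a hypothetical name, plain `Cow<[u8]>` values stand in for what `BandRef::contiguous_data()` returns, and raw pointers stand in for the addresses the MEM dataset would hold.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the conversion step: returns one raw pointer per
/// band plus the owned buffers that must outlive those pointers.
fn collect_band_pointers<'a>(bands: Vec<Cow<'a, [u8]>>) -> (Vec<*const u8>, Vec<Vec<u8>>) {
    let mut pointers = Vec::with_capacity(bands.len());
    let mut owned = Vec::new();
    for band in bands {
        match band {
            // Zero-copy: the pointer targets the caller's buffer directly.
            Cow::Borrowed(bytes) => pointers.push(bytes.as_ptr()),
            // Move the Vec out of the Cow (no byte copy). Moving a Vec does
            // not move its heap allocation, so the pointer stays valid for as
            // long as `owned` is kept alive.
            Cow::Owned(bytes) => {
                pointers.push(bytes.as_ptr());
                owned.push(bytes);
            }
        }
    }
    (pointers, owned)
}

fn main() {
    let source = vec![1u8, 2, 3, 4];
    let bands: Vec<Cow<[u8]>> = vec![
        Cow::Borrowed(source.as_slice()),
        Cow::Owned(vec![9u8, 8, 7]),
    ];
    let (pointers, owned) = collect_band_pointers(bands);
    // The borrowed band still points into `source`; the owned band points
    // into the buffer parked in `owned`.
    assert_eq!(pointers[0], source.as_ptr());
    assert_eq!(pointers[1], owned[0].as_ptr());
    println!("bands: {}, owned buffers: {}", pointers.len(), owned.len());
}
```

The caller has to keep the second element of the tuple alive for as long as the pointers are in use, which is the role the commit message assigns to `RasterDataset::_owned_band_bytes`.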
paleolimbot (Member) reviewed on May 12, 2026 and left a comment:
I got partway through this before realizing it was the wrong PR 😬 (comments may be helpful anyway)
New dimension query functions for N-D rasters:
- RS_NumDimensions(raster [, band]) → Int32
- RS_DimNames(raster [, band]) → List<Utf8>
- RS_DimSize(raster, dim_name [, band]) → Int64 (null if the dimension is missing)
- RS_Shape(raster [, band]) → List<Int64>

All accept an optional band index. When it is omitted, the functions default to band 0 and verify that all bands agree; they return an error if bands have different dimensionality, prompting the user to specify a band index.

19 new tests cover 2D/3D rasters, explicit band arguments, null handling, nonexistent dimensions, and mixed-dimensionality errors.
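The band-defaulting rule above can be sketched in plain Rust. This is an illustration under stated assumptions, not the code in this PR: `BandDims`, `resolve_band`, `num_dimensions`, and `dim_size` are hypothetical names, band indices are taken as 0-based as the message implies, and "bands agree" is interpreted here as identical dimension names and sizes.

```rust
/// Hypothetical in-memory model of one band's dimension metadata.
#[derive(Debug, Clone, PartialEq)]
struct BandDims {
    names: Vec<String>,
    shape: Vec<i64>,
}

impl BandDims {
    fn new(dims: &[(&str, i64)]) -> Self {
        BandDims {
            names: dims.iter().map(|(n, _)| n.to_string()).collect(),
            shape: dims.iter().map(|(_, s)| *s).collect(),
        }
    }
}

/// Pick the band to inspect. With no explicit index, use band 0 but only
/// after checking that every band has the same dimensions (assumption: names
/// and sizes must both match).
fn resolve_band(bands: &[BandDims], band: Option<usize>) -> Result<&BandDims, String> {
    match band {
        Some(i) => bands.get(i).ok_or_else(|| format!("band {i} out of range")),
        None => {
            let first = bands.first().ok_or_else(|| "raster has no bands".to_string())?;
            if bands.iter().any(|b| b != first) {
                return Err("bands differ in dimensionality; specify a band index".to_string());
            }
            Ok(first)
        }
    }
}

/// Analogue of RS_NumDimensions: number of dimensions as Int32.
fn num_dimensions(bands: &[BandDims], band: Option<usize>) -> Result<i32, String> {
    Ok(resolve_band(bands, band)?.names.len() as i32)
}

/// Analogue of RS_DimSize: `None` models the SQL null for a missing dimension.
fn dim_size(bands: &[BandDims], dim: &str, band: Option<usize>) -> Result<Option<i64>, String> {
    let b = resolve_band(bands, band)?;
    Ok(b.names.iter().position(|n| n == dim).map(|i| b.shape[i]))
}

fn main() {
    let cube = BandDims::new(&[("time", 3), ("y", 4), ("x", 5)]);
    let flat = BandDims::new(&[("y", 4), ("x", 5)]);

    let uniform = vec![cube.clone(), cube.clone()];
    assert_eq!(num_dimensions(&uniform, None), Ok(3));
    assert_eq!(dim_size(&uniform, "time", None), Ok(Some(3)));
    assert_eq!(dim_size(&uniform, "depth", None), Ok(None));

    // Mixed dimensionality: the default errors, an explicit band works.
    let mixed = vec![cube, flat];
    assert!(num_dimensions(&mixed, None).is_err());
    assert_eq!(num_dimensions(&mixed, Some(1)), Ok(2));
    println!("dimension query sketch ran");
}
```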
New N-D raster manipulation functions:
- RS_Slice(raster, dim_name, index): reduce a dimension by picking one index, removing it from the output
- RS_SliceRange(raster, dim_name, start, end): narrow a dimension to [start, end), keeping it with reduced size
- RS_DimToBand(raster, dim_name): promote a dimension into separate bands (e.g., 1 band [time=3,y,x] → 3 bands [y,x])
- RS_BandToDim(raster, dim_name): collapse all bands into one band with a new dimension (inverse of DimToBand)

All error on spatial dimension names (x_dim/y_dim). Phase 1 always materializes data (contiguous copies).

12 new tests, including a DimToBand→BandToDim round-trip.
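The difference between the two slice functions (one drops the dimension, the other keeps it with a reduced size) can be shown on a plain row-major buffer. This is a sketch under assumptions, not the implementation in this PR: `NdBand`, `slice`, and `slice_range` are hypothetical names, the data is assumed to be row-major with one byte per value, the spatial dimensions are assumed to be named `x` and `y`, and rejecting empty ranges is a choice made for the sketch.

```rust
/// Hypothetical N-D band: row-major bytes plus named dimensions.
#[derive(Debug, Clone, PartialEq)]
struct NdBand {
    names: Vec<String>,
    shape: Vec<usize>,
    data: Vec<u8>,
}

/// Narrow `dim` to the half-open range [start, end), keeping the dimension.
fn slice_range(band: &NdBand, dim: &str, start: usize, end: usize) -> Result<NdBand, String> {
    if dim == "x" || dim == "y" {
        return Err(format!("cannot slice spatial dimension '{dim}'"));
    }
    let axis = band
        .names
        .iter()
        .position(|n| n == dim)
        .ok_or_else(|| format!("no dimension named '{dim}'"))?;
    if start >= end || end > band.shape[axis] {
        return Err(format!("range [{start}, {end}) is invalid for '{dim}'"));
    }
    // In row-major order the buffer is `outer` blocks, each holding
    // `shape[axis]` runs of `inner` contiguous values.
    let outer: usize = band.shape[..axis].iter().product();
    let inner: usize = band.shape[axis + 1..].iter().product();
    let block = band.shape[axis] * inner;
    let mut data = Vec::with_capacity(outer * (end - start) * inner);
    for o in 0..outer {
        let base = o * block;
        data.extend_from_slice(&band.data[base + start * inner..base + end * inner]);
    }
    let mut shape = band.shape.clone();
    shape[axis] = end - start;
    Ok(NdBand { names: band.names.clone(), shape, data })
}

/// Pick one index along `dim` and drop the dimension from the result.
fn slice(band: &NdBand, dim: &str, index: usize) -> Result<NdBand, String> {
    let mut out = slice_range(band, dim, index, index + 1)?;
    let axis = out.names.iter().position(|n| n == dim).unwrap();
    out.names.remove(axis);
    out.shape.remove(axis);
    Ok(out)
}

fn main() {
    let band = NdBand {
        names: ["time", "level", "y", "x"].iter().map(|s| s.to_string()).collect(),
        shape: vec![2, 3, 1, 2],
        data: (0u8..12).collect(),
    };
    // Picking one time step removes the "time" dimension.
    let picked = slice(&band, "time", 1).unwrap();
    assert_eq!(picked.shape, vec![3, 1, 2]);
    // Narrowing "level" keeps the dimension with a smaller size.
    let narrowed = slice_range(&band, "level", 1, 3).unwrap();
    assert_eq!(narrowed.shape, vec![2, 2, 1, 2]);
    // Spatial dimensions are rejected.
    assert!(slice(&band, "x", 0).is_err());
    println!("{:?} {:?}", picked.names, narrowed.names);
}
```

Both functions copy the selected bytes into a fresh buffer, which matches the "always materializes data" note for phase 1.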
Add .qmd doc stubs for RS_NumDimensions, RS_DimNames, RS_DimSize, RS_Shape, RS_Slice, RS_SliceRange, RS_DimToBand, RS_BandToDim. Required by the docs-and-deploy CI check which validates every registered function has a documentation page.
Summary
Adds 8 new `RS_*` functions for querying and manipulating N-dimensional raster data. Independent of #813: these functions produce only identity-view bands and use only the schema / reader / builder surface that landed in #749. Purely additive in `sedona-raster-functions`, with no schema or trait churn outside that crate.

Conceptual model
The vocabulary follows xarray's labeled-N-D-array model with SQL-friendly verb naming. For a reader coming from xarray / NumPy / PostGIS:
| `RS_*` function | xarray | NumPy |
|---|---|---|
| `RS_NumDimensions` | `.ndim` | `.ndim` |
| `RS_DimNames` | `.dims` | |
| `RS_DimSize(name)` | `.sizes[name]` | `.shape[i]` |
| `RS_Shape` | `.shape` | `.shape` |
| `RS_Slice(name, i)` | `.isel({name: i})` (drops dim) | `arr[..., i, ...]` |
| `RS_SliceRange(name, a, b)` | `.isel({name: slice(a, b)})` (keeps dim) | `arr[..., a:b, ...]` |
| `RS_DimToBand(name)` | `.expand_dims()` | |
| `RS_BandToDim(name)` | `.stack()` along a new axis | |

We picked `RS_Slice` / `RS_SliceRange` rather than `RS_Isel` (xarray's name) because SQL function names read better when they are self-documenting in query logs: `RS_Slice(raster, 'time', 5)` over `RS_Isel(raster, 'time', 5)`. The dim ↔ band conversions (`RS_DimToBand` / `RS_BandToDim`) are SedonaDB-specific verbs for crossing the boundary between the "bands axis" and the "extra dimensions" that is part of the raster schema.

Dimension query functions (`rs_dimensions.rs`)

`RS_NumDimensions`, `RS_DimNames`, `RS_DimSize`, `RS_Shape`.

When the band argument is omitted, these default to band 0 and verify that all bands agree, returning an error if bands have different dimensionality.
Slice functions (`rs_slice.rs`)

`RS_Slice`, `RS_SliceRange`.

Dimension ↔ band functions (`rs_dim_band.rs`)

`RS_DimToBand`, `RS_BandToDim`.

All slice/dim functions error on spatial dimensions (`x_dim` / `y_dim`).

None of these functions are GDAL-backed; none touch `sedona-raster-gdal/`.

Lazy implementations: deferred
`RS_Slice`, `RS_SliceRange`, and `RS_DimToBand` are all candidates for lazy view-composition: instead of walking the input bytes and writing a fresh band, the output would be a non-identity-view band that references the input's source bytes through a composed `[ViewEntry]`. That's tracked as a follow-up rather than included here for two reasons:

- The builder API needed for clean view composition doesn't exist yet. The conversion needs a builder method along the lines of `with_view(input: &dyn BandRef, view: &[ViewEntry], ...)`, that is, "create a new view into an existing raster." The current `start_band_with_view` from #813 (feat(raster): view machinery for non-identity band views) only handles "construct a band with raw bytes and a view in one step"; it doesn't compose against an input band's existing view. Designing the unified API, deprecating the current surface, and migrating call sites is a focused follow-up, better as its own change than bundled with the function additions here.
- Memory trade-off depends on workload. Lazy slicing in this model stores the source bytes in the output band's `data` column, not the visible-region subset. For a small slice of a huge axis (one time index out of 10,000), the output carries the full 10,000-element payload plus a view that says "only this element is visible." That's a win for `slice → metadata` and chained-slice workloads, and a loss for `slice → write to disk`. Worth measuring on real workloads before flipping the default.

`RS_BandToDim` is structurally eager regardless: it stacks multiple bands into one new dim, and PR-D's view model is single-source per band. Multi-source view machinery isn't planned.

Test plan
- `sedona-raster-functions` (143 existing + 31 new)
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --all --check` clean
- `RS_DimToBand` then `RS_BandToDim` recovers original data
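The round-trip property in the last item can be sketched as follows. This is an illustration under assumptions, not the test from this PR: `NdBand`, `dim_to_band`, and `band_to_dim` are hypothetical names, data is assumed to be row-major with one byte per value, the promoted dimension is restricted to the leading axis so that each output band is one contiguous chunk, and the new dimension created by `band_to_dim` is assumed to be placed first.

```rust
/// Hypothetical N-D band: row-major bytes plus named dimensions.
#[derive(Debug, Clone, PartialEq)]
struct NdBand {
    names: Vec<String>,
    shape: Vec<usize>,
    data: Vec<u8>,
}

/// Split one band into `shape[0]` bands along its leading dimension.
/// Sketch restriction: `dim` must be the outermost axis.
fn dim_to_band(band: &NdBand, dim: &str) -> Result<Vec<NdBand>, String> {
    if band.names.first().map(String::as_str) != Some(dim) {
        return Err(format!("'{dim}' is not the leading dimension"));
    }
    let chunk: usize = band.shape[1..].iter().product();
    if chunk == 0 {
        return Err("cannot split a band with a zero-sized dimension".to_string());
    }
    Ok(band
        .data
        .chunks_exact(chunk)
        .map(|bytes| NdBand {
            names: band.names[1..].to_vec(),
            shape: band.shape[1..].to_vec(),
            data: bytes.to_vec(),
        })
        .collect())
}

/// Stack all bands into one band with a new leading dimension `dim`.
fn band_to_dim(bands: &[NdBand], dim: &str) -> Result<NdBand, String> {
    let first = bands.first().ok_or_else(|| "raster has no bands".to_string())?;
    if bands.iter().any(|b| b.names != first.names || b.shape != first.shape) {
        return Err("bands must share dimension names and shape".to_string());
    }
    let mut names = vec![dim.to_string()];
    names.extend(first.names.iter().cloned());
    let mut shape = vec![bands.len()];
    shape.extend(first.shape.iter().copied());
    // Concatenating the bands in order yields the row-major stacked buffer.
    let data = bands.iter().flat_map(|b| b.data.iter().copied()).collect();
    Ok(NdBand { names, shape, data })
}

fn main() {
    let original = NdBand {
        names: ["time", "y", "x"].iter().map(|s| s.to_string()).collect(),
        shape: vec![3, 2, 2],
        data: (0u8..12).collect(),
    };
    let bands = dim_to_band(&original, "time").unwrap();
    assert_eq!(bands.len(), 3);
    // Stacking the bands back along "time" recovers the original band.
    let restored = band_to_dim(&bands, "time").unwrap();
    assert_eq!(restored, original);
    println!("round trip preserved {} bytes", restored.data.len());
}
```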