feat: add N-D raster dimension query and manipulation functions #750

Draft
james-willis wants to merge 3 commits into apache:main from james-willis:jw/nd-raster-functions

Conversation

@james-willis
Contributor

@james-willis james-willis commented Apr 3, 2026

Summary

Adds 8 new RS_* functions for querying and manipulating N-dimensional raster data. Independent of #813 — these functions produce only identity-view bands and use only the schema / reader / builder surface that landed in #749. Purely additive in sedona-raster-functions — no schema or trait churn outside that crate.

Note: lazy implementations are deferred. Every slice/dim function in this PR eagerly materializes its output as a fresh identity-view band. The view machinery from #813 is intentionally unused here. Converting these to lazy view composition (so RS_Slice(RS_Slice(...)) fuses into a single composed view, etc.) is tracked as a follow-up — see the bottom of this description for the rationale and scope. RS_BandToDim will stay eager regardless; it requires multi-source view machinery that isn't on the roadmap.

Conceptual model

The vocabulary follows xarray's labeled-N-D-array model with SQL-friendly verb naming. For a reader coming from xarray / NumPy / PostGIS:

| RS_* function | xarray analog | NumPy analog | PostGIS analog |
|---|---|---|---|
| RS_NumDimensions | .ndim | .ndim | none (PostGIS rasters are 2-D + bands) |
| RS_DimNames | .dims | | |
| RS_DimSize(name) | .sizes[name] | .shape[i] | |
| RS_Shape | .shape | .shape | |
| RS_Slice(name, i) | .isel({name: i}) (drops dim) | arr[..., i, ...] | |
| RS_SliceRange(name, a, b) | .isel({name: slice(a, b)}) (keeps dim) | arr[..., a:b, ...] | |
| RS_DimToBand(name) | partial inverse of .expand_dims() | | |
| RS_BandToDim(name) | partial match to .stack() along a new axis | | |

We picked RS_Slice / RS_SliceRange rather than RS_Isel (xarray's name) because SQL function names read better when self-documenting in query logs: RS_Slice(raster, 'time', 5) over RS_Isel(raster, 'time', 5). The dim ↔ band conversions (RS_DimToBand / RS_BandToDim) are SedonaDB-specific verbs for crossing the "bands axis" vs "extra dimensions" boundary that's part of the raster schema.

Dimension query functions (rs_dimensions.rs)

  • RS_NumDimensions(raster [, band]) → Int32 — number of dimensions
  • RS_DimNames(raster [, band]) → List<Utf8> — ordered dimension names
  • RS_DimSize(raster, dim_name [, band]) → Int64 — size of a named dimension (null if missing)
  • RS_Shape(raster [, band]) → List<Int64> — full shape array

When the band argument is omitted, each function defaults to band 0 and verifies that all bands agree; it returns an error if bands have different dimensionality, so the caller knows to pass an explicit band index.
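For reviewers who want the default-band rule in concrete form, here is a minimal sketch. `BandDims` and `resolve_band` are hypothetical stand-ins, not the types in this PR, and the sketch assumes "agree" means identical dimension names and sizes.

```rust
// Hypothetical stand-in for one band's dimension metadata (not the SedonaDB types).
#[derive(Debug, Clone, PartialEq)]
struct BandDims {
    names: Vec<String>,
    shape: Vec<i64>,
}

impl BandDims {
    fn new(dims: &[(&str, i64)]) -> Self {
        BandDims {
            names: dims.iter().map(|(n, _)| n.to_string()).collect(),
            shape: dims.iter().map(|(_, s)| *s).collect(),
        }
    }

    // Analog of RS_NumDimensions.
    fn num_dimensions(&self) -> i32 {
        self.names.len() as i32
    }

    // Analog of RS_DimSize: None (SQL null) when the dimension is missing.
    fn dim_size(&self, name: &str) -> Option<i64> {
        self.names.iter().position(|n| n == name).map(|i| self.shape[i])
    }
}

// Explicit band index if given; otherwise band 0, but only after checking
// that every band has the same dimensions (assumed meaning of "agree").
fn resolve_band(bands: &[BandDims], band: Option<usize>) -> Result<&BandDims, String> {
    match band {
        Some(i) => bands.get(i).ok_or_else(|| format!("band {i} out of range")),
        None => {
            let first = bands.first().ok_or_else(|| "raster has no bands".to_string())?;
            if bands.iter().all(|b| b == first) {
                Ok(first)
            } else {
                Err("bands differ in dimensionality; specify a band index".to_string())
            }
        }
    }
}

fn main() {
    let raster = vec![
        BandDims::new(&[("time", 3), ("y", 4), ("x", 5)]),
        BandDims::new(&[("time", 3), ("y", 4), ("x", 5)]),
    ];
    let band = resolve_band(&raster, None).unwrap();
    println!("{} {:?} {:?}", band.num_dimensions(), band.names, band.shape);
    println!("{:?} {:?}", band.dim_size("time"), band.dim_size("depth"));
}
```

With mixed-dimensionality bands, `resolve_band(&bands, None)` returns the error while `resolve_band(&bands, Some(1))` still succeeds, which mirrors the behavior described above.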

Slice functions (rs_slice.rs)

  • RS_Slice(raster, dim_name, index) → Raster — reduce a dimension by picking one index
  • RS_SliceRange(raster, dim_name, start, end) → Raster — narrow a dimension to [start, end)
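The eager semantics of the two slice functions can be sketched on a row-major buffer. This is an illustration only: `Band`, `slice`, and `slice_range` are hypothetical names, the spatial dimensions are assumed to be called "x" and "y", and rejecting an empty range is an assumption rather than documented behavior.

```rust
// Hypothetical in-memory band: named dims over a row-major buffer.
#[derive(Debug, Clone, PartialEq)]
struct Band {
    dims: Vec<String>,
    shape: Vec<usize>,
    data: Vec<i32>,
}

// Assumption: the spatial dimensions are named "x" and "y".
fn is_spatial(dim: &str) -> bool {
    dim == "x" || dim == "y"
}

// Analog of RS_SliceRange: keep the dimension, narrowed to [start, end).
fn slice_range(band: &Band, dim: &str, start: usize, end: usize) -> Result<Band, String> {
    if is_spatial(dim) {
        return Err(format!("cannot slice spatial dimension '{dim}'"));
    }
    let axis = band
        .dims
        .iter()
        .position(|d| d == dim)
        .ok_or_else(|| format!("no dimension named '{dim}'"))?;
    let len = band.shape[axis];
    if start >= end || end > len {
        return Err(format!("range [{start}, {end}) invalid for size {len}"));
    }
    // Row-major layout: `outer` blocks, each holding `len` runs of `inner` elements.
    let outer: usize = band.shape[..axis].iter().product();
    let inner: usize = band.shape[axis + 1..].iter().product();
    let mut data = Vec::with_capacity(outer * (end - start) * inner);
    for o in 0..outer {
        let base = o * len * inner;
        data.extend_from_slice(&band.data[base + start * inner..base + end * inner]);
    }
    let mut shape = band.shape.clone();
    shape[axis] = end - start;
    Ok(Band { dims: band.dims.clone(), shape, data })
}

// Analog of RS_Slice: pick one index and drop the dimension from the output.
fn slice(band: &Band, dim: &str, index: usize) -> Result<Band, String> {
    let mut out = slice_range(band, dim, index, index + 1)?;
    let axis = out.dims.iter().position(|d| d == dim).unwrap();
    out.dims.remove(axis);
    out.shape.remove(axis);
    Ok(out)
}

fn main() {
    let band = Band {
        dims: vec!["time".into(), "y".into(), "x".into()],
        shape: vec![3, 2, 2],
        data: (0..12).collect(),
    };
    println!("{:?}", slice(&band, "time", 1));
    println!("{:?}", slice_range(&band, "time", 1, 3));
}
```

Both functions copy the selected elements into a fresh buffer, which is the "eagerly materializes" behavior the PR description refers to.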

Dimension ↔ band functions (rs_dim_band.rs)

  • RS_DimToBand(raster, dim_name) → Raster — promote a dimension into separate bands
  • RS_BandToDim(raster, dim_name) → Raster — collapse bands into a new dimension

All slice/dim functions error on spatial dimensions (x_dim/y_dim).
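A rough sketch of the dim ↔ band conversions and their round-trip follows. The `Band` type and both function names are hypothetical, the spatial dimensions are assumed to be named "x" and "y", and placing the new dimension as the leading axis in `band_to_dim` is an assumption made so the round-trip is easy to check.

```rust
// Hypothetical in-memory band: named dims over a row-major buffer.
#[derive(Debug, Clone, PartialEq)]
struct Band {
    dims: Vec<String>,
    shape: Vec<usize>,
    data: Vec<i32>,
}

// Assumption: the spatial dimensions are named "x" and "y".
fn is_spatial(dim: &str) -> bool {
    dim == "x" || dim == "y"
}

// Analog of RS_DimToBand: one output band per index along `dim`.
fn dim_to_band(band: &Band, dim: &str) -> Result<Vec<Band>, String> {
    if is_spatial(dim) {
        return Err(format!("cannot promote spatial dimension '{dim}'"));
    }
    let axis = band
        .dims
        .iter()
        .position(|d| d == dim)
        .ok_or_else(|| format!("no dimension named '{dim}'"))?;
    let len = band.shape[axis];
    let outer: usize = band.shape[..axis].iter().product();
    let inner: usize = band.shape[axis + 1..].iter().product();
    let mut dims = band.dims.clone();
    dims.remove(axis);
    let mut shape = band.shape.clone();
    shape.remove(axis);
    let bands: Vec<Band> = (0..len)
        .map(|i| {
            // Gather the i-th run of `inner` elements from every outer block.
            let mut data = Vec::with_capacity(outer * inner);
            for o in 0..outer {
                let start = (o * len + i) * inner;
                data.extend_from_slice(&band.data[start..start + inner]);
            }
            Band { dims: dims.clone(), shape: shape.clone(), data }
        })
        .collect();
    Ok(bands)
}

// Analog of RS_BandToDim: stack all bands along a new leading dimension.
fn band_to_dim(bands: &[Band], dim: &str) -> Result<Band, String> {
    if is_spatial(dim) {
        return Err(format!("cannot use spatial dimension name '{dim}'"));
    }
    let first = bands.first().ok_or_else(|| "raster has no bands".to_string())?;
    if first.dims.iter().any(|d| d == dim) {
        return Err(format!("dimension '{dim}' already exists"));
    }
    if bands.iter().any(|b| b.dims != first.dims || b.shape != first.shape) {
        return Err("bands must share dimensions to be stacked".to_string());
    }
    let mut dims = vec![dim.to_string()];
    dims.extend(first.dims.iter().cloned());
    let mut shape = vec![bands.len()];
    shape.extend_from_slice(&first.shape);
    let data = bands.iter().flat_map(|b| b.data.iter().copied()).collect();
    Ok(Band { dims, shape, data })
}

fn main() {
    let original = Band {
        dims: vec!["time".into(), "y".into(), "x".into()],
        shape: vec![3, 2, 2],
        data: (0..12).collect(),
    };
    let bands = dim_to_band(&original, "time").unwrap();
    let restored = band_to_dim(&bands, "time").unwrap();
    println!("bands: {}, round-trip equal: {}", bands.len(), restored == original);
}
```

Under the leading-axis assumption the round-trip only reproduces the original axis order when the promoted dimension was the first axis; for other positions the data survives but the dimension order changes.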

None of these functions are GDAL-backed — none touch sedona-raster-gdal/.

Lazy implementations: deferred

RS_Slice, RS_SliceRange, and RS_DimToBand are all candidates for lazy view-composition — instead of walking the input bytes and writing a fresh band, the output would be a non-identity-view band that references the input's source bytes through a composed [ViewEntry]. That's tracked as a follow-up rather than included here for two reasons:

  1. The builder API needed for clean view composition doesn't exist yet. The conversion needs a builder method along the lines of with_view(input: &dyn BandRef, view: &[ViewEntry], ...), that is, "create a new view into an existing raster." The current start_band_with_view from #813 (feat(raster): view machinery for non-identity band views) only handles "construct a band with raw bytes and a view in one step"; it doesn't compose against an input band's existing view. Designing the unified API, deprecating the current surface, and migrating call sites is a focused follow-up, better as its own change than bundled with the function additions here.

  2. Memory trade-off depends on workload. Lazy slicing in this model stores the source bytes in the output band's data column, not the visible-region subset. For a small slice of a huge axis (one time index out of 10,000), the output carries the full 10,000-element payload + a view that says "only this element is visible." That's a win for slice → metadata and chained-slice workloads, a loss for slice → write to disk. Worth measuring on real workloads before flipping the default.
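To put numbers on the trade-off in point 2, here is a back-of-the-envelope calculation. The 256 × 256 float32 tile per time index is an assumed figure for illustration, `slice_cost` is a hypothetical helper, and view metadata overhead is ignored.

```rust
// Payload bytes carried by the output band under each strategy
// (view metadata overhead is ignored in this sketch).
struct SliceCost {
    eager_bytes: u64,
    lazy_bytes: u64,
}

fn slice_cost(axis_len: u64, selected: u64, bytes_per_index: u64) -> SliceCost {
    SliceCost {
        // Eager: only the visible region is copied into the output.
        eager_bytes: selected * bytes_per_index,
        // Lazy (as described above): the full source payload rides along.
        lazy_bytes: axis_len * bytes_per_index,
    }
}

fn main() {
    // Assumed tile: 256 x 256 pixels of 4-byte floats per time index.
    let bytes_per_index = 256 * 256 * 4;
    let cost = slice_cost(10_000, 1, bytes_per_index);
    println!("eager: {} bytes", cost.eager_bytes);
    println!("lazy:  {} bytes", cost.lazy_bytes);
    println!("ratio: {}x", cost.lazy_bytes / cost.eager_bytes);
}
```

Under these assumptions the eager output is about 262 KB and the lazy output about 2.6 GB, which is why the slice → write-to-disk case is called out as a loss.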

RS_BandToDim is structurally eager regardless — it stacks multiple bands into one new dim, and PR-D's view model is single-source per band. Multi-source view machinery isn't planned.

Test plan

  • 31 new tests across 3 files (19 dimension queries + 12 slice/dim-band)
  • All 174 tests pass in sedona-raster-functions (143 existing + 31 new)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --all --check clean
  • Round-trip test: RS_DimToBand then RS_BandToDim recovers original data

@github-actions github-actions Bot requested a review from paleolimbot April 3, 2026 20:03
@james-willis james-willis force-pushed the jw/nd-raster-functions branch from 9e9a550 to 5a4c8ac Compare April 6, 2026 17:01
@james-willis james-willis force-pushed the jw/nd-raster-functions branch 3 times, most recently from 05eb8b0 to 1dca04b Compare May 4, 2026 23:31
@james-willis james-willis force-pushed the jw/nd-raster-functions branch 5 times, most recently from fb25043 to dc44dea Compare May 5, 2026 18:14
james-willis added a commit to james-willis/sedona-db that referenced this pull request May 6, 2026
`raster_ref_to_gdal_mem` previously returned a `Result<Dataset>` and
guarded against `BandRef::contiguous_data()` returning `Cow::Owned`
with a runtime tripwire ("Internal: contiguous_data must be borrowed
for is_2d bands; got owned"). The check was correct — handing GDAL a
pointer into a `Vec<u8>` that drops at the end of the iteration would
dangle — but it ties an internal invariant ("`is_2d` ⇒ Borrowed") to
incidental properties of today's reader. Any future copy path in the
reader (compression, BinaryView block-boundary stitching, alignment
fix-up, sliced/broadcast/transposed views from apache#813 / apache#750) would
detonate the tripwire on perfectly valid 2-D rasters.

Change: return `Result<(Dataset, Vec<Vec<u8>>)>`. On `Cow::Borrowed`
the GDAL band still points directly at the StructArray buffer
(zero-copy). On `Cow::Owned` we move the `Vec<u8>` out of the Cow
without copying — the reader's existing materialization is the only
allocation — and stash it in the returned vector. The caller (the
provider in `gdal_dataset_provider.rs`) parks it in a new
`RasterDataset::_owned_band_bytes` field that lives as long as the
MEM dataset that holds the pointers.

`raster_ref_to_gdal_empty` discards the always-empty vector.
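
The ownership pattern described in this commit message can be sketched without GDAL. `pin_band_bytes` is a hypothetical helper, not the signature of `raster_ref_to_gdal_mem`; it only illustrates why moving a `Vec<u8>` out of a `Cow` keeps the raw pointer valid.

```rust
use std::borrow::Cow;

// Hypothetical helper: collect raw data pointers for a C API, plus any
// owned buffers that must be kept alive for as long as those pointers are used.
fn pin_band_bytes(bands: Vec<Cow<'_, [u8]>>) -> (Vec<*const u8>, Vec<Vec<u8>>) {
    let mut ptrs = Vec::with_capacity(bands.len());
    let mut owned = Vec::new();
    for band in bands {
        match band {
            // Zero-copy: the pointer targets the caller's existing buffer.
            Cow::Borrowed(bytes) => ptrs.push(bytes.as_ptr()),
            // Moving a Vec moves only its (pointer, length, capacity) header;
            // the heap allocation stays where it is, so the pointer stays valid
            // as long as the Vec is stored somewhere that outlives its use.
            Cow::Owned(vec) => {
                ptrs.push(vec.as_ptr());
                owned.push(vec);
            }
        }
    }
    (ptrs, owned)
}

fn main() {
    let backing = vec![1u8, 2, 3, 4];
    let bands = vec![Cow::Borrowed(&backing[..]), Cow::Owned(vec![9u8, 8, 7])];
    let (ptrs, owned) = pin_band_bytes(bands);
    println!("borrowed pointer matches backing: {}", ptrs[0] == backing.as_ptr());
    println!("owned pointer matches stash: {}", ptrs[1] == owned[0].as_ptr());
}
```

The caller is responsible for storing the returned `Vec<Vec<u8>>` next to whatever holds the pointers, which is the role the commit message assigns to `RasterDataset::_owned_band_bytes`.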
@james-willis james-willis force-pushed the jw/nd-raster-functions branch 2 times, most recently from 3a9489d to baaf403 Compare May 11, 2026 17:13
Member

@paleolimbot paleolimbot left a comment

I got partway through this before realizing it was the wrong PR 😬 (comments may be helpful anyway)

Comment thread rust/sedona-raster-functions/src/rs_band_accessors.rs Outdated
Comment thread rust/sedona-raster-functions/src/rs_convexhull.rs Outdated
@james-willis james-willis force-pushed the jw/nd-raster-functions branch 7 times, most recently from d630f21 to 718fbec Compare May 14, 2026 23:53
New dimension query functions for N-D rasters:

- RS_NumDimensions(raster [, band]) → Int32
- RS_DimNames(raster [, band]) → List<Utf8>
- RS_DimSize(raster, dim_name [, band]) → Int64 (null if dim missing)
- RS_Shape(raster [, band]) → List<Int64>

All accept an optional band index. When omitted, default to band 0
and verify all bands agree — error if bands have different
dimensionality, prompting user to specify a band index.

19 new tests covering 2D/3D rasters, explicit band args, null
handling, nonexistent dimensions, and mixed-dimensionality errors.

New N-D raster manipulation functions:

- RS_Slice(raster, dim_name, index) — reduce a dimension by picking
  one index, removing it from the output
- RS_SliceRange(raster, dim_name, start, end) — narrow a dimension
  to [start, end), keeping it with reduced size
- RS_DimToBand(raster, dim_name) — promote a dimension into separate
  bands (e.g., 1 band [time=3,y,x] → 3 bands [y,x])
- RS_BandToDim(raster, dim_name) — collapse all bands into one band
  with a new dimension (inverse of DimToBand)

All error on spatial dimension names (x_dim/y_dim). Phase 1 always
materializes data (contiguous copies). 12 new tests including
a DimToBand→BandToDim round-trip.
@james-willis james-willis force-pushed the jw/nd-raster-functions branch from 718fbec to f8c9fd5 Compare May 15, 2026 05:21
Add .qmd doc stubs for RS_NumDimensions, RS_DimNames, RS_DimSize,
RS_Shape, RS_Slice, RS_SliceRange, RS_DimToBand, RS_BandToDim.
Required by the docs-and-deploy CI check which validates every
registered function has a documentation page.
@james-willis james-willis force-pushed the jw/nd-raster-functions branch from f8c9fd5 to a997d8c Compare May 15, 2026 06:44

2 participants