Skip to content

[WIP] feat(raster): view machinery for non-identity band views#813

Draft
james-willis wants to merge 8 commits into
apache:mainfrom
james-willis:jw/nd-raster-views
Draft

[WIP] feat(raster): view machinery for non-identity band views#813
james-willis wants to merge 8 commits into
apache:mainfrom
james-willis:jw/nd-raster-views

Conversation

@james-willis
Copy link
Copy Markdown
Contributor

@james-willis james-willis commented May 5, 2026

Summary

Adds the view-machinery layer on top of the N-D raster type. Bands can now be constructed and read with non-identity view entries — slices, broadcasts, axis permutations, and reverse-iteration — and the byte-access surface (data / contiguous_data / nd_buffer) handles the strided materialization transparently. The slice/manipulation functions in #750 depend on this layer.

What's in this PR

Single commit, five files. Three layers:

Schema / trait surface (rust/sedona-raster/src/traits.rs)

  • validate_view() + 22 unit tests (length checks, axis range, negative step, broadcast / step=0, i64 overflow guards, etc.).
  • visible_shape_from_view() helper.
  • Both pub(crate) — validation runs implicitly inside start_band_with_view (writer) and RasterRef::band (reader), so external callers don't need to invoke them.

Builder (rust/sedona-raster/src/builder.rs)

  • start_band_with_view() API.
  • ~12 view-construction tests: slice, broadcast, transpose, negative-step, multi-band mixed views, Arrow IPC round-trip.

Reader (rust/sedona-raster/src/array.rs)

  • RasterRefImpl::band composes view → byte strides + byte offset using checked arithmetic; malformed views and arithmetic overflows route through sedona_internal_datafusion_err! so they surface with the standard "SedonaDB internal error" framing.
  • nd_buffer() returns the source buffer + shape/strides/offset (zero-allocation strided access).
  • contiguous_data() delegates to nd_buffer() for the OutDb check + identity-view fast path, then walks the strided copy via a shared materialize_strided helper.
  • BandRef::data() on BandRefImpl: identity views borrow from the Arrow column directly (zero-allocation); non-identity views lazy-materialize a row-major copy into an internal OnceCell on first access and reuse it on subsequent calls.
  • Reader tests for explicit views (negative steps, OOB axis, length mismatch, malformed view rejection).

GDAL bridge (rust/sedona-raster-gdal/src/{gdal_common,gdal_dataset_provider}.rs)

  • raster_ref_to_gdal_mem returns (Dataset, Vec<Vec<u8>>) so the Vec backing GDAL's pointer for view-materialized (Cow::Owned) bands outlives the dataset. RasterDataset carries the matching _owned_band_bytes field.

Behavior change vs main

  • A band with a non-null view row now decodes via the composition path (previously rejected as a read error).
  • BandRef::data() on a non-identity view returns the materialized row-major visible bytes via an internal OnceCell cache, rather than the raw source-column bytes. Repeat calls reuse the cache.

Test plan

  • cargo build -p sedona-schema -p sedona-raster -p sedona-raster-gdal -p sedona-raster-functions -p sedona-testing
  • cargo test -p sedona-raster — 113 tests pass.
  • cargo test -p sedona-raster-gdal — 53 tests pass.
  • cargo clippy --all-targets -- -D warnings
  • cargo fmt --all --check

@github-actions github-actions Bot requested a review from paleolimbot May 5, 2026 00:21
@james-willis james-willis marked this pull request as draft May 5, 2026 16:58
@james-willis james-willis force-pushed the jw/nd-raster-views branch 2 times, most recently from 349c957 to 91966ed Compare May 5, 2026 18:14
james-willis added a commit to james-willis/sedona-db that referenced this pull request May 6, 2026
`raster_ref_to_gdal_mem` previously returned a `Result<Dataset>` and
guarded against `BandRef::contiguous_data()` returning `Cow::Owned`
with a runtime tripwire ("Internal: contiguous_data must be borrowed
for is_2d bands; got owned"). The check was correct — handing GDAL a
pointer into a `Vec<u8>` that drops at the end of the iteration would
dangle — but it ties an internal invariant ("`is_2d` ⇒ Borrowed") to
incidental properties of today's reader. Any future copy path in the
reader (compression, BinaryView block-boundary stitching, alignment
fix-up, sliced/broadcast/transposed views from apache#813 / apache#750) would
detonate the tripwire on perfectly valid 2-D rasters.

Change: return `Result<(Dataset, Vec<Vec<u8>>)>`. On `Cow::Borrowed`
the GDAL band still points directly at the StructArray buffer
(zero-copy). On `Cow::Owned` we move the `Vec<u8>` out of the Cow
without copying — the reader's existing materialization is the only
allocation — and stash it in the returned vector. The caller (the
provider in `gdal_dataset_provider.rs`) parks it in a new
`RasterDataset::_owned_band_bytes` field that lives as long as the
MEM dataset that holds the pointers.

`raster_ref_to_gdal_empty` discards the always-empty vector.
@james-willis james-willis force-pushed the jw/nd-raster-views branch 7 times, most recently from 9e5ae44 to 7f2f79a Compare May 14, 2026 16:44
@james-willis james-willis marked this pull request as ready for review May 14, 2026 16:46
@james-willis james-willis marked this pull request as draft May 14, 2026 18:45
@james-willis james-willis marked this pull request as ready for review May 14, 2026 19:39
@zhangfengcdt zhangfengcdt self-requested a review May 14, 2026 20:34
@james-willis james-willis marked this pull request as draft May 15, 2026 07:58
@james-willis james-willis marked this pull request as ready for review May 15, 2026 08:49
@james-willis james-willis force-pushed the jw/nd-raster-views branch 2 times, most recently from d27d8d1 to e890f90 Compare May 15, 2026 16:50
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool!

I think I got there by the end in the inline comments, but I think a struct ViewEntries with associated functions in src/view_entries.rs where the non serialize-to-arrow and deserialize-from-arrow logic behind this lives will make this clearer.

The main issue I have with the implementation here is that it relies on hidden Vec<u8>s that aren't reusable. To manage memory properly we are going to have to register some scratch spaces with the session's memory pool and reuse them between iterations of a loop. Hiding them within various structures for convenience is going to make it tricky to implement a correct pattern going forward.

Comment thread rust/sedona-schema/src/raster.rs Outdated
Comment thread rust/sedona-raster/src/traits.rs Outdated
Comment on lines +696 to +697
/// with no allocation or copy. Non-identity views must materialize a
/// row-major copy on first byte access.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// with no allocation or copy. Non-identity views must materialize a
/// row-major copy on first byte access.
/// with no allocation or copy. Non-identity views must materialize a
/// row-major copy on first byte access when using these functions
/// (use `nd_buffer()` to guarantee that a copy is never produced).

Comment thread rust/sedona-raster/src/traits.rs Outdated
) -> Result<Vec<ViewEntry>, ArrowError> {
let mut out = Vec::with_capacity(next_view.len());
for (k, next) in next_view.iter().enumerate() {
if next.source_axis < 0 || (next.source_axis as usize) >= input_view.len() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should source_axis be a usize? I'm wary of using as here given that this is sensitive territory (will panic on out of bounds) although it is probably fine.

Comment thread rust/sedona-raster/src/traits.rs Outdated
Comment thread rust/sedona-raster/src/traits.rs Outdated
Comment on lines +686 to +688
pub(crate) fn visible_shape_from_view(view: &[ViewEntry]) -> Vec<u64> {
view.iter().map(|v| v.steps as u64).collect()
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably be all i64s unless there is a compelling reason to use the unsigned version to minimize the number of times we have to as between signed and unsigned types.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned out some of the u64s. These are the ones I left in the raster ecosystem:

  1. BandRef::width() -> u64 / height() -> u64 (and the RasterMetadata shim equivalents) — 2D-compat accessors. Changing these is API-breaking across sedona-functions, sedona-raster-gdal, callers in tests, etc.
  2. BandRef::raw_source_shape() -> &[u64] and dim_size() -> Option — same reason; they sit on the public trait.
  3. source_shape, spatial_shape Arrow columns type → could be done as a follow up.

Comment thread rust/sedona-raster/src/builder.rs Outdated
Comment on lines +394 to +395
#[allow(clippy::too_many_arguments)]
pub(crate) fn start_band_with_view(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clippy has a good point here. Define struct StartBandWithViewArgs<'a> { ... } with all pub fields, which will force callers to write start_band_with_view(StartBandWithViewArgs { <named params> }).

You could also expose more than one function or have a BandBuilder of some kind to separate these things into more than one call.

Comment on lines +503 to +504
#[allow(clippy::too_many_arguments)]
pub fn with_view(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well (you can probably use a common Args struct)

Comment thread rust/sedona-raster/src/builder.rs Outdated
Comment on lines +1914 to +1916
let view = [
ViewEntry {
source_axis: 0,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A helper struct to manage Vec<ViewEntry> would make it more likely that other people will write reasonable tests. Or a macro (view_entries![0:4, 1:5]; use Python slice syntax since that's what you'd type in numpy to do this).

Comment thread rust/sedona-raster/src/builder.rs Outdated
Comment on lines +2648 to +2655
None,
&["y", "x"],
&[3, 3], // 3x3 source
&view,
BandDataType::Float32,
None,
None,
None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A BandBuilder may help you quite a bit here since there are a lot of None that a default constructor could fill in

Comment on lines +2833 to +2841
#[test]
fn test_nd_buffer_permutation_and_slice_combined() {
// 2D source [Y=4, X=3]. View permutes (visible order [X, Y]) and
// slices Y from 1, step 2, steps 2. Expected:
// visible_shape = [3, 2]
// byte_strides = [step_X * stride_X_src, step_Y * stride_Y_src]
// = [1 * 1, 2 * 3] = [1, 6]
// byte_offset = start_X * stride_X_src + start_Y * stride_Y_src
// = 0 * 1 + 1 * 3 = 3
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These kinds of tests you probably don't want in this file. This file is about serializing stuff into Arrow and it's tricky to see what's going on...I'm not sure what the perfect abstraction is here but it should probably live in src/views.rs with tests and some pub functions that can be benchmarked.

Proabably struct ViewEntries { inner: Vec<ViewEntry> } with associated methods would make this a bit cleaner than the &[ViewEntry] + free functions.

@james-willis
Copy link
Copy Markdown
Contributor Author

Somewhat offtopic but are you going to have the same concern with lazy outdb loading as well? (Outdb lazy loading + zarr loader is going to be higher on my priority list next week than this PR)

The main issue I have with the implementation here is that it relies on hidden Vecs that aren't reusable. To manage memory properly we are going to have to register some scratch spaces with the session's memory pool and reuse them between iterations of a loop. Hiding them within various structures for convenience is going to make it tricky to implement a correct pattern going forward.

@paleolimbot
Copy link
Copy Markdown
Member

Possibly? I know it seems like a lot of comments but what I'm getting at is:

let mut scratch = Vec::new();
for raster in rasters {
  let data_slice = raster.band(0)?.contiguous_data(&mut scratch)?;
  do_stuff(data_slice);
}

Or on the day we need to track memory:

let mut max_capacity = 0;
for raster in rasters {
  max_capacity = max_capacity.max(raster.band(0)?.contiguous_data_alloc_size());
}

if max_capacity > config_options.raster.max_scratch_alloc {
  // error
}

// If this ends up being a lot, frequently, we may also want to use a MemoryReservation to track it
// Probably this helps contiguous_data be faster since in theory there is only one heap allocation.
// You could probably use some of the unsafe modifiers to also ensure that there are no bounds
// checks in your materializer slowing things down
let mut scratch = Vec::with_capacity(max_capacity);
for raster in rasters {
  let data_slice = raster.band(0)?.contiguous_data(&mut scratch)?;
  do_stuff(data_slice);
}

I'm not sure exactly how loading works yet but you might want something similar (with the hiccup that you need some number of async workers doing IO so you also need the same number of scratch buffers).

(I have the same comment for pretty much everything that heap allocates in a loop, which is why, for example, our geometry writers write WKB directly into the Arrow output instead of returning Vec<u8>).

@james-willis james-willis changed the title feat(raster): view machinery for non-identity band views [WIP] feat(raster): view machinery for non-identity band views May 22, 2026
Adds the view-machinery layer that lets bands carry non-identity view
entries — slices, broadcasts, axis permutations — and decode them
correctly through the byte-access surface.

Schema and trait surface:
- validate_view() and visible_shape_from_view() helpers in traits.rs.
  Both pub(crate); validation runs implicitly inside start_band_with_view
  on the writer and RasterRef::band on the reader, so external callers
  don't need to invoke them directly.

Builder:
- start_band_with_view() writes a band with an explicit view column.

Reader:
- RasterRefImpl::band composes view -> byte strides + byte offset using
  checked arithmetic. Malformed views and arithmetic overflows route
  through sedona_internal_datafusion_err! so they surface with the
  standardised "SedonaDB internal error" framing.
- nd_buffer() returns the source buffer plus shape/strides/offset
  (zero-allocation strided access for callers that want to walk pixels
  themselves).
- contiguous_data() delegates to nd_buffer() for the OutDb check and
  identity-view fast path, then walks the strided copy via a shared
  materialize_strided helper.
- BandRef::data() on BandRefImpl: identity views borrow from the Arrow
  column directly; non-identity views lazy-materialize a row-major copy
  into an internal OnceCell on first access and reuse it on subsequent
  calls. Repeat data() calls on the same band don't re-walk strides.

GDAL bridge:
- raster_ref_to_gdal_mem now returns (Dataset, Vec<Vec<u8>>) so the
  Vec backing GDAL's pointer for view-materialized (Cow::Owned) bands
  outlives the dataset. RasterDataset carries the matching
  _owned_band_bytes field.

Tests (~50 new): validate_view rejection cases, builder-side view
constructions (slice, broadcast, transpose, negative-step, multi-band
mixed views, Arrow IPC round-trip), reader-side decoding of malformed
view rows, OnceCell materialization correctness.
Addresses the should-fix and nit items from the PR-D review pass:

* `BandRefImpl::byte_offset` is now `i64` throughout — the construction
  arithmetic, the field, and `materialize_strided`'s parameter all agree.
  The single `i64 -> u64` cast lives at the public `NdBuffer::offset`
  boundary, where it's provably safe (band() asserts `>= 0` before
  storing). Eliminates the previous unchecked round-trip cast inside
  `materialize_strided`.

* Factor identity-view detection out of the byte-stride composition loop
  into a `pub(crate) is_identity_view(view, source_shape)` helper next
  to `visible_shape_from_view`. Six new unit tests including the
  permuted-axes-with-identity-step negative case.

* New `check_view_buffer_bounds` precheck runs at `RasterRef::band()`
  construction for InDb bands: verifies that every byte the view can
  address lies within the data column AND that every stride × index
  product (and their accumulations) fits in i64. A corrupt Arrow row
  whose source_shape exceeds the bytes available now surfaces as a
  clean internal error at the read boundary. New test covers the
  corruption path; one stale builder test fixture (1024x1024 raster
  with 100 bytes of data) was tightened to 10x10 to satisfy the
  precheck.

* `materialize_strided` is now infallible. With `check_view_buffer_bounds`
  load-bearing at the read boundary, the strided walk's per-step bound
  checks and `checked_mul`/`checked_add` calls are provably redundant
  and have been removed. Return type drops to `Vec<u8>`; the OnceCell
  init in `BandRef::data()` no longer needs a `panic!`-on-Err arm.
  Defense-in-depth lives at the precheck only; if a future refactor
  removes it, Rust's bound-checked slice indexing inside
  `materialize_strided` still catches buffer overruns (just with a less
  informative panic message).

* Empty (non-null, zero-length) `view` rows are now read equivalently
  to NULL `view` rows (synthesize identity), for forward compatibility
  with producers that emit one form versus the other. Schema docstring
  updated; new reader test covers it.

* `materialize_strided`'s outer-index increment uses
  `for k in (0..inner).rev()` with `break` instead of the
  `while k > 0; k -= 1` pattern.

* GDAL bridge: explicit-index pattern replaces `last().unwrap()` when
  recording owned band bytes; a SAFETY comment documents the
  `Vec<Vec<u8>>` heap-pointer stability invariant that GDAL's raw
  pointer relies on. `raster_ref_to_gdal_empty` uses a hard `assert!`
  instead of `debug_assert!` so any future leak of GDAL-referenced
  bytes from an empty band list fails loudly in release builds too.

* `BandRef::nd_buffer` docstring now states that the returned
  `shape`/`strides`/`offset` are guaranteed in-bounds by the
  read-boundary precheck and validate_view, so stride-aware consumers
  don't need to re-check against `buffer.len()`.

* New 3-D cyclic-permutation test exercises the strided walk through
  every outer-axis combination plus a non-trivial inner step
  (`row_bytes_contiguous == false` path).
…g bands

Adds the canonical user-facing API for "create a new view into an
existing raster band." UDFs that slice / permute / broadcast can now
express their operation in terms of the input's *visible* axes without
knowing whether the input itself already carries a view — the builder
composes views internally.

Surface:

* `pub fn RasterBuilder::with_view(name, dim_names, input, view,
  nodata, outdb_uri, outdb_format)` — takes an `&dyn BandRef` and a
  view spec relative to the input's visible axes; writes an output
  band layered on the input's source bytes (InDb) or pointing at the
  same external resource (OutDb). The supplied view composes with the
  input's existing view via the new internal `compose_view` helper.

* `pub(crate) fn compose_view(input, next) -> Result<Vec<ViewEntry>, _>`
  in traits.rs. Walks `next` through `input` to produce a single
  source-space view. Checked arithmetic for stride × index products
  and start accumulation. Six unit tests cover identity pass-through,
  slice-of-slice collapse, source_axis preservation through
  permutation, step multiplication, source_axis-out-of-range rejection,
  and step-product overflow rejection.

* `RasterBuilder::start_band_with_view` is dropped from the public API
  (visibility downgraded to `pub(crate)`). External callers should use
  `with_view`; the helper stays reachable from internal tests in the
  same crate.

Storage semantics:

* InDb input → InDb output. Source bytes are copied verbatim into the
  output's data column. Arrow `BinaryView` buffer-sharing is a future
  optimisation.
* OutDb input → OutDb output. Data column stays empty; `outdb_uri` and
  `outdb_format` are inherited from the input (or overridden by the
  explicit arguments). Bytes are never fetched at construction — the
  output's composed view rides alongside the same external pointer,
  loading deferred to whoever reads the visible bytes.

Tests added in builder.rs:
* `with_view_over_identity_input_produces_expected_visible_bytes`
* `with_view_chained_composes_into_single_view` — proves slice-of-slice
  collapses into a single source-space view layer end-to-end.
* `with_view_on_outdb_input_produces_outdb_output_with_composed_view`
Per Dewey's review: every `source_axis`/`source_shape`/`visible_shape`
in the view helpers should be `i64` to match `ViewEntry`'s signed
fields and the stride arithmetic, instead of round-tripping through
`u64` and sprinkling `as` casts through helper bodies. Reworked to
make i64 the canonical view-machinery type — no more u64↔i64
conversions in `data()`, `nd_buffer()`, `contiguous_data()`, or the
band-construction storage step.

Public surface changes:

- `BandRef::shape() -> &[i64]` (was `&[u64]`). Production callers
  outside the raster crate's own tests don't exist; tests use array
  literals which infer the new element type. `width()` and `height()`
  stay `u64` (conventional non-negative scalar counts) and pick up a
  single `as u64` cast in the default `dim_size()` impl.
- `BandRef::raw_source_shape() -> &[u64]` unchanged — this is a
  direct slice into the Arrow `UInt64Array` column, changing the
  return type would force an alloc per call.
- `NdBuffer.shape: Vec<i64>`, `NdBuffer.offset: i64` — internal
  struct, no out-of-crate consumers.

Internals now `i64`:

- `validate_view(view, source_shape: &[i64])`, with an added
  `source_shape[sa] >= 0` check at the boundary.
- `is_identity_view(view, source_shape: &[i64])`.
- `visible_shape_from_view(view) -> Vec<i64>`.
- `BandRefImpl::visible_shape: Vec<i64>` (matches the trait return).
- `materialize_strided` and `check_view_buffer_bounds` take
  `visible_shape: &[i64]`.

Conversions concentrated at the two unavoidable boundaries — both
are Arrow column reads:

- `RasterRefImpl::band()` converts `raw_source_shape() -> &[u64]`
  into `Vec<i64>` once with a `try_from` overflow check.
- `RasterBuilder::start_band_with_view` does the same conversion at
  its `source_shape: &[u64]` entry point before calling
  `validate_view`.

Eliminates the source-stride `i64::try_from` cast that used to live
inside `band()` (line ~527 pre-refactor). Tests pass: sedona-raster
131, sedona-raster-functions 143, sedona-raster-gdal 53. Arrow column
storage and the public `width()`/`height()` API unchanged.
…ies.rs

Per Dewey's review: the view machinery should live behind a named
type with associated methods, not a `&[ViewEntry]` + free-function
soup spread across `traits.rs`, `builder.rs`, and `array.rs`.

New module `rust/sedona-raster/src/view_entries.rs`:

- `ViewEntry` struct (moved from traits.rs; re-exported from there
  so existing import paths keep working).
- `ViewEntries(Vec<ViewEntry>)` newtype with the full surface as
  methods: `validate`, `is_identity`, `visible_shape`, `compose`,
  plus `identity_for_shape` constructor and the usual `len`/
  `as_slice`/`iter`/`Index`/`IntoIterator` ergonomics.
- All view-machinery tests (validate, is_identity, compose,
  visible_shape) live in this file's `#[cfg(test)]` module.
- `view_entries![start;stop, ...]` macro for Python-slice-style test
  construction. `;` rather than `:` because `:` isn't valid Rust
  macro syntax for the `start:stop` form, but `start;stop` is a
  pragmatic stand-in that reads close enough at the call site.

Migrated call sites:

- `array.rs`: `RasterRefImpl::band` builds a `ViewEntries` (either
  via `identity_for_shape` for the NULL/empty-encoded identity case
  or via `ViewEntries::new` for the decoded-from-Arrow case) and
  uses `.validate(...)`, `.visible_shape()`, `.is_identity(...)`.
  `BandRefImpl::view_entries: ViewEntries` (was `Vec<ViewEntry>`).
- `builder.rs`: `start_band_with_view` validates via
  `ViewEntries::validate`; `with_view` composes via
  `ViewEntries::compose`.
- `traits.rs`: lost `validate_view`/`is_identity_view`/
  `visible_shape_from_view`/`compose_view` free functions and ~250
  lines of their tests. `ViewEntry` is re-exported from
  `view_entries`.

Tests pass: sedona-raster 134 (was 131; +3 from the new tests
exercising `identity_for_shape`, `visible_shape`, and the macro),
sedona-raster-functions 143, sedona-raster-gdal 53.
…ecomp

Three polish items on the view-machinery review:

1. Identity-view encoding: the `view` list column is now identity iff the
   row is NULL. Empty (zero-length, non-null) lists are rejected by the
   reader as malformed. The schema doc on `RasterSchema::view_type`
   documents the contract.

2. `start_band_with_view` now takes a `StartBandWithViewArgs` struct
   instead of eight positional arguments. Test call sites migrated to
   the struct-literal form.

3. `RasterRefImpl::band()` decomposed into named helpers — read source
   shape, resolve data type, read view entries, compose byte strides —
   so the orchestrating function reads top-down without inlining
   Arrow-buffer arithmetic and checked-stride composition together.
The view_entries![start:stop, ...] macro existed but no tests used it,
so the example had no readers and the syntax wasn't being exercised in
practice. Migrate every test site whose view-list happens to be in the
macro's expressible subset (sequential source_axis 0..n, step = 1, and
steps = stop - start). Mixed-permutation, broadcast (step = 0),
negative-step, and step != 1 sites stay on the struct-literal form, as
the macro intentionally doesn't try to express those.

Also: shift `axis += 1;` to before the push and start axis at -1 so the
final assignment is always read by the next push. This kills the
unused_assignments warning on single-entry macro uses without needing
an inner #[allow].
…ntics

After the ViewEntries extraction commit, every view-rule (validate /
compose / is_identity / visible_shape / broadcast / negative-step /
permutation) is exhaustively covered by pure-semantic tests in
view_entries.rs. The build+read integration tests in array.rs and
builder.rs had drifted into re-verifying those same rules through the
Arrow round-trip — useful when the rules were being designed, wasteful
now that the abstraction is settled.

Pruned 12 tests that re-asserted view-rule semantics through the
build+read pipeline. What remains is the set that uniquely tests
integration-layer concerns:

- Arrow encoding parity (NULL field, identity-via-explicit-view vs
  identity-via-NULL, IPC round-trip)
- Reader-side corruption guards (source-stride overflow, view step ×
  stride overflow, data column shorter than view, empty source_shape,
  empty non-null view list, unknown data_type discriminant)
- Cow::Borrowed vs Cow::Owned semantics on contiguous_data()
- materialize_strided byte-correctness across distinct paths (outer
  slice / strided inner fallback / negative step / broadcast /
  permutation / empty axis / float32 fast path)
- builder + reader validate-wiring smoke (1 each)
- with_view API (identity, chained, OutDb)

Specifically removed:
- builder.rs: test_view_slice_nd_buffer_and_contiguous_data,
  test_view_broadcast, test_view_permutation_transpose,
  test_view_empty_axis, test_view_validation_rejects_out_of_range_start,
  test_view_validation_rejects_duplicate_source_axis,
  test_contiguous_data_negative_step_full_reverse,
  test_nd_buffer_steps_one_view,
  test_nd_buffer_multidim_non_zero_starts
- array.rs: band_returns_none_when_view_has_negative_steps,
  band_returns_none_when_view_source_axis_out_of_range,
  band_returns_none_when_view_has_duplicate_source_axis

134 → 122 sedona-raster tests; all other crates unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants