fix(typed-store): make iterator bounds consistent and add prefix scans by chriscandless · Pull Request #11865 · iotaledger/iota

chriscandless · 2026-06-11T15:55:21Z

Description of change

Make the typed-store iterator wrapper produce consistent key ranges in both directions and remove the need for hand-built maximum keys in composite-key scans.

safe_iter_with_bounds(lo, hi) and reversed_safe_iter_with_bounds(lo, hi) computed their raw rocksdb bounds independently: the forward iterator covered [lo, hi) while the reversed one covered [lo, hi], so reversing a scan did not yield the same key set. Separately, scanning "all entries under a leading key" of a composite key required constructing artificial maximum keys (ObjectKey::max_for_id, usize::MAX tuple tails, etc.).

Those issues are solved with two new sets of methods:

Added safe_range_iter_reversed - reversed version of pre-existing safe_range_iter. It is a better version of reversed_safe_iter_with_bounds, because it explicitly defines type of bounds (included, excluded). Also safe_range_iter and safe_range_iter_reversed are now using same underlying logic for bound calculation, so it is impossible that they ever become asymmetric.
Added safe_iter_with_prefix / safe_iter_with_prefix_reversed - enable clear iteration by prefix, which is what magical MAX values were mostly used for. This removes most current and future uses of MAX values. A safe_iter_with_prefix_from variant covers cursor-paginated prefix scans (start from a cursor, no MAX upper bound).

The change is split into five self-contained commits (the latter four are independently revertable):

fix(typed-store): unify iterator bound handling and add prefix scans — route all bound computation through one range engine (iterator_bounds_with_range) and add safe_range_iter_reversed, which by construction yields exactly the keys of safe_range_iter in descending order. SafeRevIter now takes an exclusive upper boundary and steps off the boundary key when the initial seek lands on it, so exclusive upper bounds work in reverse as well. Add safe_iter_with_prefix / safe_iter_with_prefix_reversed, which derive the upper bound by incrementing the serialized prefix bytes, and migrate AuthorityStore::get_latest_marker as the first consumer. The Option-based wrappers keep their existing documented semantics as thin presets over the range engine.
refactor(typed-store): remove reversed_safe_iter_with_bounds in favor of safe_range_iter_reversed — the Option-based reverse iterator hard-coded inclusive bounds, invisible at the call site. Migrate all callers to the range form, where inclusivity is spelled out by the range syntax, and drop the wrapper together with its vestigial Result. The translation is semantics-preserving:

old new

(None, None) ..

(None, Some(hi)) ..=hi

(Some(lo), Some(hi)) lo..=hi

Note for upstream ports: Sui still has reversed_safe_iter_with_bounds; ported code touching these call sites should apply the mapping above.
refactor(iota-core): use prefix iterators instead of artificial MAX keys — replace the pure prefix scans (get_events_by_events_digest, have_deleted_owned_object_at_version_or_after, get_latest_live_version_for_object_id, get_latest_object_ref_or_tombstone, get_latest_object_or_tombstone, try_get_object, and the #[cfg(msim)] count_object_versions) with prefix iterators. Sites where a bound carries a real value (cursors, version cutoffs, range_iter_live_object_set spanning multiple ids) keep the explicit range form. Two behavior deltas, both safe: (a) a prefix scan also tightens the lower bound (from unbounded / min_for_id to the prefix start) — safe because every migrated reader re-checks the leading key (epoch / object id) of the first row, so a narrower lower bound cannot change the result; (b) the replaced ..=max_for_id upper bounds capped the version component at MAX_VALID_EXCL while a prefix scan covers the full u64 space — equivalent because sentinel versions (CANCELLED_READ, CONGESTED_*, MAX_VALID_EXCL) are scheduling-time artifacts, never persisted as object or marker table keys (a write-path invariant; verified by inspecting the write paths, not unit-tested, since a read-side test could only re-demonstrate the inclusion rather than prove the invariant).
refactor(typed-store,iota-core): add safe_iter_with_prefix_from and drop MAX bounds in cursor scans — add safe_iter_with_prefix_from(&prefix, &cursor) (the rocksdb prefix_iterator_from idiom: lower bound prefix ++ cursor, upper bound derived by incrementing the serialized prefix bytes). cursor is the remainder of the key after the prefix, not a full key, so the prefix is never duplicated and the lower bound is a prefix of the key by construction. Migrates the cursor-paginated prefix scans (get_dynamic_fields_iterator, dynamic_field_iter, package_versions_iter) off their (prefix, MAX) upper bounds; those callers keep their .skip(cursor.is_some()) (the cursor stays the inclusive lower bound). get_newer_object_keys is migrated too but has no cursor — its lower bound is object.1.next() (version + 1, an exclusive-by-construction start) and it uses no .skip. The prefix scan covers the whole prefix, including a key sitting exactly at the type's maximum tail that the old exclusive (prefix, MAX) bound dropped — never a real version/marker, a corrected edge for dynamic-field ids, equivalent for all real data. Reversed-pagination scans and the hash-bounded owner_iter keep explicit ranges — their bounds are not pure prefixes.
fix(typed-store): surface deserialization errors in SafeIter instead of ending iteration — SafeIter::next mapped a failed key/value deserialization to None via .ok(), indistinguishable from end-of-data, so one undeserializable row silently truncated the rest of the scan (the item type is Result<(K, V), _> precisely so the error can be reported; upstream returns it, the IOTA port had regressed to swallowing it with no test on the path). Now returns Some(Err(TypedStoreError::Serialization(..))), matching how the in-memory backend (memstore) and DBMap::get/multi_get already handle deserialization failures, with a regression test that reopens a column family with a wider value type and asserts iteration yields Some(Err(..)). Found in review of this PR.

The forward safe_iter_with_bounds is intentionally kept: its [lo, hi) contract matches the conventional half-open semantics, it is part of the Map trait shared with upstream, and it is now implemented as a thin preset over safe_range_iter with the contract documented on the trait.

Intentional runtime-behavior changes (no consumer relies on the old behavior):

Prefix scans include an entry at the type's maximum tail that the old exclusive (prefix, MAX) bound excluded (see commits 3-4); only matters for a key exactly at that tail, which the migrated tables never hold.
SafeIter::next now surfaces a deserialization failure as Some(Err(..)) instead of mapping it to None (silently truncating the scan); every iterator consumer is affected — ?/collect propagate the error, .next()-style readers return it. No consumer was relying on silent skip-on-bad-row.

Edge-case note (issue point 2): the inclusive upper-bound arm of iterator_bounds_with_range returned None (unbounded) when the bound serialized to all-0xFF. That is fine for fixed-width keys, but a longer key extending the maximum sorts after it and was wrongly included in a ..=max range. This PR returns ser(max) ++ [0] instead (the first key past the maximum), mirroring the excluded-lower arm, with a regression test. On top of that it adds reverse-direction and prefix regression coverage (forward/reverse symmetry across all bound kinds, prefix scans at the 0xFF serialization limit, boundary keys present in the DB).

Remaining `MAX` usages (deliberately not removed)

A few iterator bounds still build MAX/MIN keys by hand. They are left as-is because they are not pure forward prefix scans, so the new prefix APIs do not fit them cleanly; each would be a separate, higher-risk change:

Reversed pagination without a prefix (get_transactions reversed over transaction_order, keyed by a bare TxSequenceNumber): ..=cursor.unwrap_or(MAX) — there is no composite prefix here, MAX just means "no cursor → start from the end of the table". Removing it means branching on cursor (..=c vs ..), pure cosmetics over a non-fragile sentinel.
Reversed prefix pagination (get_transactions_from_index, get_event_from_index, events_by_transaction, all_events): these scan a leading key downwards from a cursor. They would fit a reversed safe_iter_with_prefix_from, but that is a new method with a different shape (in reverse the cursor is the upper bound and the prefix is the lower), and these sites currently stop via a take_while/predicate rather than a DB-level lower bound — switching the cutoff mechanism is a behavioral change that warrants its own PR with tests.
owner_iter (gRPC): the OwnerIndexKey prefix length is filter-dependent (owner, or (owner, id_hash), or (owner, id_hash, params_hash)), and the MAX values span the remaining unfixed fields. Expressible via prefix scans, but only by reworking owner_bounds per filter branch — not a mechanical swap.
range_iter_live_object_set: min_for_id(id_A)..=max_for_id(id_B) spans a range of different object ids (table sharding for background scans). The heads differ, so there is no common prefix; the alternative ..(next(id_B), 0) just moves the synthetic key elsewhere.
Range delete / compaction (schedule_delete_range for events, compact_range for objects) and config sentinels (num_epochs_to_retain == u64::MAX meaning "retain forever"): these are not iterators at all — a "delete/compact by prefix" helper would be a separate API surface, and the config value is not a key.

Net: this PR removes the MAX keys that were standing in for a missing forward prefix API. The reversed-prefix cases are the natural next step (a reversed prefix_from) and are left for a follow-up.

Links to any relevant issues

fixes #10929

How the change has been tested

Basic tests (linting, compilation, formatting, unit/integration tests)
Patch-specific tests (correctness, functionality coverage)
I have added tests that prove my fix is effective or that my feature works
I have checked that new and existing unit tests pass locally with my changes

cargo check / cargo clippy --all-targets --all-features -- -D warnings are green for typed-store, iota-core, iota-transactional-test-runner, and starfish-core; the typed-store unit suite passes (48/48, including the new symmetry/prefix regression tests). Each commit builds and passes the typed-store suite on its own. Note: running the typed-store suite with parallel cargo test hits a pre-existing Prometheus AlreadyReg registry race (also fails on develop); the suite is green under nextest (process per test) and with --test-threads=1.

chriscandless · 2026-06-15T12:53:25Z

I found important bug - I've accidentally changed values of cursors from "dynamic remainder" to "entire key". I now fixing it.

semenov-vladyslav · 2026-06-16T13:57:27Z

+        match &self.db.storage {
+            Storage::Rocks(db) => {
+                let readopts =
+                    rocks_util::apply_range_bounds(self.opts.readopts(), lower_bound, None);


I don't quite understand why not set the upper bound here too (it is set in iter_forward_raw)? That way rocksdb would know that the range is bounded from both sides.

Both ways are equivalent from the point of view of logic:

current variant uses seek_for_prev + prev` to land on the first included element from the top

the same effect can be achieved by using readopts (iterate_upper_bound) + seek_to_last

There are two reasons it is done the current way:

semantical reason: readops is a regular way to achieve getting "invalid" element when reaching that position. There is no need in setting it in reverse func if we only descend from it.

historical reason: iterate_upper_bound is designed for forward iteration, and its correctness for reverse seek (seek_for_prev) wasn't guaranteed - the seek could still land on a key at/above the bound. The original reverse iterator avoided relying on it too.

Oh, I see this ticket was even closed as not planned so maybe that "historical" bug even still there and actually not "historical" at all lol: facebook/rocksdb#9904)

I see that forward and reverse iterators have slightly different initialization ways, and None for upper bound works here. Thought it would just be nicer and symmetric with upper bound set to the provided value, especially that we already have one.

If we add it there just for symmetric logic, it may confuse someone, because the upper bound is already implemented using seek_for_prev. So additionally setting the upper bound via readopts may look strange and confusing. To make it less confusing we'd have to remove the seek_for_prevlogic - but its necessity was explained in the previous comment.

semenov-vladyslav · 2026-06-16T14:03:03Z

Suggested change

Let's remove this check. In all the code paths is_max is being checked already, doing it here is useless.

This one is tricky.

In Rust when making addition it is expected to specify in the name of the function what would happen in case of overflow: saturating_add (checks for max/min), wrapping_add (MAX goes to MIN, kind-a C++), checked_add / overflowing_add (return Option/Result/bool).
Such a plain name as big_endian_add_one does not specify anything, which can be treated as "caller checks for max". Practically this means "undefined behaviour in case of overflow", and that is kind-a not Rust-style. So we need to do at least anything about it.

In theory we can super easily check in the loop that max value was reached and return error. But that is waaay too intrusive for this code - nobody actually cares about it, everyone already checks for is_max.

So what I did: I put debug_assert!(!is_max(v)) at the start. And yeah, I know that is ineffective - it can be done more effectively just my modifying the addition loop. But I wanted to exclude that code in prod, so I decided to go with that ineffective double-iteration over value to be able to use lean debug_assert.

debug_assert! is one way to do that. Another way would be to put unreachable! after the loop and instead of breaking the loop in the else branch -- just return;.

You know what? That is actually more justified than what I did. My code has a chance to return wrong key in prod, which is not good. Panicking is to be on a safer side, which is good, especially considering it is super easy to do here. So switched to unreachable!

semenov-vladyslav · 2026-06-16T14:48:20Z

+                iter.db_iter.seek_for_prev(&bound);
+                // The bound is exclusive: move to the previous key if the
+                // excluded key actually exists in the db.
+                if iter.db_iter.valid() && iter.db_iter.key() == Some(bound.as_slice()) {
+                    iter.db_iter.prev();
+                }


Shouldn't seek_for_prev already return an entry with the key strictly less that bound?

Nah.
The name is really confusing, I also was googling it. seek_for_prev may land on a given target if that element is present. So it's like "inclusive upper bound". That's the reason why that conditional prev is there.

(see the doc - it has this example with Prev: https://github.com/facebook/rocksdb/wiki/SeekForPrev)

safe_iter_with_bounds and reversed_safe_iter_with_bounds computed their raw rocksdb bounds independently, so the same logical bounds covered different key sets in the two directions ([lo, hi) forward vs [lo, hi] reversed). Route all bound computation through one range engine (iterator_bounds_with_range) and add safe_range_iter_reversed, which by construction yields exactly the keys of safe_range_iter in descending order; the Option-based wrappers keep their documented semantics as thin presets on top of it. SafeRevIter now takes an exclusive upper boundary and steps off the boundary key when the initial seek lands on it, making exclusive upper bounds work in reverse as well. Composite-key scans had to construct artificial maximum keys (e.g. ObjectKey::max_for_id) to bound the iteration. Add safe_iter_with_prefix and safe_iter_with_prefix_reversed, which derive the upper bound by incrementing the serialized prefix bytes instead, and migrate AuthorityStore::get_latest_marker to the reversed prefix scan as the first consumer. Part of #10929. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… of safe_range_iter_reversed The Option-based reverse iterator hard-coded inclusive bounds, which is non-standard (the forward APIs treat the upper bound as exclusive) and invisible at the call site. Migrate all callers to safe_range_iter_reversed, where the inclusivity is spelled out by the range syntax itself, and drop the wrapper: - (None, None) -> .. - (None, Some(hi)) -> ..=hi - (Some(lo), Some(hi)) -> lo..=hi The translation is semantics-preserving: the range forms construct exactly the same Bound::Included pairs the wrapper produced internally. This also drops the wrapper's vestigial Result, removing ?/expect noise at the call sites. Note for upstream ports: Sui still has reversed_safe_iter_with_bounds; ported code touching these call sites should apply the mapping above. Part of #10929. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Several scans over composite-key tables bounded the iteration with hand-built maximum keys (ObjectKey::max_for_id, usize::MAX tuples, SequenceNumber::MAX_VALID_EXCL + ObjectDigest::MAX) to express "all entries under a leading key". Replace the pure prefix scans with safe_iter_with_prefix / safe_iter_with_prefix_reversed, which derive the upper bound from the serialized prefix bytes instead: - get_events_by_events_digest: all events of one digest - have_deleted_owned_object_at_version_or_after: latest marker under (epoch, object id) - get_latest_live_version_for_object_id: latest live-marker ObjectRef - get_latest_object_ref_or_tombstone / get_latest_object_or_tombstone / try_get_object: latest object version - count_object_versions (msim-only): all versions of an object id Sites where a bound carries a real value (cursors, version cutoffs, range_iter_live_object_set spanning multiple ids) keep the explicit range form. Besides the version-tail change below, a prefix scan also tightens the lower bound (from unbounded / min_for_id to the prefix start). That is safe: every migrated reader already re-checks the leading key (epoch / object id) of the first row it reads, so a narrower lower bound cannot change the result. The replaced ..=max_for_id bounds capped the version component at MAX_VALID_EXCL while a prefix scan covers the full u64 space; this is equivalent because sentinel versions (CANCELLED_READ, CONGESTED_*) are scheduling-time artifacts and are never persisted as object or marker table keys. Part of #10929. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…rop MAX bounds in cursor scans Paginated prefix scans bounded the iteration with an artificial maximum key for the upper end (`(prefix, MAX)`) while the lower end carried a real cursor. Add `safe_iter_with_prefix_from(&prefix, &cursor)` -- the rocksdb `prefix_iterator_from` idiom: the lower bound is `prefix ++ cursor` (the `cursor` is the remainder of the key after the prefix, not a full key) and the upper bound is derived by incrementing the serialized prefix bytes, so these scans no longer need a synthetic maximum key. Cursor-paginated scans, where the cursor is the inclusive lower bound and the caller keeps its `.skip(cursor.is_some())` to make it exclusive: - IndexStore::get_dynamic_fields_iterator (jsonrpc) - GrpcIndexesStore dynamic_field_iter / package_versions_iter AuthorityPerpetualTables::get_newer_object_keys is migrated too, but it has no cursor: its lower bound is object.1.next() (version + 1), an exclusive-by-construction start, and it uses no `.skip`. The prefix scan covers the whole prefix, including an entry sitting exactly at the type's maximum tail that the old exclusive `(prefix, MAX)` upper bound dropped. This differs from the old behavior only for such a max-tail key, which is never a real object version or marker; for dynamic-field ids it is a corrected edge. Equivalent for all real data otherwise. Reversed pagination and the hash-bounded owner_iter keep explicit ranges -- their bounds are not pure prefixes. Part of #10929. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…of ending iteration `SafeIter::next` mapped a failed key/value deserialization to `None` via `.ok()`, which is indistinguishable from "end of data": a single undeserializable row silently truncated the rest of the scan, despite the iterator item being `Result<(K, V), TypedStoreError>` precisely so such errors can be reported. Return `Some(Err(TypedStoreError::Serialization(..)))` for a row that fails to deserialize, matching how the in-memory backend (`memstore`) and `DBMap::get`/`multi_get` already handle deserialization failures. Callers that `?` now propagate the error instead of seeing a short read; callers that `.unwrap()` fail loudly instead of losing data. Adds a regression test that reopens a column family with a wider value type and asserts iteration yields `Some(Err(..))` rather than silently ending. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

iota-ci added core-protocol node Issues related to the Core Node team labels Jun 11, 2026

chriscandless force-pushed the fix/typed-store-iterator-bounds branch 2 times, most recently from cec50cf to e4adf79 Compare June 12, 2026 17:40

chriscandless marked this pull request as ready for review June 12, 2026 18:05

chriscandless requested review from a team as code owners June 12, 2026 18:05

chriscandless requested review from muXxer and semenov-vladyslav June 12, 2026 18:13

bingyanglin reviewed Jun 15, 2026

View reviewed changes

Comment thread crates/typed-store/src/database.rs

Comment thread crates/typed-store/src/rocks/safe_iter.rs Outdated

Comment thread crates/typed-store/src/rocks/tests.rs

chriscandless force-pushed the fix/typed-store-iterator-bounds branch 2 times, most recently from 6f393f2 to ae44d0a Compare June 15, 2026 10:00

piotrm50 approved these changes Jun 15, 2026

View reviewed changes

chriscandless force-pushed the fix/typed-store-iterator-bounds branch 4 times, most recently from 28787d8 to 8df9d58 Compare June 16, 2026 14:38

semenov-vladyslav reviewed Jun 16, 2026

View reviewed changes

chriscandless force-pushed the fix/typed-store-iterator-bounds branch 3 times, most recently from 5fc489e to 4ac5720 Compare June 17, 2026 17:12

jkrvivian requested review from bingyanglin and semenov-vladyslav June 18, 2026 08:24

bingyanglin approved these changes Jun 22, 2026

View reviewed changes

semenov-vladyslav approved these changes Jun 22, 2026

View reviewed changes

chriscandless force-pushed the fix/typed-store-iterator-bounds branch from 4ac5720 to 30e5b39 Compare June 22, 2026 12:24

chriscandless and others added 5 commits June 22, 2026 14:53

chriscandless force-pushed the fix/typed-store-iterator-bounds branch from 30e5b39 to 62fb793 Compare June 22, 2026 12:53

muXxer approved these changes Jun 22, 2026

View reviewed changes

roman1e2f5p8s approved these changes Jun 22, 2026

View reviewed changes

old	new
`(None, None)`	`..`
`(None, Some(hi))`	`..=hi`
`(Some(lo), Some(hi))`	`lo..=hi`

Conversation

chriscandless commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of change

Remaining MAX usages (deliberately not removed)

Links to any relevant issues

How the change has been tested

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chriscandless commented Jun 15, 2026

Uh oh!

Uh oh!

semenov-vladyslav Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

chriscandless Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chriscandless Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

semenov-vladyslav Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

chriscandless Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

semenov-vladyslav Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

chriscandless Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

semenov-vladyslav Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

chriscandless Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

semenov-vladyslav Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

chriscandless Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

chriscandless commented Jun 11, 2026 •

edited

Loading

Remaining `MAX` usages (deliberately not removed)

chriscandless Jun 17, 2026 •

edited

Loading

chriscandless Jun 17, 2026 •

edited

Loading

chriscandless Jun 22, 2026 •

edited

Loading

chriscandless Jun 17, 2026 •

edited

Loading