Skip to content

fix: #430 HNSW insert beam + distance-based pruning + storage rebuild + CI guards#471

Open
ruvnet wants to merge 5 commits into
mainfrom
fix/issues-batch-470
Open

fix: #430 HNSW insert beam + distance-based pruning + storage rebuild + CI guards#471
ruvnet wants to merge 5 commits into
mainfrom
fix/issues-batch-470

Conversation

@ruvnet
Copy link
Copy Markdown
Owner

@ruvnet ruvnet commented May 18, 2026

Summary

Closes the three remaining root causes from #430 plus the storage-rebuild gap that PR #460 separately identified. All previously-undocumented HNSW correctness bugs at scale.

Tests

  • test_pruning_keeps_closest_not_newest — builds a hub with 20 close neighbours then 6 far neighbours, asserts no far_* id appears in top-10 around the hub. Fails on FIFO pruning.
  • test_index_rebuilt_from_storage_on_open — writes 5 vectors with one VectorDB, reopens against the same path, asserts search returns the persisted match. Fails on the empty-index regression.

All 21 router-core unit tests pass locally (release):
```
test result: ok. 21 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

Regression-guard CI

Three new structural guards complement the behavioural hnsw-recall-at-1 job:

Guard What it forbids
`hnsw-insert-beam-no-m2-clamp` textual `ef_construction.min(m * 2)` in `index.rs`
`hnsw-distance-based-neighbor-pruning` requires `calculate_distance` + the `> m * 2` overflow gate to coexist
`vector-db-rebuilds-index-on-open` requires `storage.get_all_ids()` in `vector_db.rs`

The existing `hnsw-recall-at-1` job now also runs `test_pruning_keeps_closest_not_newest` and `test_index_rebuilt_from_storage_on_open`.

Release plan

This PR does NOT bump `@ruvector/router` — the original 0.1.31 bump was reverted because the `optional-deps-resolvable-on-npm` regression guard fails when platform-specific subpackages don't yet exist on npm (CI publishes them only after a tag is cut).

After this PR merges, a separate release PR will:

  1. Re-bump @ruvector/router meta + 5 platform subpackages 0.1.30 → 0.1.31.
  2. Tag v0.1.31 (or workflow_dispatch on `publish-all.yml`) so CI builds + publishes all platform binaries together.

Test plan

  • CI green on `regression-guard.yml`
  • CI green on `ci.yml` (build + test)
  • After merge: release PR bumps router 0.1.30 → 0.1.31 → tag → CI publishes
  • Post-publish: `npx @ruvector/router@0.1.31` smoke test on the recall-at-1 repro

Supersedes

PR #460 (CoolDude1969) — overlapping work; this PR keeps the storage-rebuild idea and adds bugs B + C + CI guards. Credited in the closing comment.

🤖 Generated with claude-flow

ruvnet and others added 3 commits May 18, 2026 16:30
…ning + storage rebuild

Three remaining root causes from issue #430, plus the storage-rebuild gap from PR #460.

  Bug B — insert beam was clamped to ef_construction.min(m * 2). With defaults
          (m=16, ef_construction=200) the beam silently became 32. Late-
          inserted clusters got wired through whatever was near the entry
          point instead of through ef_construction-wide neighbour search.

  Bug C — adjacency-list pruning used `drain(0..drain_count)`, dropping the
          OLDEST edges regardless of distance. Proper HNSW pruning keeps the
          m CLOSEST edges. Now sort by `calculate_distance` to the anchor
          vector and truncate to m. Kept a fallback that preserves the
          newest-m behaviour when the anchor vector lookup fails so we
          never panic on a missing vector.

  Storage — VectorDB::new() always created a fresh empty HnswIndex, so
            previously persisted vectors were invisible to search after
            reopening the database. Now rebuild via storage.get_all_ids()
            + index.insert_batch() on open, and seed VectorDbStats.total_vectors
            with the recovered count.

Tests:
  - test_pruning_keeps_closest_not_newest: builds a hub with 20 close
    neighbours then 6 far neighbours, asserts no "far_*" id appears in
    top-10 around the hub. Fails on FIFO pruning.
  - test_index_rebuilt_from_storage_on_open: writes 5 vectors via one
    VectorDB instance, reopens against the same path, asserts search
    returns the persisted match. Fails on the historical empty-index bug.

Regression-guard CI additions:
  - hnsw-insert-beam-no-m2-clamp: textually forbids the ef_construction.min(m*2)
    pattern in index.rs.
  - hnsw-distance-based-neighbor-pruning: requires calculate_distance and the
    `> m * 2` overflow gate to both live in index.rs.
  - vector-db-rebuilds-index-on-open: requires storage.get_all_ids() in
    vector_db.rs.
  - hnsw-recall-at-1 job now also runs the two new tests.

Supersedes PR #460 (CoolDude1969) which covered storage rebuild + an
overlapping heap fix already in main from PR #466.

Closes #430.

Co-Authored-By: claude-flow <ruv@ruv.net>
Surface the #430 HNSW correctness fixes (insert beam, distance-based
pruning, storage rebuild) to npm consumers. Bump applies to the meta
package and all 5 platform-specific subpackages so optionalDependencies
resolve consistently after publish-all.yml runs.

Co-Authored-By: claude-flow <ruv@ruv.net>
The expanded README and 0.1.1 version were already published to npm by
an earlier release, but never committed back to git. Verified identical
to `npm pack @ruvector/diskann@0.1.1`. Bringing the working tree in sync
so future bumps start from a clean baseline.

Co-Authored-By: claude-flow <ruv@ruv.net>
ruvnet and others added 2 commits May 18, 2026 16:32
No behaviour change — collapses single-expression closure and assignment
onto one line per rustfmt defaults so the rustfmt CI job passes.

Co-Authored-By: claude-flow <ruv@ruv.net>
The `optional-deps-resolvable-on-npm` regression guard fails because
@ruvector/router-<platform>@0.1.31 doesn't exist on npm yet — those
platform binaries are only published by `publish-all.yml` after a tag is
cut, which happens AFTER this PR merges.

Splitting the work:
  - This PR: HNSW correctness fix + CI guards (keeps regression-guard
    green on every commit).
  - Follow-up release PR: bump @ruvector/router meta + 5 platform
    packages to 0.1.31, tag v0.1.31, publish-all.yml ships the fix.

This commit reverts c5c7e7f and is itself reverted in the release PR.

Co-Authored-By: claude-flow <ruv@ruv.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant