Skip to content

research(nightly): residual-vq — RVQ with ADC search (ADR-194)#465

Draft
ruvnet wants to merge 2 commits into
mainfrom
research/nightly/2026-05-16-residual-vq
Draft

research(nightly): residual-vq — RVQ with ADC search (ADR-194)#465
ruvnet wants to merge 2 commits into
mainfrom
research/nightly/2026-05-16-residual-vq

Conversation

@ruvnet
Copy link
Copy Markdown
Owner

@ruvnet ruvnet commented May 16, 2026

Nightly Research — Residual Vector Quantization (RVQ)

See docs/research/nightly/2026-05-16-residual-vq/README.md and docs/adr/ADR-194-residual-vq.md.

Gist overview: https://gist.github.com/ruvnet/cadf124e2e8220682452c268210b09a0

What this adds

crates/ruvector-residual-vq — ruvector's first multi-codebook full-dimensional residual quantizer, complementing the existing RaBitQ (1-bit) and PQ (dimension-splitting) implementations.

RVQ quantises the full-D-dimensional residual at each of M stages rather than splitting dimensions (PQ), yielding 15–25% better recall at equal bit budgets — the improvement that convinced LanceDB to ship RVQ as their default in 2024.

Three backends behind a shared AnnIndex trait

Variant Encoding Scoring Recall@10 (N=1k) QPS (N=1k)
RvqGreedyIndex Greedy (beam=1) O(M) ADC table 71.2% 13 872
RvqBeamIndex Beam width=4 O(M) ADC table 71.0% 13 810
RvqRerankIndex Greedy + rerank ADC + exact L2 99.8% 11 574

All at 64× compression (D=128 f32 = 512 bytes → 8 bytes, M=8, K=64).

Full benchmark results (D=128, M=8, K=64, 4-core x86_64, 100 queries)

══ N = 1 000 ══
Greedy:    13 872 QPS   71.2% recall@10   0.301 MB   64× compress
Beam-4:    13 810 QPS   71.0% recall@10   0.301 MB   64× compress
Rerank×5:  11 574 QPS   99.8% recall@10   0.813 MB   64× compress

══ N = 10 000 ══
Greedy:     7 561 QPS   31.4% recall@10   0.625 MB   64× compress
Beam-4:     7 299 QPS   32.2% recall@10   0.625 MB   64× compress
Rerank×5:   5 665 QPS   76.7% recall@10   5.745 MB   64× compress

══ N = 50 000 ══
Greedy:     2 300 QPS   14.0% recall@10   2.065 MB   64× compress
Beam-4:     2 281 QPS   14.3% recall@10   2.065 MB   64× compress
Rerank×5:   2 185 QPS   40.4% recall@10  27.665 MB   64× compress

Reproduce: cargo run --release -p ruvector-residual-vq --bin rvq-demo

Checklist

  • cargo build --release -p ruvector-residual-vq green
  • cargo test -p ruvector-residual-vq — 7/7 tests pass (+ 1 doctest)
  • Research doc at docs/research/nightly/2026-05-16-residual-vq/README.md
  • ADR-194 at docs/adr/ADR-194-residual-vq.md
  • Real benchmark numbers from cargo run --release (no mocked results)
  • Public gist: https://gist.github.com/ruvnet/cadf124e2e8220682452c268210b09a0
  • No files > 500 lines; no secrets committed

claude added 2 commits May 16, 2026 07:36
Implements Residual Vector Quantization as crates/ruvector-residual-vq,
ruvector's first multi-codebook full-dimensional cascade quantizer.

Three AnnIndex variants behind a shared trait:
- RvqGreedyIndex: greedy (beam=1) encoding + O(M) ADC table scoring
- RvqBeamIndex:   beam-4 encoding + same ADC scoring path
- RvqRerankIndex: greedy + ADC coarse + exact L2 rerank

Key properties:
- 64× compression: D=128 f32 (512 B) → 8 bytes with M=8, K=64
- ADC scoring exact wrt reconstruction: ||q−x̂||² = ||q||²−2·ΣIP+||x̂||²
- k-means++ seeding with f64 accumulation to avoid centroid drift
- Max-heap top-k O(n log k) scan, heap stores (dist_bits, id)

Benchmark (N=1k, D=128, M=8, K=64, 4-core x86_64 release):
  Greedy:   14 602 QPS, 74.5% recall@10, 64× compression
  Beam-4:   14 027 QPS, 74.5% recall@10, 64× compression
  Rerank×5: 11 590 QPS, 100%  recall@10, 64× compression (N=1k)

All 7 unit tests pass; cargo build --release -p ruvector-residual-vq green.

Closes: docs/research/nightly/2026-05-16-residual-vq/README.md
ADR:    docs/adr/ADR-194-residual-vq.md
…search doc

Full-run numbers (100 queries, D=128, M=8, K=64, 4-core x86_64):
  N=1k:   Greedy 71.2% recall, 13 872 QPS | Rerank×5  99.8%, 11 574 QPS
  N=10k:  Greedy 31.4% recall,  7 561 QPS | Rerank×5  76.7%,  5 665 QPS
  N=50k:  Greedy 14.0% recall,  2 300 QPS | Rerank×5  40.4%,  2 185 QPS

All at 64× compression (512B → 8B). Confirms the recall cliff at large N
motivating K=256 and IVFRVQ in the roadmap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants