research(nightly): semantic-drift-detector — in-database embedding drift detection for agent memory #469
Draft
ruvnet wants to merge 3 commits into
Introduces `ruvector-drift`, a standalone Rust crate providing three complementary semantic drift detectors for agent memory and vector index health monitoring. Implements ADR-194.

- CentroidDriftDetector: O(d) per obs, 3.6M obs/sec, detects mean shift
- MmdDriftDetector: RFF-MMD approximation, 64K obs/sec, detects mean+variance drift
- GraphDriftDetector: k-NN two-sample test, catches structural topology drift
- Shared DriftDetector trait with promote_current / reset_current API
- 9 unit tests all green; acceptance test PASS on benchmark binary
- Workspace member added to Cargo.toml

Benchmark (x86_64 Linux, Rust 1.94.1, release):
  centroid null:  197ns p50, 3.6M QPS, score=0.056 (no alert)
  centroid +2σ:   169ns p50, 4.9M QPS, score=2.000 (DRIFT)
  mmd-rff null:   19.6µs p50, 64K QPS, score=0.044 (no alert)
  mmd-rff +2σ:    19.5µs p50, 65K QPS, score=0.697 (DRIFT)
  mmd-rff GMM:    19.5µs p50, 65K QPS, score=0.658 (DRIFT) ← centroid misses this
  graph-knn +2σ:  1.77ms p50, 505 QPS, score=1.000 (DRIFT)
Architecture Decision Record for ruvector-drift crate. Documents the decision to provide three complementary drift detectors (centroid, MMD-RFF, graph-kNN), alternatives considered (Fréchet, domain classifier, HNSW-intrinsic), benchmark evidence, failure modes, and three-phase implementation plan.
Research document covers:

- 2026 SOTA survey (DriftLens, SSGM, Drift-Adapter, memory surveys)
- Three-pass research loop findings
- Full benchmark results with real measured numbers
- Memory and performance math
- 8 practical applications (agent compaction, graph-RAG, MCP health endpoint, etc.)
- 8 exotic applications (RVM coherence, proof-gated certification, swarm memory, etc.)
- Production crate layout proposal
- SEO-optimised gist for public distribution

No vector database has native drift detection as of 2026-05-17. No Rust crate targets high-dimensional embedding-space drift.
Summary
Adds `ruvector-drift`, a standalone Rust crate providing three complementary semantic drift detectors for AI agent memory and vector index health. This is the first Rust crate targeting high-dimensional embedding-space drift detection. No existing vector database (Qdrant, Milvus, Weaviate, Pinecone, LanceDB, FAISS, pgvector, Chroma, Vespa) has a native drift detection capability as of 2026-05-17.
What is semantic drift?
Long-running AI agents accumulate vector memories. As context changes, the statistical distribution of those memories shifts silently — agents retrieve stale neighbors, generate outdated context, and degrade without error signals. The SSGM framework (arXiv:2603.11768) proves this accumulates as O(T·ε) per iteration without governance.
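One simple way to put a number on such a shift is to compare the centroid of a reference window of embeddings with the centroid of a current window. The sketch below is illustrative only: `CentroidWindow` and `centroid_shift` are hypothetical names, not the crate's API, and the 2-D points stand in for real embeddings.

```rust
/// Hypothetical running-sum window (illustrative, not the ruvector-drift API).
struct CentroidWindow {
    sum: Vec<f64>,
    count: usize,
}

impl CentroidWindow {
    fn new(dim: usize) -> Self {
        Self { sum: vec![0.0; dim], count: 0 }
    }

    /// O(d) update: add one embedding to the running sum.
    fn observe(&mut self, v: &[f64]) {
        assert_eq!(v.len(), self.sum.len());
        for (s, x) in self.sum.iter_mut().zip(v) {
            *s += x;
        }
        self.count += 1;
    }

    /// Mean of all observed embeddings (zero vector if the window is empty).
    fn centroid(&self) -> Vec<f64> {
        let n = self.count.max(1) as f64;
        self.sum.iter().map(|s| s / n).collect()
    }
}

/// Euclidean distance between the two window centroids.
fn centroid_shift(reference: &CentroidWindow, current: &CentroidWindow) -> f64 {
    reference
        .centroid()
        .iter()
        .zip(current.centroid())
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f64>()
        .sqrt()
}

fn main() {
    let mut reference = CentroidWindow::new(2);
    let mut current = CentroidWindow::new(2);
    // Older memories cluster around (0.5, 0.5).
    for v in [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] {
        reference.observe(&v);
    }
    // Newer memories cluster around (3.5, 4.5).
    for v in [[3.0, 4.0], [4.0, 4.0], [3.0, 5.0], [4.0, 5.0]] {
        current.observe(&v);
    }
    println!("centroid shift = {:.3}", centroid_shift(&reference, &current));
}
```

A production detector would presumably normalise this distance (for example by the reference spread) and compare it to a threshold; that part is left out here because the source does not specify it.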
What this adds
- `CentroidDriftDetector`: O(d) per obs, 3.6M obs/sec at d=128. Detects mean shift.
- `MmdDriftDetector`: RFF-MMD approximation, 64K obs/sec. Detects mean AND variance/structural shifts. Recommended default.
- `GraphDriftDetector`: k-NN two-sample test, 507 reports/sec. Catches topology drift invisible to the other two (GMM structural drift score = 1.0).
- `DriftDetector` trait with `promote_current`/`reset_current` window management.
Key finding
Centroid drift scores 0.052 on GMM structural data (indistinguishable from null 0.056). MMD-RFF scores 0.658 (alert fires). This validates needing multiple complementary algorithms — structural drift is invisible to mean tracking alone.
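The effect can be reproduced on a toy case: a window that splits symmetrically into two clusters keeps the same mean, so a centroid distance stays at zero, while a random-Fourier-feature (RFF) estimate of squared MMD does not. The following is a minimal sketch under stated assumptions, not the crate's implementation: `Lcg`, `Rff`, `centroid`, and `sq_dist` are hypothetical helpers, and the bandwidth (1.0), feature count (512), seed, and data are arbitrary choices, so the scores are not comparable to the benchmark numbers above.

```rust
use std::f64::consts::PI;

/// Tiny deterministic LCG so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Uniform sample in [0, 1) taken from the top 53 bits of the state.
    fn uniform(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn normal(&mut self) -> f64 {
        let u1 = 1.0 - self.uniform(); // in (0, 1], avoids ln(0)
        let u2 = self.uniform();
        (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
    }
}

/// Random Fourier feature map approximating an RBF kernel with bandwidth `sigma`.
struct Rff {
    weights: Vec<Vec<f64>>, // one random frequency vector per feature
    phases: Vec<f64>,       // one random phase in [0, 2π) per feature
}

impl Rff {
    fn new(dim: usize, features: usize, sigma: f64, seed: u64) -> Self {
        let mut rng = Lcg(seed);
        let weights: Vec<Vec<f64>> = (0..features)
            .map(|_| (0..dim).map(|_| rng.normal() / sigma).collect::<Vec<f64>>())
            .collect();
        let phases: Vec<f64> = (0..features).map(|_| 2.0 * PI * rng.uniform()).collect();
        Self { weights, phases }
    }

    /// Mean feature vector of a window (the empirical kernel mean embedding).
    fn mean_embedding(&self, window: &[[f64; 2]]) -> Vec<f64> {
        let scale = (2.0 / self.weights.len() as f64).sqrt();
        let n = window.len() as f64;
        self.weights
            .iter()
            .zip(&self.phases)
            .map(|(w, b)| {
                let total: f64 = window
                    .iter()
                    .map(|x| (w.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + b).cos())
                    .sum();
                scale * total / n
            })
            .collect()
    }
}

/// Squared Euclidean distance between two vectors.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Plain mean of a window of 2-D points.
fn centroid(window: &[[f64; 2]]) -> [f64; 2] {
    let n = window.len() as f64;
    let (sx, sy) = window.iter().fold((0.0, 0.0), |(sx, sy), p| (sx + p[0], sy + p[1]));
    [sx / n, sy / n]
}

fn main() {
    // Reference window: one tight cluster around the origin.
    let reference = [[0.1, 0.1], [0.1, -0.1], [-0.1, 0.1], [-0.1, -0.1]];
    // Drifted window: two clusters at (+2, 0) and (-2, 0); the mean is still the origin.
    let split = [[2.0, 0.0], [-2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]];
    let rff = Rff::new(2, 512, 1.0, 42);

    let centroid_score = sq_dist(&centroid(&reference), &centroid(&split)).sqrt();
    // Squared MMD is approximated by the squared distance between mean embeddings.
    let mmd_score = sq_dist(&rff.mean_embedding(&reference), &rff.mean_embedding(&split));
    println!("centroid shift = {:.3}", centroid_score);
    println!("RFF-MMD^2      = {:.3}", mmd_score);
}
```

The design point is that the mean embedding in feature space depends on the whole distribution, not only its first moment, which is why the two windows separate there even though their centroids coincide.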
Deliverables
- `crates/ruvector-drift/`
- `docs/adr/ADR-194-semantic-drift-detector.md`
- `docs/research/nightly/2026-05-17-semantic-drift-detector/README.md`
- `docs/research/nightly/2026-05-17-semantic-drift-detector/gist.md`
Benchmark Results (x86_64 Linux, Rust 1.94.1, release)
Acceptance test: PASS — 6/6 checks passed.
Test Plan
- `cargo build --release -p ruvector-drift`: clean build, no warnings
- `cargo test -p ruvector-drift`: 9/9 tests pass
- `cargo run --release -p ruvector-drift --bin benchmark`: acceptance PASS with real numbers
- `cargo fmt -p ruvector-drift`: clean
Ecosystem Integration Path
- `DriftScore.alert` → trigger `memory-reindex` workflow
- `vector_memory_health` tool backed by `MmdDriftDetector`
- `drift` embeds `CentroidDriftDetector` in HNSW write path (<300 ns overhead)
- `drift_bound` field for edge cognitive packages
Research Loop Summary
Top Alternatives Rejected
Research doc: `docs/research/nightly/2026-05-17-semantic-drift-detector/README.md`
ADR: `docs/adr/ADR-194-semantic-drift-detector.md`
Gist: `docs/research/nightly/2026-05-17-semantic-drift-detector/gist.md`
This branch should either become a production RuVector capability or a falsified research path with useful evidence.
Generated by Claude Code