feat: add KatanaDbExtractor for direct database sync #18
Draft
Add a new extractor that reads block data directly from Katana's MDBX database via DbProviderFactory, bypassing JSON-RPC entirely. This enables significantly faster block syncing with zero network latency when running alongside Katana.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch katana-db, katana-provider, and katana-primitives from local path dependencies to git dependencies pinned at rev 7e6fef8.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests use the spawn_and_move fixture database from katana to verify:
- First batch extraction with correct block count and cursor
- Full chain extraction covering all blocks
- Event context integrity (block/tx references match)
- Transaction field validity
- Batch boundary behavior when beyond chain head
- Full Extractor trait loop with cursor commit
- Cursor resume correctly skips already-processed blocks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs the extractor through the entire database with timing metrics, reporting blocks/sec, transactions/sec, and events/sec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
KatanaDbExtractor for direct database sync
When blocks contain dense event data (e.g., 1M+ events per 1000 blocks), the extractor now yields a partial batch early once the event count threshold is reached (default: 100,000). The cursor points to the last fully processed block so the next call resumes correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
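The early-yield behaviour can be illustrated with a minimal sketch. It is not the code from this PR: `Block`, `Batch`, and `extract_batch` are hypothetical stand-ins, the chain is an in-memory slice rather than an MDBX database, and it assumes that the block which crosses the threshold is still included in the batch before the loop stops.

```rust
/// Hypothetical stand-in for one block's extracted data; only the event count matters here.
#[derive(Debug, Clone)]
struct Block {
    number: u64,
    event_count: usize,
}

/// Hypothetical batch result: the blocks read plus the resume cursor.
#[derive(Debug)]
struct Batch {
    blocks: Vec<Block>,
    /// Number of the last fully processed block, or None if nothing was read.
    cursor: Option<u64>,
}

/// Reads up to `batch_size` blocks starting at `from_block`, but stops early
/// once the running event total reaches `max_events_per_batch`.
fn extract_batch(
    chain: &[Block],
    from_block: u64,
    batch_size: usize,
    max_events_per_batch: usize,
) -> Batch {
    let mut blocks = Vec::new();
    let mut events = 0usize;
    let mut cursor = None;

    for block in chain
        .iter()
        .filter(|b| b.number >= from_block)
        .take(batch_size)
    {
        // Assumption: a block is never split, so the block that crosses the
        // threshold is processed in full before the batch is returned.
        events += block.event_count;
        cursor = Some(block.number);
        blocks.push(block.clone());

        if events >= max_events_per_batch {
            break; // partial batch: fewer blocks than batch_size
        }
    }

    Batch { blocks, cursor }
}
```

In this sketch a caller would resume the next call from `cursor + 1`, which is why the cursor must only ever advance past blocks that were read in full.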
The benchmark loop broke only on is_finished(), which is never true in follow-chain-head mode (to_block=None). Now it breaks on any empty batch. Also adds probe examples for debugging and a from_block CLI argument.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This adds a new KatanaDbExtractor that implements the Extractor trait by reading block data directly from Katana's MDBX database via DbProviderFactory, instead of going through JSON-RPC. When torii runs alongside Katana on the same machine, this eliminates all network overhead and enables significantly faster block syncing with zero latency.
The extractor opens the database in read-only mode (Db::open_ro) so it can safely run concurrently with a live Katana instance. Per extraction batch, it creates a fresh read-only provider and queries headers, transactions, receipts, declared classes, and deployed contracts for each block in the range. The cursor format (block:N) and persistence mechanism are consistent with the existing BlockRangeExtractor.
A max_events_per_batch config (default 100,000) caps the number of events per batch to bound memory usage. When the limit is reached mid-range, the batch is returned early with fewer blocks than batch_size, and the cursor points to the last fully processed block so the next call resumes correctly.
Benchmark
Benchmarked against a real Katana Sepolia database (492 GB MDBX, 884,993 blocks) using --release builds with a batch size of 1,000 blocks and a max_events_per_batch of 100,000. Rates are calculated as total_count / wall_time, where wall time is measured from the first extract() call to the last using std::time::Instant. This covers the full extraction loop, including provider queries and batch construction, but excludes sink processing, since this benchmark only exercises the extractor.
Machine: AMD EPYC 9124 (16-core, 32 threads) · 124 GB RAM · 3.5 TB ext4 SSD
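The measurement method can be sketched as follows. This is a simplified illustration, not the benchmark code from this PR: `Counts`, `Rates`, `run_benchmark`, and `rates` are hypothetical names, and the extractor is replaced by a closure that returns per-batch counts. The loop stops on the first empty batch, matching the termination rule described in the commit history, and wall time spans the whole loop.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-batch (or cumulative) counters.
#[derive(Debug, Default, Clone, Copy)]
struct Counts {
    blocks: u64,
    transactions: u64,
    events: u64,
}

/// Hypothetical throughput figures, each computed as total_count / wall_time.
#[derive(Debug)]
struct Rates {
    blocks_per_sec: f64,
    transactions_per_sec: f64,
    events_per_sec: f64,
}

/// Calls `extract` repeatedly until it returns an empty batch, timing the
/// whole loop from just before the first call to just after the last one.
fn run_benchmark<F>(mut extract: F) -> (Counts, Duration)
where
    F: FnMut() -> Counts,
{
    let start = Instant::now();
    let mut totals = Counts::default();

    loop {
        let batch = extract();
        if batch.blocks == 0 {
            break; // empty batch: nothing left to read at the chain head
        }
        totals.blocks += batch.blocks;
        totals.transactions += batch.transactions;
        totals.events += batch.events;
    }

    (totals, start.elapsed())
}

/// Converts totals and wall time into per-second rates.
fn rates(totals: Counts, wall_time: Duration) -> Rates {
    let secs = wall_time.as_secs_f64();
    // Guard against a zero-length measurement to avoid dividing by zero.
    let per_sec = |count: u64| if secs > 0.0 { count as f64 / secs } else { 0.0 };

    Rates {
        blocks_per_sec: per_sec(totals.blocks),
        transactions_per_sec: per_sec(totals.transactions),
        events_per_sec: per_sec(totals.events),
    }
}
```

Because only the extraction closure runs inside the timed loop, anything downstream of the extractor (such as sink processing) is excluded from the rates by construction.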
Rust Version Compatibility
There is a Rust toolchain compatibility issue worth noting. Katana's transitive dependency on starknet-types-core v0.1.x pulls in the size-of v0.1.5 crate, which uses platform-specific ABIs (aapcs, sysv64, stdcall, fastcall) that became hard errors on aarch64 targets starting with Rust 1.85 (unsupported_calling_conventions lint). At the same time, Katana's alloy dependencies (resolved to ^1.2) require Rust 1.88+. This creates a narrow compatibility window where only Rust 1.88.0 successfully compiles the combined dependency tree: older versions fail the alloy MSRV check, and newer versions (1.90+) reject the size-of ABIs. The lockfile has been pinned with compatible alloy versions (1.2.1) accordingly. This will resolve itself once Katana's upstream dependencies move to starknet-types-core v0.2.x, which drops the size-of dependency.
🤖 Generated with Claude Code