
feat: add KatanaDbExtractor for direct database sync#18

Draft
kariy wants to merge 6 commits into main from feat/katana-db-extractor

Conversation

@kariy (Member) commented Mar 4, 2026

This adds a new KatanaDbExtractor that implements the Extractor trait by reading block data directly from Katana's MDBX database via DbProviderFactory, instead of going through JSON-RPC. When torii runs alongside Katana on the same machine, this eliminates network overhead entirely and enables significantly faster block syncing.

The extractor opens the database in read-only mode (Db::open_ro) so it can safely run concurrently with a live Katana instance. Per extraction batch, it creates a fresh read-only provider and queries headers, transactions, receipts, declared classes, and deployed contracts for each block in the range. The cursor format (block:N) and persistence mechanism are consistent with the existing BlockRangeExtractor.

A max_events_per_batch config (default 100,000) caps the number of events per batch to bound memory usage. When the limit is reached mid-range, the batch is returned early with fewer blocks than batch_size, and the cursor points to the last fully processed block so the next call resumes correctly.

Benchmark

Benchmarked against a real Katana Sepolia database (492 GB MDBX, 884,993 blocks) using --release builds with a batch size of 1,000 blocks and a max_events_per_batch of 100,000. Rates are calculated as total_count / wall_time where wall time is measured from the first extract() call to the last using std::time::Instant, covering the full extraction loop including provider queries and batch construction — but excluding sink processing, since this benchmark only exercises the extractor.

Machine: AMD EPYC 9124 (16-core, 32 threads) · 124 GB RAM · 3.5 TB ext4 SSD

| Metric | Value |
| --- | --- |
| Database | Katana Sepolia (492 GB MDBX) |
| Blocks processed | 884,993 |
| Total events | 94,874,059 |
| Total transactions | 13,548,558 |
| Total declared classes | 45,704 |
| Total deployed contracts | 863,389 |
| Batches | 1,357 |
| Wall time | 885.5 s (~14.8 min) |
| Events/sec | 107,144 |
| Transactions/sec | 15,301 |
| Blocks/sec | 999 |
| Avg batch time | 652.5 ms |

Rust Version Compatibility

There is a Rust toolchain compatibility issue worth noting. Katana's transitive dependency on starknet-types-core v0.1.x pulls in the size-of v0.1.5 crate, which uses platform-specific ABIs (aapcs, sysv64, stdcall, fastcall) that became hard errors on aarch64 targets starting with Rust 1.85 (the unsupported_calling_conventions lint). At the same time, Katana's alloy dependencies (resolved to ^1.2) require Rust 1.88+. This creates a narrow compatibility window in which only Rust 1.88.0 successfully compiles the combined dependency tree: older versions fail the alloy MSRV check, and newer versions (1.90+) reject the size-of ABIs. The lockfile has been pinned with compatible alloy versions (1.2.1) accordingly. This will resolve itself once Katana's upstream dependencies move to starknet-types-core v0.2.x, which drops the size-of dependency.
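If the repository pins its toolchain, the working version can be locked with a rust-toolchain.toml like the following. This is a hypothetical example — the PR only mentions pinning the lockfile, not the toolchain:

```toml
# Hypothetical pin: only 1.88.0 compiles the current dependency tree.
# Older toolchains fail alloy's MSRV check; 1.90+ rejects size-of's ABIs.
[toolchain]
channel = "1.88.0"
```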

🤖 Generated with Claude Code

kariy and others added 4 commits March 4, 2026 14:36
Add a new extractor that reads block data directly from Katana's MDBX
database via DbProviderFactory, bypassing JSON-RPC entirely. This
enables significantly faster block syncing with zero network latency
when running alongside Katana.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch katana-db, katana-provider, and katana-primitives from local
path dependencies to git dependencies pinned at rev 7e6fef8.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests use the spawn_and_move fixture database from katana to verify:
- First batch extraction with correct block count and cursor
- Full chain extraction covering all blocks
- Event context integrity (block/tx references match)
- Transaction field validity
- Batch boundary behavior when beyond chain head
- Full Extractor trait loop with cursor commit
- Cursor resume correctly skips already-processed blocks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runs the extractor through the entire database with timing metrics,
reporting blocks/sec, transactions/sec, and events/sec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kariy kariy changed the title feat: add KatanaDbExtractor for direct database sync feat: add KatanaDbExtractor for direct database sync Mar 6, 2026
kariy and others added 2 commits March 6, 2026 15:12
When blocks contain dense event data (e.g., 1M+ events per 1000 blocks),
the extractor now yields a partial batch early once the event count
threshold is reached (default: 100,000). The cursor points to the last
fully processed block so the next call resumes correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The benchmark loop previously terminated only when is_finished() returned true,
which never happens in follow-chain-head mode (to_block=None). Now it breaks on any empty batch.
Also adds probe examples for debugging and from_block CLI arg.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>