feat(bench): add attack_replay benchmark #299

Merged
RealiCZ merged 2 commits into main from krabat/bench/attack-replay on May 27, 2026

@vincent-k2026
Contributor

Summary

Adds attack_replay, a hermetic regression benchmark that replays a real MegaETH mainnet attack contract deployment through MegaEvm.

The fixture is a self-contained ~64 KB JSON snapshot captured via debug_traceCall + prestateTracer (diffMode=false) at a fixed block:

  • tx: caller / nonce / gas / value / 17 KB initcode / chain_id
  • prestate (11 accounts): 4 proxy contracts at 0x4200..., 1 ERC-20 with code + 10 storage slots, the caller, and 5 supporting storage contracts
  • block env: number / timestamp / basefee / gas_limit / beneficiary / mix_hash
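
A prestateTracer snapshot stores code as hex strings and numeric fields as hex quantities, so a hand-rolled fixture loader needs small decoding helpers. The sketch below is a minimal, std-only illustration of that step; `parse_hex_bytes` and `parse_hex_u64` are hypothetical names and are not taken from the bench source.

```rust
/// Hypothetical helper: decode a hex string (optional `0x` prefix) into bytes.
fn parse_hex_bytes(s: &str) -> Result<Vec<u8>, String> {
    let hex = s.strip_prefix("0x").unwrap_or(s);
    if hex.len() % 2 != 0 {
        return Err(format!("odd number of hex digits: {}", hex.len()));
    }
    // Rejecting non-hex input up front also guarantees the slices below
    // fall on character boundaries.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("non-hex character in input".to_string());
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Hypothetical helper: decode a hex quantity such as a nonce or block number.
fn parse_hex_u64(s: &str) -> Result<u64, String> {
    let hex = s.strip_prefix("0x").unwrap_or(s);
    u64::from_str_radix(hex, 16).map_err(|e| e.to_string())
}

fn main() {
    let code = parse_hex_bytes("0x6080604052").expect("valid hex");
    let nonce = parse_hex_u64("0x1a").expect("valid quantity");
    println!("code bytes = {}, nonce = {}", code.len(), nonce);
}
```

Returning `Result` rather than panicking lets a loader report which fixture field was malformed.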

Bench arms

Three arms on the same in-memory state:

| arm | typical wall time | engine |
| --- | --- | --- |
| attack_replay/equivalence | ~1.15 ms | MegaSpecId::EQUIVALENCE |
| attack_replay/mini_rex | ~35.9 ms | MegaSpecId::MINI_REX |
| attack_replay/pure_revm | ~1.10 ms | vanilla revm (Context::mainnet) baseline |

Both mega-evm specs execute the exact same 205,951 opcodes and deploy the same 582-byte runtime. The ~30x gap between EQUIVALENCE and MINI_REX isolates the cost of the multi-dimensional AdditionalLimit accounting (quadratic LOG / compute / storage / data / KV buckets) that MINI_REX enables.

The pure_revm arm is a self-check: it should land near the EQUIVALENCE arm, confirming the bench is honest and not short-circuiting.
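
For a rough sense of scale, the quoted wall times can be turned into a slowdown ratio and an average cost per opcode. This is plain arithmetic on the approximate figures above, not a measurement; the helper names are illustrative.

```rust
/// Ratio between a slow and a fast wall time.
fn slowdown(slow_ms: f64, fast_ms: f64) -> f64 {
    slow_ms / fast_ms
}

/// Average nanoseconds spent per opcode for a given wall time.
fn ns_per_opcode(wall_ms: f64, opcodes: u64) -> f64 {
    wall_ms * 1_000_000.0 / opcodes as f64
}

fn main() {
    // Figures quoted in the PR description (approximate wall times).
    const OPCODES: u64 = 205_951;
    let equivalence_ms = 1.15;
    let mini_rex_ms = 35.9;

    println!("slowdown: {:.1}x", slowdown(mini_rex_ms, equivalence_ms));
    println!("equivalence: {:.1} ns/opcode", ns_per_opcode(equivalence_ms, OPCODES));
    println!("mini_rex: {:.1} ns/opcode", ns_per_opcode(mini_rex_ms, OPCODES));
}
```

With these inputs the ratio is about 31x, and the per-opcode average rises from roughly 5.6 ns to roughly 174 ns. The average spreads the cost evenly over all opcodes, so it does not show which opcodes bear the limit-accounting overhead.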

Why this bench

Numbers correlate directly with production: the mini_rex arm matches the sequencer-monitor's observed ~33 ms inside api.inspect(...) for this exact transaction, making the bench a stable, reproducible target for any limit-tracker / hot-path optimization (e.g. caching net_usage in FrameLimitTracker).

Sanity checks

Run before criterion warm-up:

  • Asserts the ExecutionResult::Success variant and reports the deployed code length, the deployed address, and the accounts/slots touched.
  • Counts opcode steps via a minimal OpcodeCounter inspector and asserts steps >= MIN_EXPECTED_OPCODE_STEPS (100,000). Any future setup mistake that silently short-circuits tx validation will fail the bench loudly instead of producing artificially fast numbers.
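
The step-count guard can be sketched as follows. This is a simplified illustration: `StepHook` and `run_with_hook` are hypothetical stand-ins for an interpreter step callback and are not revm's `Inspector` API; only the names `OpcodeCounter` and `MIN_EXPECTED_OPCODE_STEPS` and the 100,000 threshold come from the description above.

```rust
/// Hypothetical stand-in for an interpreter step callback (not revm's `Inspector`).
trait StepHook {
    fn on_step(&mut self, opcode: u8);
}

/// Counts every executed opcode, ignoring which opcode it was.
#[derive(Default)]
struct OpcodeCounter {
    steps: u64,
}

impl StepHook for OpcodeCounter {
    fn on_step(&mut self, _opcode: u8) {
        self.steps += 1;
    }
}

const MIN_EXPECTED_OPCODE_STEPS: u64 = 100_000;

/// Toy executor: reports each byte of `program` to the hook as one step.
fn run_with_hook<H: StepHook>(program: &[u8], hook: &mut H) {
    for &op in program {
        hook.on_step(op);
    }
}

/// Fails loudly when execution was suspiciously short.
fn check_not_short_circuited(steps: u64) -> Result<(), String> {
    if steps >= MIN_EXPECTED_OPCODE_STEPS {
        Ok(())
    } else {
        Err(format!(
            "only {steps} opcode steps, expected at least {MIN_EXPECTED_OPCODE_STEPS}"
        ))
    }
}

fn main() {
    // 205,951 JUMPDEST bytes stand in for the real transaction's trace length.
    let program = vec![0x5b_u8; 205_951];
    let mut counter = OpcodeCounter::default();
    run_with_hook(&program, &mut counter);
    check_not_short_circuited(counter.steps).expect("execution short-circuited");
    println!("steps = {}", counter.steps);
}
```

Checking a lower bound rather than the exact count keeps the guard robust to small fixture changes while still catching a transaction that is rejected before the interpreter runs.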

Test plan

  • `cargo bench --bench attack_replay -p mega-evm --no-run`: builds clean
  • `cargo +nightly fmt --check -p mega-evm`: clean
  • `cargo clippy --bench attack_replay -p mega-evm`: 0 warnings
  • `cargo bench --bench attack_replay -p mega-evm -- --quick`: runs, all sanity assertions pass, numbers as expected

Adds `attack_replay`, a hermetic regression bench that replays a real
MegaETH mainnet attack contract deployment through `MegaEvm`.

The fixture is a self-contained ~64 KB JSON snapshot captured via
`debug_traceCall` + `prestateTracer` (diffMode=false) at a fixed block:
  - tx (caller / nonce / gas / value / 17 KB initcode / chain_id)
  - prestate (11 accounts: 4 proxy contracts at 0x4200..., 1 ERC-20 with
    code + 10 storage slots, caller, and 5 supporting storage contracts)
  - block env (number / timestamp / basefee / gas_limit / beneficiary /
    mix_hash)

The bench produces three arms on the same in-memory state:

  attack_replay/equivalence   ~1.15 ms   (MegaSpecId::EQUIVALENCE)
  attack_replay/mini_rex      ~35.9 ms   (MegaSpecId::MINI_REX)
  attack_replay/pure_revm     ~1.10 ms   (vanilla revm baseline)

Both mega-evm specs execute the exact same 205,951 opcodes and deploy
the same 582-byte runtime. The ~30x gap between EQUIVALENCE and
MINI_REX isolates the cost of the multi-dimensional AdditionalLimit
accounting (quadratic LOG / compute / storage / data / KV buckets).
The pure_revm arm is a self-check: it should land near EQUIVALENCE,
confirming the bench is honest and not short-circuiting.

Numbers correlate directly with production: the MINI_REX arm matches
the sequencer-monitor's observed ~33 ms inside `api.inspect(...)` for
this tx, making the bench a stable target for any limit-tracker /
hot-path optimization.

Sanity checks run before criterion warm-up:
  - ExecutionResult variant + deployed code + accounts/slots touched
  - opcode step count via a minimal OpcodeCounter inspector, with a
    MIN_EXPECTED_OPCODE_STEPS guard so any future setup mistake that
    silently short-circuits validation fails the bench loudly instead
    of producing artificially fast numbers.

Run:

    cargo bench --bench attack_replay
@Troublor
Collaborator

@RealiCZ — non-blocking follow-up suggestion; happy to see this merge as-is.

The hand-rolled fixture parser is the right local call for one bench, but if more replay-style benches land we'll want a unified path. Sketching what the follow-up could look like:

  1. Adopt the EEST state test schema as the canonical fixture format. state-test already has it — TestUnit { env, pre, transaction, post, out }, with AccountInfo deriving both Serialize and Deserialize (camelCase + alloy_serde::quantity), so the format is already round-trippable.

  2. Extend mega-evme replay with --dump-fixture <FILE>. Record every Database::basic / storage / code_by_hash access during the replay to populate pre; copy block env into env, the on-chain tx into transaction, and the actual ResultAndState into post so the fixture self-checks on read. Workflow becomes:

    mega-evme replay <tx_hash> --rpc <url> --dump-fixture foo.json
    

    One command, no external cast rpc debug_traceCall + prestateTracer needed.

  3. Reuse state-test types from this bench. Add state-test as a dev-dependency on mega-evm — Cargo permits the dev-dep cycle (mega-evm --dev→ state-test --regular→ mega-evm). The bench drops parse_u256 / parse_bytes / parse_address + the bespoke TxFixture / AccountFixture / BlockFixture structs in favor of use state_test::types::TestUnit. mega-evme's private AccountState (in bin/mega-evme/src/common/state.rs) also folds into the same type.
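
The recording idea in step 2 can be sketched with a wrapper that remembers the first value returned for every read. Everything here is hypothetical and simplified: `StateSource` is a toy trait with narrow integer types, not revm's `Database` trait, and `Recording`, `RecordedAccount`, and `MapState` are illustrative names.

```rust
use std::collections::BTreeMap;

type Address = [u8; 20];

/// Hypothetical, simplified read interface (not revm's `Database` trait).
trait StateSource {
    fn balance(&mut self, addr: Address) -> u128;
    fn storage(&mut self, addr: Address, slot: u64) -> u64;
}

/// Toy backing store standing in for an RPC-backed database.
#[derive(Default)]
struct MapState {
    balances: BTreeMap<Address, u128>,
    slots: BTreeMap<(Address, u64), u64>,
}

impl StateSource for MapState {
    fn balance(&mut self, addr: Address) -> u128 {
        self.balances.get(&addr).copied().unwrap_or(0)
    }
    fn storage(&mut self, addr: Address, slot: u64) -> u64 {
        self.slots.get(&(addr, slot)).copied().unwrap_or(0)
    }
}

/// What the replay observed for one account.
#[derive(Default, Debug, PartialEq)]
struct RecordedAccount {
    balance: Option<u128>,
    storage: BTreeMap<u64, u64>,
}

/// Wraps a source and records the first value seen for every read.
struct Recording<S> {
    inner: S,
    pre: BTreeMap<Address, RecordedAccount>,
}

impl<S: StateSource> Recording<S> {
    fn new(inner: S) -> Self {
        Self { inner, pre: BTreeMap::new() }
    }
}

impl<S: StateSource> StateSource for Recording<S> {
    fn balance(&mut self, addr: Address) -> u128 {
        let value = self.inner.balance(addr);
        self.pre.entry(addr).or_default().balance.get_or_insert(value);
        value
    }
    fn storage(&mut self, addr: Address, slot: u64) -> u64 {
        let value = self.inner.storage(addr, slot);
        self.pre.entry(addr).or_default().storage.entry(slot).or_insert(value);
        value
    }
}

fn main() {
    let token: Address = [0x42; 20];
    let mut backing = MapState::default();
    backing.balances.insert(token, 7);
    backing.slots.insert((token, 1), 99);
    backing.slots.insert((token, 2), 5); // never read, so never recorded

    let mut db = Recording::new(backing);
    db.balance(token);
    db.storage(token, 1);
    println!("recorded accounts: {}", db.pre.len());
}
```

Only state that was actually read ends up in `pre`, which keeps a dumped fixture minimal. The sketch assumes the wrapped source always returns pre-transaction values, with writes held elsewhere during execution.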

Implementation notes for whoever picks this up:

  • state-test's lib target is implicit but real (src/lib.rs alongside [[bin]]) — no Cargo.toml change on the state-test side.
  • EEST TransactionParts is multi-variant (matrix of data × gas × value); for a single-shot bench fixture, use a single-element variant via TxPartIndices(0, 0, 0).
  • Worth carving default-features = false + a slim types feature on state-test so consumers don't drag in walkdir / indicatif / triehash / k256 / plain_hasher.
  • Keep this separate from the existing --rpc.capture-file / ReplayTransport machinery in bin/mega-evme/src/common/provider/ — that's a transport-level artifact feeding AlloyDB, solving a different problem.

…eplay

# Conflicts:
#	crates/mega-evm/Cargo.toml
@claude
Contributor

claude Bot commented May 27, 2026

LGTM. The benchmark is well-structured and the sanity-check approach (asserting Success + asserting ≥ 100,000 opcode steps) is exactly right for catching silent short-circuits before criterion warms up.

One observation (non-blocking, all sanity checks pass): the mega_evm arms zero out operator_fee_scalar/operator_fee_constant to make the OP-stack L1 fees disappear, while the pure_revm arm disables balance_check and base_fee entirely. Both approaches produce a successful execution, so the opcode-level work is the same across arms — but it's worth leaving a comment in build_pure_revm_context (or inline in the bench closure) explaining why disable_balance_check is needed for the revm arm but not for mega_evm, so the next reader doesn't have to reconstruct it.

Troublor's follow-up on adopting the EEST state test schema if more replay benches land is the right long-term call. Happy to see this merge as-is.

@RealiCZ added the spec:stable, comp:misc, and api:unchanged labels on May 27, 2026
@RealiCZ merged commit e728098 into main on May 27, 2026
35 of 36 checks passed
@RealiCZ deleted the krabat/bench/attack-replay branch on May 27, 2026 06:11
@claude
Contributor

claude Bot commented May 27, 2026

Label check

Two labels look off:

  • comp:misc → should be comp:core: all changes are in crates/mega-evm/Cargo.toml and crates/mega-evm/benches/attack_replay.rs, which belong to the mega-evm core crate, not the miscellaneous part of the repo.

  • spec:stable → should be spec:unchanged: the benchmark uses the EQUIVALENCE and MINI_REX specs but does not touch or change any spec behavior. spec:unchanged ("No change to any mega-evm's behavior") is the correct descriptor here; spec:stable implies the diff modifies stable-spec code under a must-not-change-behavior constraint, which is not the case.

@claude
Contributor

claude Bot commented May 27, 2026

Two label issues on this PR:

  1. comp:misc → comp:core: All changes are in crates/mega-evm/ (Cargo.toml + benches/attack_replay.rs), which is the mega-evm core crate. comp:misc is for changes outside the main crates; comp:core is the correct fit here.

  2. spec:stable → spec:unchanged: spec:stable is intended for PRs that touch stable spec code and must not change its behavior. This PR doesn't modify any spec implementation; it only exercises stable specs (EQUIVALENCE, MINI_REX) inside a benchmark. spec:unchanged ("No change to any mega-evm's behavior") is the accurate label for a benchmark-only addition.

@claude
Contributor

claude Bot commented May 27, 2026

Label issues:

  • comp:misc → should be comp:core: all changed files (crates/mega-evm/Cargo.toml, crates/mega-evm/benches/attack_replay.rs, and the fixture under crates/mega-evm/benches/fixtures/) are in the mega-evm core crate.
  • spec:stable → should be spec:unchanged: no EVM behavior or spec code is modified — this PR only adds a benchmark. spec:stable indicates the PR "touches stable spec code," which isn't the case here.

Labels

  • api:unchanged (No change to the public interface or API)
  • comp:misc (Changes to the miscellaneous part of this repo)
  • spec:stable (Touches stable spec code, must not change behavior)
