Feat: Single Sequencer#949
Conversation
📝 WalkthroughWalkthroughThis PR introduces Sequencer History V2 alongside a new Raft-based High-Availability (HA) system for the Morph sequencer. The L1Sequencer contract is refactored from a single address to an ordered history with L2-block activation scheduling. The verifier switches from TTL-based caching to history-backed lookups, and a new ChangesSequencer History V2 with High-Availability Raft Support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
- L1Sequencer.sol: sequencerHistory[], updateSequencer, getSequencerAt, initializeHistory - Bindings: updated ABI for new contract interface - SequencerVerifier: L1 history cache with interval cursor optimization - Signer: simplified interface (removed IsActiveSequencer) - 022-SequencerInit.ts: fixed initialize call (1 param instead of 2) - Docker: added L1_SEQUENCER_CONTRACT env for all nodes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- run-test.sh: added build-malicious and p2p-test commands - docker-compose.override.yml: malicious-geth-0 and malicious-node-0 services - Tests: T-01~T-05 (active attacks) + T-06 (BlockSync pollution) + T-07 (resilience) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix grep -c multiline: use || true instead of || echo "0" - Fix env var loss: malicious override must include full env list - Swap approach: reuse synced sentry instead of fresh malicious container - Uncomment CONSENSUS_SWITCH_HEIGHT for V2 mode activation - Add SEQUENCER_PRIVATE_KEY to node-0 override Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use staking key (0xd998...) as SEQUENCER_PRIVATE_KEY for node-0 - Add initializeHistory() call in setup to register sequencer on L1 - Fixes "no sequencer record" error in V2 mode Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- T-06: use blocksync-forge (blocksync/reactor.go) instead of sync-forge (broadcast_reactor.go) - targets the actual V1 vulnerability path - T-06: stop node-3 to create gap, restart to trigger BlockSync - Phase 0: explicit checks for V2 mode, signer, and switch height - T-04: use futureHeight (currentHeight+10000) for deterministic unsolicited test - Separate log files per phase to prevent cross-contamination Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add L1Sequencer.t.sol: 27 Foundry tests covering initialize, initializeHistory, updateSequencer, getSequencerAt binary search edge cases, and access control - Regenerate l1sequencer.go with abigen (bytecode now matches current contract with sequencerHistory[], binary search, etc.) - Update verifier.go: L1SequencerHistoryRecord -> L1SequencerSequencerRecord - Add exponential backoff retry (10s -> 20s -> ... -> 5min) when initial history load fails, instead of waiting full 5 minutes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoids stuttering in abigen output (L1SequencerSequencerRecord -> L1SequencerHistoryRecord). No ABI/storage layout change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…om L1 contract Unify the upgrade height source: instead of a CLI flag / env var, the verifier now sets upgrade.UpgradeBlockHeight from the first history record fetched from the L1Sequencer contract. - node/l1sequencer/verifier.go: call SetUpgradeBlockHeight on first successful history load (prev==0) - node/cmd/node/main.go: remove ConsensusSwitchHeight flag read; require L1 Sequencer contract address - node/flags/flags.go: delete ConsensusSwitchHeight flag definition - docker-compose.override.yml: remove 5× MORPH_NODE_CONSENSUS_SWITCH_HEIGHT - run-test.sh: remove set_upgrade_height function, add wait_for_l1_finalized to ensure L1 contract data is finalized before L2 nodes start Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These env var overrides (DEPOSIT_CONTRACT_ADDRESS, SYNC_START_HEIGHT) and the malicious_geth_data volume should be managed via overlay/override files, not by modifying the base compose file directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ncer Add hakeeper module implementing a 3-node Raft cluster for sequencer HA. The HA cluster provides automatic leader election, block replication, and failover without changing the on-chain sequencer identity. node/hakeeper/: - HAService: wraps hashicorp/raft, implements SequencerHA interface - Config: layered loading (defaults -> TOML file -> CLI flags -> resolve -> validate) supports auto-detection of server_id (hostname) and advertised_addr (local IP) - BlockFSM: Raft FSM for block replication; onApplied callback drives geth apply - leaderMonitor: gates block production behind Barrier to ensure log catch-up - rpc/: JSON-RPC admin API (ha_leader, ha_clusterMembership, ha_addServerAsVoter, ha_removeServer, ha_transferLeader, ha_transferLeaderToServer) with HTTP middleware token auth on write operations node/flags/flags.go: - New flags: --ha.enabled, --ha.config, --ha.bootstrap, --ha.join, --ha.server-id, --ha.advertised-addr, --ha.rpc-token node/cmd/node/main.go: - initHAService(): init HA from flags/config when --ha.enabled is set - Fix typed-nil interface bug: pass untyped nil when HA is disabled node/sequencer/tm_node.go: - Pass HA service to tendermint node setup node/go.mod: - Add hashicorp/raft v1.7.1, raft-boltdb/v2 ops/docker-sequencer-test/: - docker-compose.ha-override.yml: 3-node Raft cluster config for devnet - run-ha-test.sh: 29-case integration test suite (config, cluster, block production, failover, admin API, lifecycle) - run-perf-test.sh: performance test harness Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire up the new engine_newL2BlockV2 API for reorg support: - Executor.ApplyBlockV2 now returns (applied bool, err error) matching the updated L2Node interface; detects idempotent skips and reorgs using BlockNumber + BlockByNumber checks before calling NewL2BlockV2 - RetryableClient.NewL2BlockV2 wraps the new authclient method with exponential backoff retry; excludes WrongBlockNumberError and ParentNotFoundError from retry (permanent errors) - RetryableClient.AssembleL2BlockV2 added for parent-hash-based block assembly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add BlockHashMismatchError and InvalidNextL1MsgIndexError to the retryableError() exclusion list so the executor stops re-sending invalid payloads back to geth. Made-with: Cursor
… not in PBFT validator set - Add Syncer()/SetSyncer() accessors to Executor for explicit syncer wiring - Start L1 syncer eagerly in main.go for separated-deployment / HA sequencers that are not PBFT validators (lazy-init path never fires for them) - Guard Syncer.Start() with atomic flag to prevent duplicate goroutines Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Increase blockCh buffer from 200 to 1000 to reduce drops under load. - Panic on nil onApplied callback in BlockFSM.Apply: this can only happen due to a programmer error (forgot to wire SetOnBlockApplied) and would otherwise silently succeed on the leader while followers diverge. - gofmt: realign one-line method bodies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- go-ethereum: v0.0.0-20260508105911-56deb7072ae4 - tendermint: v0.0.0-20260508065906-9e56b04da3c8 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add explicit replace directives in every go.mod to override MVS, because token-price-oracle indirectly required v1.10.14-..., which caused all workspace modules to resolve to the older version and miss new APIs (NewL2BlockV2, AssembleL2BlockV2, SetBlockTags, MorphTxType, updated AssembleL2Block/NewL2Block signatures). Drop the local-path replace block from go.work so the pinned pseudo-version is actually used. Follow-up: investigate the indirect dep that requires v1.10.14 and bump it so these per-module replaces can be removed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
650d038 to
8f4c5ee
Compare
Pull in the persistent-peer ban exemption + sigStore.Close fixes from morph-l2/tendermint feat/sequencer-optimize (commit c6f7e21e4). Updated via 'make update' after bumping TENDERMINT_TARGET_VERSION in the Makefile. All sub-modules tidied. morphnode + tendermint binaries build cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 15
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
contracts/deploy/022-SequencerInit.ts (1)
31-32:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReplace
toLocaleLowerCase()withtoLowerCase()for locale-insensitive address comparisons.Lines 31 and 48 use
toLocaleLowerCase(), which is locale-dependent. For hex address normalization, usetoLowerCase()to avoid edge-case mismatches.Suggested fix
- (await IL1SequencerProxy.implementation()).toLocaleLowerCase() !== L1SequencerImplAddress.toLocaleLowerCase() + (await IL1SequencerProxy.implementation()).toLowerCase() !== L1SequencerImplAddress.toLowerCase() ... - (await IL1SequencerProxy.implementation()).toLocaleLowerCase() === L1SequencerImplAddress.toLocaleLowerCase() + (await IL1SequencerProxy.implementation()).toLowerCase() === L1SequencerImplAddress.toLowerCase()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contracts/deploy/022-SequencerInit.ts` around lines 31 - 32, The code uses locale-dependent string normalization when comparing addresses: replace all calls to toLocaleLowerCase() with toLowerCase() (e.g., in the comparison involving IL1SequencerProxy.implementation() and L1SequencerImplAddress and any other occurrences in this file such as the second usage at the later comparison) so address comparisons are locale-insensitive; update those invocations to call toLowerCase() on the strings being compared.ops/docker-sequencer-test/run-test.sh (1)
53-60:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake parse failures explicit in
get_block_number.This can now return an empty string on transient RPC errors, but callers still do integer arithmetic/comparisons on the result. That turns a parse failure into
integer expression expectednoise or bogus test flow. Return non-zero here and let callers retry or fail fast.Suggested fix
get_block_number() { local rpc_url="${1:-$L2_RPC}" local result result=$(curl -s -X POST -H "Content-Type: application/json" \ --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \ "$rpc_url" 2>/dev/null) - echo "$result" | grep -o '"result":"[^"]*"' | cut -d'"' -f4 | xargs printf "%d" 2>/dev/null || true + local hex + hex=$(echo "$result" | grep -o '"result":"[^"]*"' | cut -d'"' -f4) + [ -n "$hex" ] || return 1 + printf "%d" "$hex" 2>/dev/null }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-test.sh` around lines 53 - 60, The get_block_number function currently prints an empty string on parse failure which causes callers to see "integer expression expected"; modify get_block_number to detect when the curl/grep/parse yields no numeric block number (empty or non-numeric), write a clear error to stderr including the rpc_url and raw response for debugging, and exit the function with a non-zero status (return 1) so callers can retry or fail fast; keep the existing signature and behavior on success (print numeric block and return 0).
🟡 Minor comments (6)
ops/docker-sequencer-test/run-perf-test.sh-27-27 (1)
27-27:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFragile docker build context:
BITGET_ROOTassumes a specific parent-directory layout.
BITGET_ROOTis computed as the parent of the morph repo and then used as the docker build context with Dockerfile paths likemorph/ops/docker-sequencer-test/Dockerfile.l2-geth-test. This implicitly requires (1) the morph repo to be checked out into a directory literally namedmorph, and (2) the parent directory to be a "safe" docker context — for many devs the parent is$HOMEor a multi-repo workspace, which means the build either fails (wrong dirname) or sends gigabytes of unrelated trees as build context. The variable name (BITGET_ROOT) also leaks an internal/organizational assumption into an OSS test script.If a sibling tree is genuinely required (e.g., a vendored
go-ethereumcheckout next tomorph/), make the requirement explicit and overridable, e.g.:-BITGET_ROOT="$(cd "$MORPH_ROOT/.." && pwd)" +# Parent dir is used as docker build context (must contain `morph/` and any +# sibling sources referenced by the Dockerfiles). Override via env if your +# layout differs. +BUILD_CONTEXT_ROOT="${BUILD_CONTEXT_ROOT:-$(cd "$MORPH_ROOT/.." && pwd)}"and update lines 117/119/123 to use
$BUILD_CONTEXT_ROOTplus the resolved repo dirname (basename "$MORPH_ROOT") instead of the hard-codedmorph/prefix. Otherwise, please document the required layout in the usage header.Also applies to: 114-124
ops/docker-sequencer-test/run-perf-test.sh-197-203 (1)
197-203:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
do_startreports success even when no HA leader is elected.
wait_for_ha_leader 60 || { log_warn "HA leader not found, checking logs..."; }swallows the failure, then line 202 unconditionally printslog_ok "HA cluster running"and returns 0.do_runwill then start txflood and gather[PERF]data against a cluster with no leader, producing misleading or empty summaries. For a perf harness this should fail fast.🛠️ Proposed fix
# Wait for HA leader log_info "Waiting for HA cluster formation..." sleep 10 - wait_for_ha_leader 60 || { log_warn "HA leader not found, checking logs..."; } + wait_for_ha_leader 60 || { log_err "HA leader not found; aborting"; return 1; } log_ok "HA cluster running"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ops/docker-sequencer-test/run-perf-test.sh` around lines 197 - 203, The do_start function currently ignores wait_for_ha_leader failures and always prints log_ok "HA cluster running", so change do_start to treat wait_for_ha_leader failure as fatal: call wait_for_ha_leader 60 and if it fails, emit a clear error (use log_error or log_warn) and return a non-zero status (or exit 1) so do_start does not print log_ok nor allow do_run to proceed; only print log_ok "HA cluster running" when wait_for_ha_leader succeeds. Ensure the change references wait_for_ha_leader, do_start, do_run, and log_ok so the failure propagates and the perf harness fails fast.node/hakeeper/block_fsm.go-48-59 (1)
48-59:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winBuffer size inconsistency between comment and code.
The comment on line 48 states "Buffer of 200" but the code on line 59 initializes the channel with a buffer of 1000. Update the comment to match the implementation.
📝 Proposed fix
- // Buffer of 200 gives ample room for transient subscriber slowness. + // Buffer of 1000 gives ample room for transient subscriber slowness. blockCh chan *types.BlockV2🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/block_fsm.go` around lines 48 - 59, The comment above the blockCh field in the BlockFSM struct is out of sync: update the comment that currently says "Buffer of 200 gives ample room for transient subscriber slowness." to reflect the actual buffer size used in NewBlockFSM (1000). Change the comment near the blockCh declaration in BlockFSM so it reads something like "Buffer of 1000 gives ample room for transient subscriber slowness." to match the initialization in NewBlockFSM.node/hakeeper/config.go-152-152 (1)
152-152:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winIPv6 wildcard address not handled.
The check
listenAddr != "0.0.0.0"only handles the IPv4 wildcard address. If operators configurelisten_addr = "::"(the IPv6 wildcard), the current logic will incorrectly treat it as a specific IP and use it directly, which will fail when constructing the advertised address.🔧 Proposed fix to handle IPv6 wildcard
- if listenAddr != "0.0.0.0" && listenAddr != "" { + if listenAddr != "0.0.0.0" && listenAddr != "::" && listenAddr != "" { return net.JoinHostPort(listenAddr, port), nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/config.go` at line 152, The current check uses listenAddr != "0.0.0.0" which misses the IPv6 wildcard "::"; update the logic that decides whether to use listenAddr for the advertised address to treat any unspecified IP as a wildcard. Parse listenAddr with net.ParseIP and use parsedIP.IsUnspecified() (and still treat empty string as wildcard) so both "0.0.0.0" and "::" are handled; change the conditional around listenAddr used when building the advertised address (the code referencing listenAddr and the advertised address construction) to only use listenAddr when parsedIP is non-nil and not unspecified.node/hakeeper/rpc/client.go-80-84 (1)
80-84:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReturn nil pointer on error.
When
CallContextreturns an error, the function still returns&membership(a pointer to a zero-valued struct). Returningnilon error is clearer and follows Go conventions.🛡️ Proposed fix
func (c *APIClient) ClusterMembership(ctx context.Context) (*ClusterMembership, error) { var membership ClusterMembership err := c.c.CallContext(ctx, &membership, prefixRPC("clusterMembership")) + if err != nil { + return nil, err + } return &membership, err }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/client.go` around lines 80 - 84, The ClusterMembership method currently always returns &membership even when c.c.CallContext returns an error; update APIClient.ClusterMembership so that after calling c.c.CallContext(ctx, &membership, prefixRPC("clusterMembership")) you check the returned err and if err != nil return nil, err, otherwise return &membership, nil; this uses the APIClient.ClusterMembership, CallContext and ClusterMembership identifiers to locate and implement the change.node/core/executor.go-382-387 (1)
382-387:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winGuard
SetSyncer(nil).Assigning a
nil *sync.Syncertoe.l1MsgReaderproduces a non-nil interface value, so latere.l1MsgReader == nilchecks will pass and the first read call will dereference nil. Reject nil before wiring it in.Suggested fix
func (e *Executor) SetSyncer(s *sync.Syncer) { + if s == nil { + return + } if e.syncer != nil { return } e.syncer = s e.l1MsgReader = s }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/core/executor.go` around lines 382 - 387, The SetSyncer method currently assigns a nil *sync.Syncer into e.syncer and e.l1MsgReader which yields a non-nil interface and breaks later nil checks; update Executor.SetSyncer to reject nil by returning early (or logging/erroring) if the provided *sync.Syncer is nil, and only wire s to e.syncer and e.l1MsgReader when s != nil so that e.l1MsgReader comparisons remain valid (refer to method SetSyncer and fields e.syncer and e.l1MsgReader).
🧹 Nitpick comments (7)
node/hakeeper/ha_service.go (1)
181-185: ⚖️ Poor tradeoffPanic on FSM.Apply error crashes the leader node.
If the FSM's
onAppliedcallback fails (e.g., gethApplyBlockreturns an error), the leader panics and terminates. This is a hard failure that requires node restart and operator intervention.This design choice treats FSM apply failures as invariant violations: the block was committed to Raft (quorum achieved) but couldn't be applied to local state. Since followers also apply the same block, a failure here suggests either:
- A bug in the block application logic
- State corruption
- Transient resource exhaustion
Panicking forces immediate visibility and prevents the leader from continuing in an inconsistent state. However, it also means a single bad block can take down the leader, triggering failover.
Verify that this aligns with the intended failure mode. If transient errors (e.g., disk I/O spikes) are possible, consider retry logic before panicking.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/ha_service.go` around lines 181 - 185, The current leader code panics when the FSM Apply callback returns an error (the block handling path that checks f.Response() and calls panic with "hakeeper: leader FSM.Apply failed"), which kills the leader process; change this behavior in the leader FSM apply-handling (the block using f.Response(), resp.(error) check and panic) to instead surface the error and perform controlled handling: replace the panic with a bounded retry/backoff loop that re-invokes the local apply handler (or requeues the Raft log application) a configurable number of times, and if still failing, log the full error via the existing logger and trigger a safe leadership relinquish (step down) or mark the node unhealthy rather than terminating the process; ensure you preserve the original error in logs and metrics and reference the same response check (f.Response()) and error branch so callers can observe the failure mode.node/hakeeper/config.go (2)
203-208: 💤 Low valuePort validation allows port 0 for consensus and RPC.
Ports 0 (OS-assigned) are technically valid but likely not appropriate for Raft consensus and RPC servers, which need stable, well-known ports for cluster membership and client connections. While the current bounds check prevents negative values and values >65535, explicitly rejecting port 0 would prevent misconfigurations.
🔧 Proposed validation to reject port 0
- if c.Consensus.ListenPort < 0 || c.Consensus.ListenPort > math.MaxUint16 { + if c.Consensus.ListenPort <= 0 || c.Consensus.ListenPort > math.MaxUint16 { return fmt.Errorf("invalid consensus.listen_port: %d", c.Consensus.ListenPort) } - if c.RPC.ListenPort < 0 || c.RPC.ListenPort > math.MaxUint16 { + if c.RPC.ListenPort <= 0 || c.RPC.ListenPort > math.MaxUint16 { return fmt.Errorf("invalid rpc.listen_port: %d", c.RPC.ListenPort) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/config.go` around lines 203 - 208, The current port checks for c.Consensus.ListenPort and c.RPC.ListenPort allow 0; update the validation to reject zero by changing the conditions from "< 0 || > math.MaxUint16" to "<= 0 || > math.MaxUint16" (or explicitly add "|| c.Consensus.ListenPort == 0" and same for c.RPC.ListenPort) so that ports must be in 1..65535 and return the same formatted error messages if invalid.
164-191: ⚡ Quick winFunction name implies general IP detection but only returns IPv4.
The function
localNonLoopbackIPonly returns IPv4 addresses (line 185:ip.To4()), but the name suggests it would handle any IP version. The error message on line 190 correctly states "IPv4", but the function signature and name do not document this limitation.If IPv6 support is not required, consider renaming to
localNonLoopbackIPv4for clarity. If IPv6 should be supported, extend the logic to handle IPv6 addresses.📝 Option 1: Rename function to clarify IPv4-only behavior
-func localNonLoopbackIP() (string, error) { +func localNonLoopbackIPv4() (string, error) { ifaces, err := net.Interfaces()📝 Option 2: Add IPv6 support
if ip4 := ip.To4(); ip4 != nil && !ip4.IsLoopback() { return ip4.String(), nil } + // Fallback to IPv6 if no IPv4 available + if ip.To16() != nil && !ip.IsLoopback() { + return ip.String(), nil + } } } - return "", fmt.Errorf("no non-loopback IPv4 address found") + return "", fmt.Errorf("no non-loopback IP address found") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/config.go` around lines 164 - 191, The function localNonLoopbackIP currently only returns IPv4 (uses ip.To4()) but its name/signature implies any IP; either rename it to localNonLoopbackIPv4 and update its error text and all callers to reflect IPv4-only behavior (change the function name and the error string "no non-loopback IPv4 address found"), or if you need IPv6 support, modify localNonLoopbackIP to accept both families by removing the ip.To4() check and returning ip.String() for non-loopback IPv4 and IPv6 addresses (ensure callers handle possible IPv6 literals); update tests and any usages accordingly.node/hakeeper/rpc/server.go (2)
69-82: ⚡ Quick winConsider graceful shutdown with timeout.
Stop()useshttpServer.Close(), which immediately closes all active connections. UsinghttpServer.Shutdown(ctx)with a timeout context would allow in-flight requests to complete gracefully.♻️ Proposed graceful shutdown
// Stop gracefully shuts down the server. func (s *Server) Stop() { s.log.Info("Stopping hakeeper RPC server") if s.httpServer != nil { - if err := s.httpServer.Close(); err != nil { + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + if err := s.httpServer.Shutdown(ctx); err != nil { s.log.Error("hakeeper RPC server shutdown error", "err", err) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/server.go` around lines 69 - 82, Stop currently calls s.httpServer.Close() which force-closes active connections; replace that with a graceful shutdown using context.WithTimeout and s.httpServer.Shutdown(ctx) (e.g., 5s or configurable) to allow in-flight requests to finish, log any shutdown errors including context timeouts, and only call s.httpServer.Close() as a last-resort fallback if Shutdown returns an error; ensure this occurs inside Server.Stop before waiting on s.wg and before calling s.rpcServer.Stop().
56-67: ⚖️ Poor tradeoffStart returns before server is listening.
Start()spawns a goroutine and returns immediately, beforeListenAndServeconfirms the server is listening. If the server fails to start (e.g., port already in use), the error is only logged inside the goroutine and the caller receives no indication of failure.Consider using a channel to signal when the server is ready or if startup failed, so the caller can handle initialization errors appropriately.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/server.go` around lines 56 - 67, Start currently returns before the server has bound its port; fix Start in Server to synchronously bind the listening socket and only return nil after successful bind. Use net.Listen to create a listener for s.httpServer.Addr (capture error and return it), then spawn the goroutine that calls s.httpServer.Serve(listener) and keeps the existing waitgroup/error logging logic; ensure Serve is used instead of ListenAndServe so the goroutine uses the prebound listener and shutdown still works with that listener.node/hakeeper/rpc/backend.go (1)
10-11: ⚡ Quick winLogger field is stored but never used.
The
logfield is passed to the constructor and stored but is not used in any of the delegate methods. Consider either adding appropriate logging (e.g., for errors, debug info) or removing the field if not needed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/backend.go` around lines 10 - 11, The APIBackend struct stores a log.Logger in the field named "log" but never uses it; either remove the unused field from APIBackend (and its constructor NewAPIBackend) or wire it into the delegate methods on APIBackend by replacing silent error returns with process logging—use the "log" field to emit contextual messages (errors, warnings, or debug) inside methods that call the underlying delegate (so any method on APIBackend that currently forwards calls without logging should log failures or notable events using a descriptive message and the err value).node/hakeeper/rpc/client.go (1)
9-10: ⚡ Quick winConsider making
RPCNamespacea constant.Since
RPCNamespaceis never modified, declare it asconstinstead ofvarto prevent accidental reassignment and clarify intent.♻️ Proposed fix
-// RPCNamespace is the JSON-RPC namespace for the HA management API. -var RPCNamespace = "ha" +// RPCNamespace is the JSON-RPC namespace for the HA management API. +const RPCNamespace = "ha"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@node/hakeeper/rpc/client.go` around lines 9 - 10, RPCNamespace is declared as a package-level var but never modified; change its declaration to a const (replace var RPCNamespace = "ha" with const RPCNamespace = "ha") to prevent reassignment and clarify intent—update any references remain unchanged since const has the same visibility and name RPCNamespace.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@go-ethereum`:
- Line 1: The repository's go-ethereum submodule commit
(62952ec7d188953ee7b3224c3693baffa97c8084) does not match the commit referenced
in the Makefile and go.mod files (56deb7072ae467a12a850815c7a5c09b7c2782ba /
v0.0.0-20260508105911-56deb7072ae4); fix by making them consistent: either
update the submodule to the declared commit (56deb707...) or update the Makefile
and every go.mod to reference the actual submodule commit (62952ec7...).
Concretely, pick one source-of-truth, then update the Makefile variable(s) that
pin go-ethereum and all go.mod require/replace entries to the chosen
commit/version, and ensure the git submodule is checked out/synced to that
commit so builds are reproducible.
In `@node/cmd/node/main.go`:
- Around line 153-160: This branch starts l1Syncer via node.NewSyncer and
l1Syncer.Start but never assigns it to the outer syncer variable, so shutdown
only stops the original syncer and leaves l1Syncer running; fix by ensuring the
started syncer is the one cleaned up — either assign the outer syncer variable
to l1Syncer (e.g., syncer = l1Syncer) or update the cleanup path to fetch
executor.Syncer() and Stop() it; make the change in the branch where you call
node.NewSyncer, executor.SetSyncer, and l1Syncer.Start so the same syncer
instance is stopped on shutdown or error.
In `@node/hakeeper/ha_service.go`:
- Around line 165-198: Commit currently calls h.r.Apply with raftInfiniteTimeout
(0) which can block forever if quorum is lost; change Commit to use a finite
timeout instead of raftInfiniteTimeout (e.g., 30s–60s) when calling h.r.Apply,
add retry logic with exponential backoff and a maximum retry window around the
Apply call (use the returned Future f and its Response/Error handling to surface
failures), and integrate a circuit-breaker/health flag that trips after repeated
Apply timeouts to short-circuit further commits and emit an alert/metric; also
expose quorum/health state via a health-check endpoint so operators and
monitoring can detect and react to stalled Commit calls.
In `@node/hakeeper/rpc/client.go`:
- Around line 28-29: The DialAPIClient function currently prepends "http://" to
addr, forcing unencrypted connections; change it to use the provided addr as the
full URL (including scheme) when calling ethrpc.DialContext so callers can pass
"https://..." for TLS; ensure DialAPIClient (and any callers) expect a full URL
and optionally validate that addr has a valid scheme (http or https) before
calling ethrpc.DialContext to avoid silently malformed URLs.
In `@node/types/retryable_client.go`:
- Around line 120-135: The wrapper NewL2BlockV2 on RetryableClient is using the
shared rc.b backoff which is not thread-safe; create a fresh backoff per RPC by
calling newRetryBackoff() (or otherwise constructing/resetting a new backoff)
inside NewL2BlockV2 and pass that to backoff.Retry instead of rc.b; update the
call site in NewL2BlockV2 to use the fresh backoff and apply the same pattern to
the other RetryableClient wrapper methods so each RPC uses its own independent
backoff instance.
In `@ops/docker-sequencer-test/docker-compose.ha-override.yml`:
- Around line 127-128: Remove the hard-coded private key values for
MORPH_NODE_SEQUENCER_PRIVATE_KEY and instead reference an external secret or
environment variable; replace each literal assignment (the three occurrences of
MORPH_NODE_SEQUENCER_PRIVATE_KEY=0x...) with a variable reference (e.g. set
MORPH_NODE_SEQUENCER_PRIVATE_KEY=${MORPH_NODE_SEQUENCER_PRIVATE_KEY} in the
compose env block or switch the service to use a docker secret named
sequencer_private_key) and add a .env.example placeholder and/or instructions to
mount the real key via a secret or external .env (ensure the real .env is
gitignored). Ensure all occurrences (the three commits) are updated to use the
same external reference so the key is no longer stored in source control.
- Around line 120-125: The HA admin RPC port mappings currently publish the
admin endpoint on all interfaces (e.g., "- "9501:9401"" and the analogous
entries at the other occurrences), so change each host-side binding to bind to
localhost by prefixing the host port with 127.0.0.1 (e.g.,
"127.0.0.1:9501:9401") for the entries that expose port 9401; update the same
pattern at the other two occurrences referenced (the blocks around the other
port mappings) so the Raft admin RPC is only reachable from the host loopback.
In `@ops/docker-sequencer-test/run-ha-test.sh`:
- Around line 1166-1185: The test currently uses a future version
(stale_version_high = current_version + 100) which can be accepted and still
marked PASS; change the test to use a genuinely stale version (e.g.
stale_version_low = current_version - 1, guarding if current_version==0 use 0)
when calling ha_call with "ha_addServerAsVoter" so the request truly represents
an older optimistic-lock version, then assert failure by checking the ha_call
response contains an "error" or that get_membership_version(leader_rpc) remains
unchanged; update the record_test branches for "TC-API-08" to FAIL when the
stale request is accepted and the membership version changed or the member was
added, and PASS only when an error is returned or membership_version did not
change.
- Around line 667-680: The loop currently hardcodes "ha-node-1 ha-node-2" so it
skips any other followers (e.g., ha-node-0); update the loop to iterate over the
full set of HA nodes instead of only two (for example derive nodes from a
HA_NODES list or expand to "ha-node-0 ha-node-1 ha-node-2"), ensure you map each
node to its RPC variable (replace HA_RPC_NODE1/HA_RPC_NODE2 mapping with a
lookup that matches the node name to HA_RPC_NODE0/1/2 or to an associative
array), keep the existing is_ha_leader check and COMPOSE_HA logs/grep logic, and
append results to follower_produce_logs as before so every non-leader node is
inspected.
- Around line 1073-1077: The cleanup path deletes the wrong raft directory
because the constructed path for rm -rf doesn't match the directory naming used
in setup_ha_nodes_config (it produces names like .devnet/ha-node0 but the script
currently resolves to .devnet/ha-node-0); update the removal to compute the
exact node directory name the same way setup_ha_nodes_config does (derive the
target directory from target_follower_id using the same transformation or reuse
the variable/logic from setup_ha_nodes_config) so rm -rf
"$DOCKER_DIR/.devnet/<correct-node-dir>/raft" removes the stale raft state for
the intended node; ensure this change is applied where post_add_count check does
the force-recover (references: post_add_count, target_follower_id, DOCKER_DIR,
setup_ha_nodes_config).
- Around line 1412-1420: leader_height_before/after are being fetched from
HA_RPC_NODE0 (the admin RPC) which doesn't return L2 blockNumber; replace the HA
admin RPC with the current leader's geth RPC when calling get_block_number so
leader_height_* reflects the leader fullnode L2 height, and stop hard-coding
node 0 — use the variable that holds the current leader RPC endpoint (the same
leader identifier used elsewhere in the script) so gap = leader_height_after -
follower_height_after is valid; update the two calls to get_block_number that
set leader_height_before and leader_height_after accordingly.
In `@ops/docker-sequencer-test/run-perf-test.sh`:
- Around line 277-307: The produceBlock awk parser currently writes apply_ms
into the same variable as raft_commit_ms (both assign to commit), corrupting
raft_commit_ms; update the parser to parse apply_ms into its own variable (e.g.,
apply), add accumulators s_apply and trackers max_apply/min_apply alongside the
existing s_commit/max_commit/min_commit, increment s_apply and update
max_apply/min_apply in the per-line block, and add a printf row for "apply_ms:"
in the END block (similar to raft_commit_ms:) so apply_ms is reported separately
while raft_commit_ms remains correct.
In `@ops/docker-sequencer-test/run-test.sh`:
- Around line 530-533: The build_malicious_image function uses an undefined
BITGET_ROOT which causes cd to fail; fix by either initializing BITGET_ROOT from
an existing repo variable or using that existing variable directly. For example,
set BITGET_ROOT="${BITGET_ROOT:-$POLYREPO_ROOT}" near script initialization or
replace cd "$BITGET_ROOT" inside build_malicious_image with cd "$POLYREPO_ROOT"
(or cd "$MORPH_ROOT" if that is the intended repo), referencing
build_malicious_image, BITGET_ROOT, POLYREPO_ROOT and MORPH_ROOT to locate the
change.
- Around line 215-225: The try/except around the initializeHistory call is
swallowing all errors and masking real failures; change it so initializeHistory
failures on a fresh deploy surface: either remove the broad except and let
exceptions propagate from run_command('initializeHistory(address,uint64)', ...)
so the script fails on real errors, or catch the exception, inspect the error
text returned by run_command and only suppress it if it unmistakably indicates
"already initialized" (otherwise re-raise); update the block around run_command
/ initializeHistory / L1Sequencer to log and then raise for unexpected errors so
the test cannot continue without a valid sequencer history.
- Around line 534-552: The script currently stashes local changes, checks out
test/p2p-security, and runs docker build but does not guarantee cleanup on
failure; update the block that sets original_branch, runs git stash, git
checkout test/p2p-security and docker build to always restore the original
branch and stash state by installing a trap-based cleanup (or use a separate git
worktree) that on EXIT checks if original_branch is set and then runs git
checkout "$original_branch" and git stash pop (if stash was created), ensuring
the trap is registered immediately after computing original_branch and before
any potentially failing command (refer to the original_branch variable, git
stash, git checkout test/p2p-security and docker build -t
morph-node-malicious:latest).
---
Outside diff comments:
In `@contracts/deploy/022-SequencerInit.ts`:
- Around line 31-32: The code uses locale-dependent string normalization when
comparing addresses: replace all calls to toLocaleLowerCase() with toLowerCase()
(e.g., in the comparison involving IL1SequencerProxy.implementation() and
L1SequencerImplAddress and any other occurrences in this file such as the second
usage at the later comparison) so address comparisons are locale-insensitive;
update those invocations to call toLowerCase() on the strings being compared.
In `@ops/docker-sequencer-test/run-test.sh`:
- Around line 53-60: The get_block_number function currently prints an empty
string on parse failure which causes callers to see "integer expression
expected"; modify get_block_number to detect when the curl/grep/parse yields no
numeric block number (empty or non-numeric), write a clear error to stderr
including the rpc_url and raw response for debugging, and exit the function with
a non-zero status (return 1) so callers can retry or fail fast; keep the
existing signature and behavior on success (print numeric block and return 0).
---
Minor comments:
In `@node/core/executor.go`:
- Around line 382-387: The SetSyncer method currently assigns a nil *sync.Syncer
into e.syncer and e.l1MsgReader which yields a non-nil interface and breaks
later nil checks; update Executor.SetSyncer to reject nil by returning early (or
logging/erroring) if the provided *sync.Syncer is nil, and only wire s to
e.syncer and e.l1MsgReader when s != nil so that e.l1MsgReader comparisons
remain valid (refer to method SetSyncer and fields e.syncer and e.l1MsgReader).
In `@node/hakeeper/block_fsm.go`:
- Around line 48-59: The comment above the blockCh field in the BlockFSM struct
is out of sync: update the comment that currently says "Buffer of 200 gives
ample room for transient subscriber slowness." to reflect the actual buffer size
used in NewBlockFSM (1000). Change the comment near the blockCh declaration in
BlockFSM so it reads something like "Buffer of 1000 gives ample room for
transient subscriber slowness." to match the initialization in NewBlockFSM.
In `@node/hakeeper/config.go`:
- Line 152: The current check uses listenAddr != "0.0.0.0" which misses the IPv6
wildcard "::"; update the logic that decides whether to use listenAddr for the
advertised address to treat any unspecified IP as a wildcard. Parse listenAddr
with net.ParseIP and use parsedIP.IsUnspecified() (and still treat empty string
as wildcard) so both "0.0.0.0" and "::" are handled; change the conditional
around listenAddr used when building the advertised address (the code
referencing listenAddr and the advertised address construction) to only use
listenAddr when parsedIP is non-nil and not unspecified.
In `@node/hakeeper/rpc/client.go`:
- Around line 80-84: The ClusterMembership method currently always returns
&membership even when c.c.CallContext returns an error; update
APIClient.ClusterMembership so that after calling c.c.CallContext(ctx,
&membership, prefixRPC("clusterMembership")) you check the returned err and if
err != nil return nil, err, otherwise return &membership, nil; this uses the
APIClient.ClusterMembership, CallContext and ClusterMembership identifiers to
locate and implement the change.
In `@ops/docker-sequencer-test/run-perf-test.sh`:
- Around line 197-203: The do_start function currently ignores
wait_for_ha_leader failures and always prints log_ok "HA cluster running", so
change do_start to treat wait_for_ha_leader failure as fatal: call
wait_for_ha_leader 60 and if it fails, emit a clear error (use log_error or
log_warn) and return a non-zero status (or exit 1) so do_start does not print
log_ok nor allow do_run to proceed; only print log_ok "HA cluster running" when
wait_for_ha_leader succeeds. Ensure the change references wait_for_ha_leader,
do_start, do_run, and log_ok so the failure propagates and the perf harness
fails fast.
---
Nitpick comments:
In `@node/hakeeper/config.go`:
- Around line 203-208: The current port checks for c.Consensus.ListenPort and
c.RPC.ListenPort allow 0; update the validation to reject zero by changing the
conditions from "< 0 || > math.MaxUint16" to "<= 0 || > math.MaxUint16" (or
explicitly add "|| c.Consensus.ListenPort == 0" and same for c.RPC.ListenPort)
so that ports must be in 1..65535 and return the same formatted error messages
if invalid.
- Around line 164-191: The function localNonLoopbackIP currently only returns
IPv4 (uses ip.To4()) but its name/signature implies any IP; either rename it to
localNonLoopbackIPv4 and update its error text and all callers to reflect
IPv4-only behavior (change the function name and the error string "no
non-loopback IPv4 address found"), or if you need IPv6 support, modify
localNonLoopbackIP to accept both families by removing the ip.To4() check and
returning ip.String() for non-loopback IPv4 and IPv6 addresses (ensure callers
handle possible IPv6 literals); update tests and any usages accordingly.
In `@node/hakeeper/ha_service.go`:
- Around line 181-185: The current leader code panics when the FSM Apply
callback returns an error (the block handling path that checks f.Response() and
calls panic with "hakeeper: leader FSM.Apply failed"), which kills the leader
process; change this behavior in the leader FSM apply-handling (the block using
f.Response(), resp.(error) check and panic) to instead surface the error and
perform controlled handling: replace the panic with a bounded retry/backoff loop
that re-invokes the local apply handler (or requeues the Raft log application) a
configurable number of times, and if still failing, log the full error via the
existing logger and trigger a safe leadership relinquish (step down) or mark the
node unhealthy rather than terminating the process; ensure you preserve the
original error in logs and metrics and reference the same response check
(f.Response()) and error branch so callers can observe the failure mode.
In `@node/hakeeper/rpc/backend.go`:
- Around line 10-11: The APIBackend struct stores a log.Logger in the field
named "log" but never uses it; either remove the unused field from APIBackend
(and its constructor NewAPIBackend) or wire it into the delegate methods on
APIBackend by replacing silent error returns with process logging—use the "log"
field to emit contextual messages (errors, warnings, or debug) inside methods
that call the underlying delegate (so any method on APIBackend that currently
forwards calls without logging should log failures or notable events using a
descriptive message and the err value).
In `@node/hakeeper/rpc/client.go`:
- Around line 9-10: RPCNamespace is declared as a package-level var but never
modified; change its declaration to a const (replace var RPCNamespace = "ha"
with const RPCNamespace = "ha") to prevent reassignment and clarify
intent—update any references remain unchanged since const has the same
visibility and name RPCNamespace.
In `@node/hakeeper/rpc/server.go`:
- Around line 69-82: Stop currently calls s.httpServer.Close() which
force-closes active connections; replace that with a graceful shutdown using
context.WithTimeout and s.httpServer.Shutdown(ctx) (e.g., 5s or configurable) to
allow in-flight requests to finish, log any shutdown errors including context
timeouts, and only call s.httpServer.Close() as a last-resort fallback if
Shutdown returns an error; ensure this occurs inside Server.Stop before waiting
on s.wg and before calling s.rpcServer.Stop().
- Around line 56-67: Start currently returns before the server has bound its
port; fix Start in Server to synchronously bind the listening socket and only
return nil after successful bind. Use net.Listen to create a listener for
s.httpServer.Addr (capture error and return it), then spawn the goroutine that
calls s.httpServer.Serve(listener) and keeps the existing waitgroup/error
logging logic; ensure Serve is used instead of ListenAndServe so the goroutine
uses the prebound listener and shutdown still works with that listener.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 35229859-a993-4347-996e-549d05c91f82
⛔ Files ignored due to path filters (9)
bindings/go.sumis excluded by!**/*.sumcontracts/go.sumis excluded by!**/*.sumgo.work.sumis excluded by!**/*.sumnode/go.sumis excluded by!**/*.sumops/l2-genesis/go.sumis excluded by!**/*.sumops/tools/go.sumis excluded by!**/*.sumoracle/go.sumis excluded by!**/*.sumtoken-price-oracle/go.sumis excluded by!**/*.sumtx-submitter/go.sumis excluded by!**/*.sum
📒 Files selected for processing (43)
Makefilebindings/bindings/l1sequencer.gobindings/go.modcontracts/contracts/l1/L1Sequencer.solcontracts/contracts/test/L1Sequencer.t.solcontracts/contracts/test/base/L1SequencerBase.t.solcontracts/deploy/022-SequencerInit.tscontracts/go.modgo-ethereumnode/cmd/node/main.gonode/core/executor.gonode/flags/flags.gonode/go.modnode/hakeeper/block_fsm.gonode/hakeeper/block_payload.gonode/hakeeper/config.gonode/hakeeper/ha.toml.examplenode/hakeeper/ha_service.gonode/hakeeper/leader_monitor.gonode/hakeeper/rpc/api.gonode/hakeeper/rpc/auth.gonode/hakeeper/rpc/auth_test.gonode/hakeeper/rpc/backend.gonode/hakeeper/rpc/client.gonode/hakeeper/rpc/server.gonode/hakeeper/rpc/types.gonode/l1sequencer/signer.gonode/l1sequencer/verifier.gonode/sequencer/tm_node.gonode/sync/syncer.gonode/types/retryable_client.gonode/types/retryable_client_test.goops/docker-sequencer-test/docker-compose.ha-override.ymlops/docker-sequencer-test/docker-compose.override.ymlops/docker-sequencer-test/run-ha-test.shops/docker-sequencer-test/run-perf-test.shops/docker-sequencer-test/run-test.shops/docker/.envops/l2-genesis/go.modops/tools/go.modoracle/go.modtoken-price-oracle/go.modtx-submitter/go.mod
| @@ -1 +1 @@ | |||
| Subproject commit 045be0fdc7ca6f80e18eb4e26f7452500292ccec | |||
| Subproject commit 62952ec7d188953ee7b3224c3693baffa97c8084 | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check go-ethereum version consistency across submodule, go.mod files, and Makefile
echo "=== Submodule commit ==="
cat go-ethereum
echo -e "\n=== Makefile go-ethereum pins ==="
rg -n "ETHEREUM_SUBMODULE_COMMIT_OR_TAG|ETHEREUM_TARGET_VERSION" Makefile
echo -e "\n=== go.mod replace directives for go-ethereum ==="
fd -e mod -x rg -n "github.com/morph-l2/go-ethereum" {}
echo -e "\n=== Extracting commit from pseudo-version (if present) ==="
# Pseudo-version format: v0.0.0-yyyymmddhhmmss-abcdefabcdef
fd -e mod -x rg -oP "github.com/morph-l2/go-ethereum.*v\d+\.\d+\.\d+-\d{14}-\K[a-f0-9]{12}" {}Repository: morph-l2/morph
Length of output: 2224
🏁 Script executed:
git config -f .gitmodules --get-regexp 'go-ethereum' || echo "No .gitmodules found"
git submodule status go-ethereum 2>/dev/null || echo "Submodule status failed, checking .git/modules"
cat .git/modules/go-ethereum/HEAD 2>/dev/null || echo "No HEAD file"
cd go-ethereum && git rev-parse HEAD 2>/dev/null || echo "Cannot get HEAD from submodule"Repository: morph-l2/morph
Length of output: 275
Fix submodule commit mismatch with version declarations.
The go-ethereum submodule is currently at commit 62952ec7d188953ee7b3224c3693baffa97c8084, but the Makefile and all go.mod files declare a different commit: 56deb7072ae467a12a850815c7a5c09b7c2782ba (referenced as version v0.0.0-20260508105911-56deb7072ae4 in go.mod). This inconsistency breaks reproducible builds and will cause version mismatches between the actual submodule and the declared dependency version. Either update the submodule to the Makefile's intended commit, or update the Makefile and all go.mod files to point to the current submodule commit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@go-ethereum` at line 1, The repository's go-ethereum submodule commit
(62952ec7d188953ee7b3224c3693baffa97c8084) does not match the commit referenced
in the Makefile and go.mod files (56deb7072ae467a12a850815c7a5c09b7c2782ba /
v0.0.0-20260508105911-56deb7072ae4); fix by making them consistent: either
update the submodule to the declared commit (56deb707...) or update the Makefile
and every go.mod to reference the actual submodule commit (62952ec7...).
Concretely, pick one source-of-truth, then update the Makefile variable(s) that
pin go-ethereum and all go.mod require/replace entries to the chosen
commit/version, and ensure the git submodule is checked out/synced to that
commit so builds are reproducible.
| if signer != nil && executor.Syncer() == nil { | ||
| l1Syncer, err := node.NewSyncer(ctx, home, nodeConfig) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to init L1 syncer for post-upgrade sequencer: %w", err) | ||
| } | ||
| executor.SetSyncer(l1Syncer) | ||
| l1Syncer.Start() | ||
| nodeConfig.Logger.Info("L1 syncer start", "reason", "post-upgrade sequencer not in PBFT validator set") |
There was a problem hiding this comment.
Track and stop the eager syncer.
This branch starts l1Syncer, but the shutdown path only stops the outer syncer variable. A later startup error or a clean shutdown leaves this polling loop and its RPC client running until process exit. Reuse the outer variable here, or stop executor.Syncer() during cleanup.
Suggested fix
- l1Syncer, err := node.NewSyncer(ctx, home, nodeConfig)
+ syncer, err = node.NewSyncer(ctx, home, nodeConfig)
if err != nil {
return fmt.Errorf("failed to init L1 syncer for post-upgrade sequencer: %w", err)
}
- executor.SetSyncer(l1Syncer)
- l1Syncer.Start()
+ executor.SetSyncer(syncer)
+ syncer.Start()
nodeConfig.Logger.Info("L1 syncer start", "reason", "post-upgrade sequencer not in PBFT validator set")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if signer != nil && executor.Syncer() == nil { | |
| l1Syncer, err := node.NewSyncer(ctx, home, nodeConfig) | |
| if err != nil { | |
| return fmt.Errorf("failed to init L1 syncer for post-upgrade sequencer: %w", err) | |
| } | |
| executor.SetSyncer(l1Syncer) | |
| l1Syncer.Start() | |
| nodeConfig.Logger.Info("L1 syncer start", "reason", "post-upgrade sequencer not in PBFT validator set") | |
| if signer != nil && executor.Syncer() == nil { | |
| syncer, err = node.NewSyncer(ctx, home, nodeConfig) | |
| if err != nil { | |
| return fmt.Errorf("failed to init L1 syncer for post-upgrade sequencer: %w", err) | |
| } | |
| executor.SetSyncer(syncer) | |
| syncer.Start() | |
| nodeConfig.Logger.Info("L1 syncer start", "reason", "post-upgrade sequencer not in PBFT validator set") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/cmd/node/main.go` around lines 153 - 160, This branch starts l1Syncer
via node.NewSyncer and l1Syncer.Start but never assigns it to the outer syncer
variable, so shutdown only stops the original syncer and leaves l1Syncer
running; fix by ensuring the started syncer is the one cleaned up — either
assign the outer syncer variable to l1Syncer (e.g., syncer = l1Syncer) or update
the cleanup path to fetch executor.Syncer() and Stop() it; make the change in
the branch where you call node.NewSyncer, executor.SetSyncer, and l1Syncer.Start
so the same syncer instance is stopped on shutdown or error.
| func (h *HAService) Commit(block *types.BlockV2) error { | ||
| t0 := time.Now() | ||
|
|
||
| data, err := encodeBlock(block) | ||
| if err != nil { | ||
| return fmt.Errorf("Commit: encode: %w", err) | ||
| } | ||
| encodeDur := time.Since(t0) | ||
|
|
||
| t1 := time.Now() | ||
| f := h.r.Apply(data, raftInfiniteTimeout) | ||
| if err := f.Error(); err != nil { | ||
| return err | ||
| } | ||
| raftDur := time.Since(t1) | ||
|
|
||
| if resp := f.Response(); resp != nil { | ||
| if err, ok := resp.(error); ok { | ||
| panic(fmt.Sprintf("hakeeper: leader FSM.Apply failed: %v", err)) | ||
| } | ||
| } | ||
|
|
||
| totalDur := time.Since(t0) | ||
| h.logger.Debug("[PERF] HAService.Commit", | ||
| "height", block.Number, | ||
| "encode_ms", float64(encodeDur.Microseconds())/1000.0, | ||
| "raft_ms", float64(raftDur.Microseconds())/1000.0, | ||
| "total_ms", float64(totalDur.Microseconds())/1000.0, | ||
| "dataBytes", len(data), | ||
| "txCount", len(block.Transactions), | ||
| ) | ||
|
|
||
| return nil | ||
| } |
There was a problem hiding this comment.
Infinite timeout on Raft.Apply can hang the sequencer indefinitely.
Line 175 uses raftInfiniteTimeout (0), which makes the leader block forever until quorum is achieved. If the cluster loses quorum (e.g., network partition, follower crashes), Commit() will hang indefinitely, freezing block production.
While this ensures correctness (no blocks committed without quorum), it creates an operational hazard: the sequencer appears unresponsive with no automatic recovery. Operators must restore quorum manually.
Consider adding:
- A finite timeout (e.g., 30s-60s) with retry logic and alerting
- Health check endpoints that report quorum loss
- Circuit-breaker pattern to detect prolonged failures
Alternatively, document this behavior explicitly in operator runbooks so that monitoring can detect hung Commit() calls via metrics (e.g., time since last block).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/hakeeper/ha_service.go` around lines 165 - 198, Commit currently calls
h.r.Apply with raftInfiniteTimeout (0) which can block forever if quorum is
lost; change Commit to use a finite timeout instead of raftInfiniteTimeout
(e.g., 30s–60s) when calling h.r.Apply, add retry logic with exponential backoff
and a maximum retry window around the Apply call (use the returned Future f and
its Response/Error handling to surface failures), and integrate a
circuit-breaker/health flag that trips after repeated Apply timeouts to
short-circuit further commits and emit an alert/metric; also expose
quorum/health state via a health-check endpoint so operators and monitoring can
detect and react to stalled Commit calls.
| func DialAPIClient(ctx context.Context, addr string, token string) (*APIClient, error) { | ||
| c, err := ethrpc.DialContext(ctx, "http://"+addr) |
There was a problem hiding this comment.
Hardcoded HTTP scheme prevents TLS encryption.
The function hardcodes "http://", forcing all RPC communication to be unencrypted. The management API handles sensitive cluster operations (leader transfer, membership changes) that should be protected in production.
🔒 Proposed fix to support both HTTP and HTTPS
-func DialAPIClient(ctx context.Context, addr string, token string) (*APIClient, error) {
- c, err := ethrpc.DialContext(ctx, "http://"+addr)
+// DialAPIClient dials a hakeeper RPC server at the given address and returns
+// an APIClient. addr should include the scheme (http:// or https://).
+// token is sent as the Authorization header on every request;
+// pass empty string if the server has no auth configured.
+// The caller is responsible for calling Close() when done.
+func DialAPIClient(ctx context.Context, addr string, token string) (*APIClient, error) {
+ c, err := ethrpc.DialContext(ctx, addr)Callers should then pass the full URL with scheme, e.g., "https://127.0.0.1:8545".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/hakeeper/rpc/client.go` around lines 28 - 29, The DialAPIClient function
currently prepends "http://" to addr, forcing unencrypted connections; change it
to use the provided addr as the full URL (including scheme) when calling
ethrpc.DialContext so callers can pass "https://..." for TLS; ensure
DialAPIClient (and any callers) expect a full URL and optionally validate that
addr has a valid scheme (http or https) before calling ethrpc.DialContext to
avoid silently malformed URLs.
| func (rc *RetryableClient) NewL2BlockV2(ctx context.Context, executableL2Data *catalyst.ExecutableL2Data, isSafe bool) (err error) { | ||
| if retryErr := backoff.Retry(func() error { | ||
| respErr := rc.authClient.NewL2BlockV2(ctx, executableL2Data, isSafe) | ||
| if respErr != nil { | ||
| rc.logger.Error("NewL2BlockV2 failed", | ||
| "block_number", executableL2Data.Number, | ||
| "isSafe", isSafe, | ||
| "error", respErr) | ||
| if retryableError(respErr) { | ||
| return respErr | ||
| } | ||
| err = respErr | ||
| } | ||
| return nil | ||
| }, rc.b); retryErr != nil { | ||
| return retryErr |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
github.com/cenkalti/backoff/v4 ExponentialBackOff Reset before reuse concurrent use thread safe
💡 Result:
The ExponentialBackOff implementation in the cenkalti/backoff library (including v4 and v5) is explicitly not thread-safe [1][2][3]. The library maintainer has explicitly declined requests to add internal mutexes to the ExponentialBackOff struct, stating a preference to keep the library minimal and requiring users to handle thread safety externally if needed [4]. Key points regarding concurrent use and resetting: 1. Thread Safety: Because the implementation is not thread-safe, you must not access or modify an ExponentialBackOff instance (including calling NextBackOff or Reset) from multiple goroutines simultaneously [5][4]. 2. Resetting: The Reset method is intended to restart the backoff timer and interval [1]. Like other methods on the struct, it is not thread-safe and must be protected by external synchronization (such as a sync.Mutex) if the instance is shared across goroutines [5][4]. 3. Recommended Approach: If you require thread-safe behavior, you should wrap the ExponentialBackOff instance with your own synchronization logic (e.g., a mutex) in your application code [4]. Alternatively, for many common use cases, the library recommends using the provided Retry function, which manages the backoff lifecycle internally [6][7].
Citations:
- 1: https://ref.gotd.dev/pkg/github.com/cenkalti/backoff/v4.html
- 2: https://chromium.googlesource.com/external/github.com/cenkalti/backoff/+/2146c93394225c3732078705043ce9f26584d334/exponential.go
- 3: https://github.com/cockroachdb/vendored/blob/master/github.com/cenkalti/backoff/exponential.go
- 4: Make Exponentional Backoff thread safe cenkalti/backoff#66
- 5: Data Race cenkalti/backoff#47
- 6: https://pkg.go.dev/github.com/cenkalti/backoff/v4?tab=versions
- 7: https://github.com/cenkalti/backoff
🏁 Script executed:
# Find all wrapper methods in RetryableClient that use rc.b
rg -A 5 'func \(rc \*RetryableClient\)' node/types/retryable_client.go | head -100Repository: morph-l2/morph
Length of output: 4454
🏁 Script executed:
# Check how RetryableClient methods are called in the codebase
rg 'rc\.' node/types/retryable_client.go | wc -lRepository: morph-l2/morph
Length of output: 61
🏁 Script executed:
# Find usages of RetryableClient in other files
rg -l 'RetryableClient' node/ --type go | head -20Repository: morph-l2/morph
Length of output: 192
🏁 Script executed:
# Check if RetryableClient methods are called in concurrent contexts
rg -B 2 -A 2 'go.*rc\.' node/ --type go | head -50Repository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Look for how RetryableClient instance is passed and used in executors/sequencers
rg 'NewRetryableClient\|retryClient' node/sequencer/tm_node.go -A 3 -B 3Repository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Check for concurrent invocations in derivation
rg '\brc\.' node/derivation/derivation.go | head -20Repository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Get more context on how retryClient is used in the codebase
rg 'retryClient' node/sequencer/ -B 1 -A 1 --type goRepository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Check executor.go for retryClient usage
rg 'retryClient\|rc' node/core/executor.go -B 1 -A 1 --type go | head -40Repository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Look at the actual RetryableClient struct definition and initialization
cat -n node/types/retryable_client.go | head -50Repository: morph-l2/morph
Length of output: 2173
🏁 Script executed:
# Read the NewRetryableClient function
sed -n '51,70p' node/types/retryable_client.goRepository: morph-l2/morph
Length of output: 887
🏁 Script executed:
# Check how RetryableClient is instantiated and stored (look for fields in parent structs)
rg 'retryClient|RetryableClient' node/sequencer/ node/core/ node/derivation/ node/blocktag/ -B 2 -A 2 --type go | head -80Repository: morph-l2/morph
Length of output: 2712
🏁 Script executed:
# Search for concurrent method calls on retryClient or similar patterns
rg 'go.*\(.*\)|goroutine|spawn' node/sequencer/tm_node.go -B 2 -A 2 --type go | head -50Repository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Look for goroutine spawning in modules that use RetryableClient
rg 'go func' node/sequencer/ node/core/ node/derivation/ node/blocktag/ --type go | head -30Repository: morph-l2/morph
Length of output: 145
🏁 Script executed:
# Check if the Executor or other structs spawn goroutines that call l2Client methods
rg -A 5 'func.*l2Client' node/core/executor.go --type go | head -40Repository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Look for concurrent calls to l2Client methods in derivation
rg 'l2Client\.' node/derivation/derivation.go --type go -B 1 -A 1 | head -50Repository: morph-l2/morph
Length of output: 689
🏁 Script executed:
# Get full context of the goroutines in derivation.go
rg -B 10 -A 10 'go func' node/derivation/derivation.go --type go | head -100Repository: morph-l2/morph
Length of output: 1237
🏁 Script executed:
# Check if multiple goroutines access l2Client concurrently
cat -n node/derivation/derivation.go | grep -A 30 'go func'Repository: morph-l2/morph
Length of output: 2142
🏁 Script executed:
# Check if there's any synchronization (mutex) around rc.b or the RetryableClient
rg 'sync.Mutex|Lock|Unlock' node/types/retryable_client.go --type goRepository: morph-l2/morph
Length of output: 40
🏁 Script executed:
# Verify: the same retryClient instance is used in a loop calling multiple methods
sed -n '149,170p' node/derivation/derivation.goRepository: morph-l2/morph
Length of output: 420
🏁 Script executed:
# Check derivationBlock method to see all l2Client calls in the main loop
rg -A 50 'func.*derivationBlock' node/derivation/derivation.go | head -80Repository: morph-l2/morph
Length of output: 1897
🏁 Script executed:
# Check if this PR actually implements the fix or just adds NewL2BlockV2
# Look at the full retryable_client.go to see the struct definition
wc -l node/types/retryable_client.goRepository: morph-l2/morph
Length of output: 93
🏁 Script executed:
# Check the end of the file to see if there's a fix for the backoff reuse
tail -50 node/types/retryable_client.goRepository: morph-l2/morph
Length of output: 1794
🏁 Script executed:
# Verify the PR only added NewL2BlockV2 without fixing the shared backoff issue
git diff --stat 2>/dev/null | grep retryable_client || rg 'newRetryBackoff\|fresh' node/types/retryable_client.go --type goRepository: morph-l2/morph
Length of output: 40
Use a fresh backoff per RPC call.
This new wrapper still reuses the shared rc.b field. Since github.com/cenkalti/backoff/v4's ExponentialBackOff is not thread-safe and RetryableClient is shared across goroutines (e.g., in Derivation.Start()), concurrent calls to different methods can corrupt retry state and interfere with each other's backoff schedules. Create and reset a fresh backoff for each retry attempt instead of reusing the shared field.
Suggested approach
type RetryableClient struct {
authClient *authclient.Client
ethClient *ethclient.Client
- b backoff.BackOff
logger tmlog.Logger
}
+func newRetryBackoff() backoff.BackOff {
+ bo := backoff.NewExponentialBackOff()
+ bo.MaxElapsedTime = GethRetryMaxElapsedTime
+ bo.Reset()
+ return bo
+}
+
func NewRetryableClient(authClient *authclient.Client, ethClient *ethclient.Client, logger tmlog.Logger) *RetryableClient {
logger = logger.With("module", "retryClient")
- bo := backoff.NewExponentialBackOff()
- bo.MaxElapsedTime = GethRetryMaxElapsedTime
return &RetryableClient{
authClient: authClient,
ethClient: ethClient,
- b: bo,
logger: logger,
}
}Then swap rc.b for newRetryBackoff() in each wrapper, including NewL2BlockV2.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@node/types/retryable_client.go` around lines 120 - 135, The wrapper
NewL2BlockV2 on RetryableClient is using the shared rc.b backoff which is not
thread-safe; create a fresh backoff per RPC by calling newRetryBackoff() (or
otherwise constructing/resetting a new backoff) inside NewL2BlockV2 and pass
that to backoff.Retry instead of rc.b; update the call site in NewL2BlockV2 to
use the fresh backoff and apply the same pattern to the other RetryableClient
wrapper methods so each RPC uses its own independent backoff instance.
| leader_height_before=$(get_block_number "$HA_RPC_NODE0") | ||
| follower_height_before=$(get_block_number "$L2_RPC_NODE0") | ||
| sleep 10 | ||
| leader_height_after=$(get_block_number "$HA_RPC_NODE0") | ||
| follower_height_after=$(get_block_number "$L2_RPC_NODE0") | ||
|
|
||
| local follower_delta=$((follower_height_after - follower_height_before)) | ||
| local gap=$((leader_height_after - follower_height_after)) | ||
| if [ "$follower_delta" -ge 1 ] && [ "$gap" -lt 10 ]; then |
There was a problem hiding this comment.
TC-P2P-01 is reading block height from the HA admin RPC, not the leader geth RPC.
HA_RPC_NODE0 is the admin endpoint, so eth_blockNumber here returns no L2 height. That makes gap invalid and can let the test pass even when the fullnode is far behind. It also hard-codes node 0 instead of the current leader.
Suggested fix
- leader_height_before=$(get_block_number "$HA_RPC_NODE0")
+ local leader_rpc leader_geth_rpc
+ leader_rpc=$(find_leader_rpc)
+ leader_geth_rpc=$(ha_rpc_to_geth_rpc "$leader_rpc")
+ leader_height_before=$(get_block_number "$leader_geth_rpc")
follower_height_before=$(get_block_number "$L2_RPC_NODE0")
sleep 10
- leader_height_after=$(get_block_number "$HA_RPC_NODE0")
+ leader_height_after=$(get_block_number "$leader_geth_rpc")
follower_height_after=$(get_block_number "$L2_RPC_NODE0")🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 1412-1412: leader_height_before appears unused. Verify use (or export if used externally).
(SC2034)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ops/docker-sequencer-test/run-ha-test.sh` around lines 1412 - 1420,
leader_height_before/after are being fetched from HA_RPC_NODE0 (the admin RPC)
which doesn't return L2 blockNumber; replace the HA admin RPC with the current
leader's geth RPC when calling get_block_number so leader_height_* reflects the
leader fullnode L2 height, and stop hard-coding node 0 — use the variable that
holds the current leader RPC endpoint (the same leader identifier used elsewhere
in the script) so gap = leader_height_after - follower_height_after is valid;
update the two calls to get_block_number that set leader_height_before and
leader_height_after accordingly.
| grep 'produceBlock' "$logfile" | awk ' | ||
| { | ||
| build=0; sign=0; commit=0; total=0; tx=0; gas=0 | ||
| for(i=1;i<=NF;i++) { | ||
| if($i ~ /build_ms=/) { split($i,a,"="); build=a[2]+0 } | ||
| if($i ~ /sign_ms=/) { split($i,a,"="); sign=a[2]+0 } | ||
| if($i ~ /raft_commit_ms=/) { split($i,a,"="); commit=a[2]+0 } | ||
| if($i ~ /apply_ms=/) { split($i,a,"="); commit=a[2]+0 } | ||
| if($i ~ /total_ms=/) { split($i,a,"="); total=a[2]+0 } | ||
| if($i ~ /txCount=/) { split($i,a,"="); tx=a[2]+0 } | ||
| if($i ~ /gasUsed=/) { split($i,a,"="); gas=a[2]+0 } | ||
| } | ||
| n++; s_build+=build; s_sign+=sign; s_commit+=commit; s_total+=total; s_tx+=tx; s_gas+=gas | ||
| if(build>max_build) max_build=build | ||
| if(commit>max_commit) max_commit=commit | ||
| if(total>max_total) max_total=total | ||
| if(n==1 || build<min_build) min_build=build | ||
| if(n==1 || commit<min_commit) min_commit=commit | ||
| if(n==1 || total<min_total) min_total=total | ||
| } | ||
| END { | ||
| if(n>0) { | ||
| printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "build_ms:", s_build/n, min_build, max_build | ||
| printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "sign_ms:", s_sign/n, 0, 0 | ||
| printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "raft_commit_ms:", s_commit/n, min_commit, max_commit | ||
| printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "total_ms:", s_total/n, min_total, max_total | ||
| printf " %-18s avg=%.1f\n", "txCount:", s_tx/n | ||
| printf " %-18s avg=%.0f\n", "gasUsed:", s_gas/n | ||
| } | ||
| }' | ||
| fi |
There was a problem hiding this comment.
Bug: apply_ms clobbers raft_commit_ms in the produceBlock awk parser.
Lines 283 and 284 both assign to the same accumulator (commit):
if($i ~ /raft_commit_ms=/) { split($i,a,"="); commit=a[2]+0 }
if($i ~ /apply_ms=/) { split($i,a,"="); commit=a[2]+0 }Whichever field appears later on the log line wins, so the summary printed as raft_commit_ms: (line 301) actually reports apply_ms (or a mix, depending on field order). apply_ms is also never surfaced in its own row even though the parser pretends to read it. This silently corrupts one of the headline metrics the script exists to produce.
🐛 Proposed fix: track `apply` separately and print it
{
- build=0; sign=0; commit=0; total=0; tx=0; gas=0
+ build=0; sign=0; commit=0; apply=0; total=0; tx=0; gas=0
for(i=1;i<=NF;i++) {
if($i ~ /build_ms=/) { split($i,a,"="); build=a[2]+0 }
if($i ~ /sign_ms=/) { split($i,a,"="); sign=a[2]+0 }
if($i ~ /raft_commit_ms=/) { split($i,a,"="); commit=a[2]+0 }
- if($i ~ /apply_ms=/) { split($i,a,"="); commit=a[2]+0 }
+ if($i ~ /apply_ms=/) { split($i,a,"="); apply=a[2]+0 }
if($i ~ /total_ms=/) { split($i,a,"="); total=a[2]+0 }
if($i ~ /txCount=/) { split($i,a,"="); tx=a[2]+0 }
if($i ~ /gasUsed=/) { split($i,a,"="); gas=a[2]+0 }
}
- n++; s_build+=build; s_sign+=sign; s_commit+=commit; s_total+=total; s_tx+=tx; s_gas+=gas
+ n++; s_build+=build; s_sign+=sign; s_commit+=commit; s_apply+=apply; s_total+=total; s_tx+=tx; s_gas+=gas
if(build>max_build) max_build=build
if(commit>max_commit) max_commit=commit
if(total>max_total) max_total=total
if(n==1 || build<min_build) min_build=build
if(n==1 || commit<min_commit) min_commit=commit
if(n==1 || total<min_total) min_total=total
}
END {
if(n>0) {
printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "build_ms:", s_build/n, min_build, max_build
- printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "sign_ms:", s_sign/n, 0, 0
+ printf " %-18s avg=%-10.2f\n", "sign_ms:", s_sign/n
printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "raft_commit_ms:", s_commit/n, min_commit, max_commit
+ printf " %-18s avg=%-10.2f\n", "apply_ms:", s_apply/n
printf " %-18s avg=%-10.2f min=%-10.2f max=%.2f\n", "total_ms:", s_total/n, min_total, max_total
printf " %-18s avg=%.1f\n", "txCount:", s_tx/n
printf " %-18s avg=%.0f\n", "gasUsed:", s_gas/n
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ops/docker-sequencer-test/run-perf-test.sh` around lines 277 - 307, The
produceBlock awk parser currently writes apply_ms into the same variable as
raft_commit_ms (both assign to commit), corrupting raft_commit_ms; update the
parser to parse apply_ms into its own variable (e.g., apply), add accumulators
s_apply and trackers max_apply/min_apply alongside the existing
s_commit/max_commit/min_commit, increment s_apply and update max_apply/min_apply
in the per-line block, and add a printf row for "apply_ms:" in the END block
(similar to raft_commit_ms:) so apply_ms is reported separately while
raft_commit_ms remains correct.
| try: | ||
| run_command([ | ||
| 'cast', 'send', l1_sequencer_addr, | ||
| 'initializeHistory(address,uint64)', | ||
| sequencer_addr, str(upgrade_height), | ||
| '--rpc-url', 'http://127.0.0.1:9545', | ||
| '--private-key', deployer_pk | ||
| ]) | ||
| log.info('L1Sequencer history initialized successfully') | ||
| except Exception as e: | ||
| log.info(f'L1Sequencer initializeHistory failed (may already be initialized): {e}') |
There was a problem hiding this comment.
Don't ignore initializeHistory failures on a fresh deployment.
This script redeploys Proxy__L1Sequencer just above, so “may already be initialized” should not be the normal case here. Catching every exception and continuing can start the rest of the test with no sequencer history and hide the real setup failure.
Suggested direction
- except Exception as e:
- log.info(f'L1Sequencer initializeHistory failed (may already be initialized): {e}')
+ except Exception as e:
+ raise RuntimeError(f'L1Sequencer initializeHistory failed: {e}') from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| run_command([ | |
| 'cast', 'send', l1_sequencer_addr, | |
| 'initializeHistory(address,uint64)', | |
| sequencer_addr, str(upgrade_height), | |
| '--rpc-url', 'http://127.0.0.1:9545', | |
| '--private-key', deployer_pk | |
| ]) | |
| log.info('L1Sequencer history initialized successfully') | |
| except Exception as e: | |
| log.info(f'L1Sequencer initializeHistory failed (may already be initialized): {e}') | |
| try: | |
| run_command([ | |
| 'cast', 'send', l1_sequencer_addr, | |
| 'initializeHistory(address,uint64)', | |
| sequencer_addr, str(upgrade_height), | |
| '--rpc-url', 'http://127.0.0.1:9545', | |
| '--private-key', deployer_pk | |
| ]) | |
| log.info('L1Sequencer history initialized successfully') | |
| except Exception as e: | |
| raise RuntimeError(f'L1Sequencer initializeHistory failed: {e}') from e |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ops/docker-sequencer-test/run-test.sh` around lines 215 - 225, The try/except
around the initializeHistory call is swallowing all errors and masking real
failures; change it so initializeHistory failures on a fresh deploy surface:
either remove the broad except and let exceptions propagate from
run_command('initializeHistory(address,uint64)', ...) so the script fails on
real errors, or catch the exception, inspect the error text returned by
run_command and only suppress it if it unmistakably indicates "already
initialized" (otherwise re-raise); update the block around run_command /
initializeHistory / L1Sequencer to log and then raise for unexpected errors so
the test cannot continue without a valid sequencer history.
| build_malicious_image() { | ||
| log_info "Building malicious node image from test/p2p-security branch..." | ||
| cd "$BITGET_ROOT" | ||
|
|
There was a problem hiding this comment.
build-malicious uses an undefined repo root.
The script defines MORPH_ROOT and POLYREPO_ROOT, but not BITGET_ROOT, so this command exits on the first cd before it builds anything.
Suggested fix
build_malicious_image() {
log_info "Building malicious node image from test/p2p-security branch..."
- cd "$BITGET_ROOT"
+ cd "$POLYREPO_ROOT"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| build_malicious_image() { | |
| log_info "Building malicious node image from test/p2p-security branch..." | |
| cd "$BITGET_ROOT" | |
| build_malicious_image() { | |
| log_info "Building malicious node image from test/p2p-security branch..." | |
| cd "$POLYREPO_ROOT" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ops/docker-sequencer-test/run-test.sh` around lines 530 - 533, The
build_malicious_image function uses an undefined BITGET_ROOT which causes cd to
fail; fix by either initializing BITGET_ROOT from an existing repo variable or
using that existing variable directly. For example, set
BITGET_ROOT="${BITGET_ROOT:-$POLYREPO_ROOT}" near script initialization or
replace cd "$BITGET_ROOT" inside build_malicious_image with cd "$POLYREPO_ROOT"
(or cd "$MORPH_ROOT" if that is the intended repo), referencing
build_malicious_image, BITGET_ROOT, POLYREPO_ROOT and MORPH_ROOT to locate the
change.
| # Save current tendermint branch state | ||
| cd tendermint | ||
| local original_branch | ||
| original_branch=$(git branch --show-current) | ||
| git stash 2>/dev/null || true | ||
|
|
||
| # Switch to malicious branch | ||
| git checkout test/p2p-security | ||
| cd "$BITGET_ROOT" | ||
|
|
||
| # Build using same Dockerfile, different tag | ||
| docker build -t morph-node-malicious:latest \ | ||
| -f morph/ops/docker-sequencer-test/Dockerfile.l2-node-test . | ||
|
|
||
| # Switch back | ||
| cd tendermint | ||
| git checkout "$original_branch" | ||
| git stash pop 2>/dev/null || true | ||
| cd "$BITGET_ROOT" |
There was a problem hiding this comment.
Always restore the tendermint checkout.
This function stashes changes and checks out test/p2p-security, but with set -e any failing git checkout or docker build leaves the repo on the wrong branch with a pending stash. Restore in a trap or use a separate worktree.
Suggested hardening
build_malicious_image() {
log_info "Building malicious node image from test/p2p-security branch..."
- cd "$BITGET_ROOT"
+ cd "$POLYREPO_ROOT"
# Save current tendermint branch state
cd tendermint
local original_branch
original_branch=$(git branch --show-current)
git stash 2>/dev/null || true
+ trap 'git checkout "$original_branch" >/dev/null 2>&1 || true; git stash pop 2>/dev/null || true' RETURN
# Switch to malicious branch
git checkout test/p2p-security
- cd "$BITGET_ROOT"
+ cd "$POLYREPO_ROOT"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ops/docker-sequencer-test/run-test.sh` around lines 534 - 552, The script
currently stashes local changes, checks out test/p2p-security, and runs docker
build but does not guarantee cleanup on failure; update the block that sets
original_branch, runs git stash, git checkout test/p2p-security and docker build
to always restore the original branch and stash state by installing a trap-based
cleanup (or use a separate git worktree) that on EXIT checks if original_branch
is set and then runs git checkout "$original_branch" and git stash pop (if stash
was created), ensuring the trap is registered immediately after computing
original_branch and before any potentially failing command (refer to the
original_branch variable, git stash, git checkout test/p2p-security and docker
build -t morph-node-malicious:latest).
Summary by CodeRabbit
New Features
Tests
Chores