fix(tmpnetjs): fail fast with actionable errors on boot misconfigurations #4
Merged
…ions

Three failure modes previously burned minutes of timeout with the real cause buried in a node log or nowhere at all:

- Missing local staking keys: nodes booted with ephemeral certs, were not genesis validators, and P-Chain bootstrap timed out reporting "undefined". Now refuses to boot up front, listing the dirs tried and the AVALANCHEGO_STAKING_KEYS_DIR fix; a partial key set also errors instead of silently downgrading one node to a non-validator.
- RPCChainVM protocol mismatch between avalanchego and the subnet-evm plugin: the L1 RPC 404'd for the full 3-minute timeout while the handshake error sat in the node log. New preflight compares both binaries' protocol versions (best-effort --version parsing) and refuses to boot on mismatch; the L1-RPC timeout errors now also attach the "error creating chain" line scanned from the node logs.
- Booting over a half-dead previous run: stale nodes held ports and a late reaper could kill the new network's nodes. up() now refuses when the pid file records live processes or node ports are taken.

Preflight failures throw PreflightError and skip the reap-on-failure path: nothing was spawned, and reaping would kill the previous (possibly healthy) network the error is telling the user about.

Also: waitForBootstrap/waitForNodeID timeouts now report the last RPC response (not "undefined"), with a pointer to the /ext/health signal; startPrimaryNetwork logs the resolved binary + staking keys paths.
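The staking-key check can be sketched as follows. This is a minimal illustration, not tmpnetjs's actual `resolveStakingKeysDir`: the per-node layout (`node<i>/` holding `staker.crt`, `staker.key`, `signer.key`) and the injectable `exists` parameter are assumptions made so the logic is visible and testable. It walks the candidate directories, skips ones with no keys at all, refuses a partial set, and otherwise fails with every directory it tried plus the `AVALANCHEGO_STAKING_KEYS_DIR` fix.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// ASSUMED layout (hypothetical): <dir>/node<i>/ holds these three files per node.
const KEY_FILES = ["staker.crt", "staker.key", "signer.key"];

// Returns the first candidate dir holding a complete key set for every node.
// Throws on a partial set, or when no candidate holds any keys at all.
function resolveStakingKeysDir(
  candidates: string[],
  nodeCount: number,
  exists: (path: string) => boolean = existsSync,
): string {
  for (const dir of candidates) {
    const missing: string[] = [];
    let found = 0;
    for (let i = 1; i <= nodeCount; i++) {
      for (const file of KEY_FILES) {
        const p = join(dir, `node${i}`, file);
        if (exists(p)) found++;
        else missing.push(p);
      }
    }
    if (found === 0) continue; // no keys here at all: try the next candidate
    if (missing.length > 0) {
      // Partial set: refuse instead of booting some nodes as non-validators.
      throw new Error(
        `Incomplete staking keys in ${dir}; missing:\n  ${missing.join("\n  ")}`,
      );
    }
    return dir;
  }
  // Never fall back to ephemeral certs: they can never bootstrap as validators.
  throw new Error(
    `No local staking keys found. Dirs tried:\n  ${candidates.join("\n  ")}\n` +
      "Set AVALANCHEGO_STAKING_KEYS_DIR to a directory containing them.",
  );
}
```

Injecting `exists` keeps the function pure enough to unit-test without touching the filesystem; the default still uses `existsSync` in real runs.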
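A best-effort protocol comparison might look like the sketch below. It assumes avalanchego's `--version-json` output is a JSON object with a numeric `rpcchainvm` field and that the plugin's `--version` text contains an `rpcchainvm=<n>` token; check both against the binaries you actually ship. The helper names are hypothetical. If either version cannot be parsed, the check returns quietly, so an unexpected output format never blocks a boot.

```typescript
// ASSUMED formats (verify against the binaries you ship):
//   avalanchego --version-json -> JSON object with a numeric "rpcchainvm" field
//   subnet-evm --version       -> text containing "rpcchainvm=<n>"

function avalanchegoProtocol(versionJson: string): number | undefined {
  try {
    const parsed = JSON.parse(versionJson) as { rpcchainvm?: unknown } | null;
    const v = parsed?.rpcchainvm;
    return typeof v === "number" ? v : undefined;
  } catch {
    return undefined; // not JSON: give up quietly (best-effort)
  }
}

function pluginProtocol(versionText: string): number | undefined {
  const m = /rpcchainvm=(\d+)/.exec(versionText);
  return m ? Number(m[1]) : undefined;
}

// Throws when both versions parse and differ; passes silently otherwise.
function checkProtocolMatch(
  avalanchegoPath: string,
  avalanchegoOutput: string,
  pluginPath: string,
  pluginOutput: string,
): void {
  const nodeProto = avalanchegoProtocol(avalanchegoOutput);
  const pluginProto = pluginProtocol(pluginOutput);
  if (nodeProto === undefined || pluginProto === undefined) return; // best-effort
  if (nodeProto !== pluginProto) {
    // The PR raises a PreflightError here; a plain Error keeps this standalone.
    throw new Error(
      `RPCChainVM protocol mismatch: avalanchego (${avalanchegoPath}) speaks ` +
        `${nodeProto}, subnet-evm plugin (${pluginPath}) speaks ${pluginProto}. ` +
        "Use a plugin release built for this avalanchego version.",
    );
  }
}
```

Putting both versions and both paths in the message means the user can tell at a glance which binary to swap.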
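Attaching the `error creating chain` line to the L1-RPC timeout error amounts to a filtered scan over the node logs. In this sketch, `diagnoseL1RpcTimeout`, the flat `logDir`, and the `.log` suffix filter are assumptions; the real log layout may nest per-node directories. The scan is wrapped so that a missing or unreadable log directory never replaces the original timeout message.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Returns the first log line reporting a failed chain creation for this blockchain.
function findChainCreationError(lines: string[], blockchainID: string): string | undefined {
  return lines.find((l) => l.includes("error creating chain") && l.includes(blockchainID));
}

// Appends the matching log line (if any) to a timeout message.
// ASSUMED (hypothetical): logs are flat *.log files directly inside logDir.
function diagnoseL1RpcTimeout(logDir: string, blockchainID: string, baseMessage: string): string {
  try {
    for (const name of readdirSync(logDir)) {
      if (!name.endsWith(".log")) continue;
      const lines = readFileSync(join(logDir, name), "utf8").split("\n");
      const hit = findChainCreationError(lines, blockchainID);
      if (hit) return `${baseMessage}\nFrom ${name}: ${hit.trim()}`;
    }
  } catch {
    // Diagnostics are best-effort: never let them mask the original timeout.
  }
  return baseMessage;
}
```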
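The stale-network refusal needs two probes: whether any PID recorded in the pid file is still alive, and whether a node port is already bound. The sketch below assumes a pid file with one PID per line; the function names are hypothetical. `process.kill(pid, 0)` sends no signal and only checks that the process exists, and binding a throwaway server is a portable way to test a port. The port probe only checks the given host, so a listener bound to a different interface may not be detected on every OS.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { createServer } from "node:net";

function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence/permission check only, nothing is sent
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// ASSUMED pid-file format (hypothetical): one PID per line.
function livePidsFromFile(pidFile: string): number[] {
  if (!existsSync(pidFile)) return [];
  return readFileSync(pidFile, "utf8")
    .split(/\s+/)
    .map(Number)
    .filter((pid) => Number.isInteger(pid) && pid > 0 && isAlive(pid));
}

// Resolves false if binding fails (typically EADDRINUSE).
function isPortFree(port: number, host = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve) => {
    const srv = createServer();
    srv.once("error", () => resolve(false));
    srv.once("listening", () => srv.close(() => resolve(true)));
    srv.listen(port, host);
  });
}

async function assertNoStaleNetwork(pidFile: string, ports: number[]): Promise<void> {
  const live = livePidsFromFile(pidFile);
  if (live.length > 0) {
    throw new Error(`A previous network is still running (live pids: ${live.join(", ")}).`);
  }
  const taken: number[] = [];
  for (const port of ports) if (!(await isPortFree(port))) taken.push(port);
  if (taken.length > 0) throw new Error(`Node ports already in use: ${taken.join(", ")}`);
}
```

Both checks only observe; neither kills anything, which is the point when the thing being detected may be a healthy network.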
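The reap-skip rule reduces to a type check in the failure handler. In this sketch `up` and `UpHooks` are hypothetical stand-ins for the real orchestration; only the `PreflightError` name comes from the PR. The `Object.setPrototypeOf` call keeps `instanceof` reliable if the code is ever compiled to an ES5 target.

```typescript
class PreflightError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PreflightError";
    // Restore the prototype chain so `instanceof` works on ES5 targets too.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical hooks standing in for the real boot steps.
interface UpHooks {
  preflight: () => void; // throws PreflightError on a misconfiguration
  spawnNetwork: () => void; // starts node processes
  reap: () => void; // kills whatever this run spawned
}

function up(hooks: UpHooks): void {
  try {
    hooks.preflight();
    hooks.spawnNetwork();
  } catch (err) {
    // Nothing was spawned yet: reaping now could only hit the *previous* network.
    if (err instanceof PreflightError) throw err;
    hooks.reap();
    throw err;
  }
}
```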
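Reporting the last RPC response instead of `undefined` comes down to remembering what each poll returned. The generic `waitForWithLastResponse` below is a hypothetical helper, not the PR's `waitForBootstrap`; in practice `probe` would wrap a call such as avalanchego's `info.isBootstrapped`. Transport errors are recorded too, so a node that never answered still produces a useful message, and the optional `hint` is where a pointer to `/ext/health` would go.

```typescript
interface ProbeResult<T> {
  done: boolean; // e.g. the chain reports it is bootstrapped
  response: T; // raw RPC response, kept for the timeout message
}

async function waitForWithLastResponse<T>(
  what: string,
  probe: () => Promise<ProbeResult<T>>,
  timeoutMs: number,
  intervalMs = 1000,
  hint?: string,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let last: unknown = "no response received";
  while (Date.now() < deadline) {
    try {
      const result = await probe();
      last = result.response;
      if (result.done) return result.response;
    } catch (err) {
      // Keep transport errors too (connection refused, 404, ...).
      last = err instanceof Error ? err.message : err;
    }
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
  const suffix = hint ? `\n${hint}` : "";
  throw new Error(`Timed out waiting for ${what}. Last response: ${JSON.stringify(last)}${suffix}`);
}
```

A caller might pass a hint such as "Check GET /ext/health on a node; it reports why bootstrap is stuck (e.g. bls: node is not a validator)."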
Addresses the CodeQL finding on PR #4: execSync interpolated the env-derived binary path into a shell command string. execFileSync invokes the binary directly with no shell, so the path is only ever an argv entry.
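The difference is easiest to see side by side. This sketch uses Node's `execFileSync` to read a version string; `readVersion` is a hypothetical helper name. Because no shell is spawned, a path containing spaces, quotes, `;` or `$(...)` reaches the OS as a single argv entry instead of being parsed as shell syntax.

```typescript
import { execFileSync } from "node:child_process";

// Runs `<binaryPath> <args...>` directly with no shell, so binaryPath is only
// ever argv[0] and shell metacharacters in it are never interpreted.
function readVersion(binaryPath: string, args: string[]): string {
  return execFileSync(binaryPath, args, {
    encoding: "utf8",
    timeout: 10_000, // don't hang preflight on a wedged binary
    stdio: ["ignore", "pipe", "pipe"],
  }).trim();
}

// Contrast: execSync(`${binaryPath} --version`) hands one string to /bin/sh,
// so an env-derived value like "x; rm -rf ~" would run a second command.
```

For example, `readVersion(process.execPath, ["--version"])` runs the current runtime binary and returns its version string.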
ashucoder9 approved these changes on Jun 12, 2026
Problem
Three boot misconfigurations each burned minutes of timeout with the real cause buried in a node log — or reported nowhere at all:
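- Missing local staking keys: nodes boot with ephemeral certs, aren't genesis validators, and P-Chain bootstrap times out with `Timed out waiting for P-Chain bootstrap: undefined`. The only signal (`bls: node is not a validator`) sits in `/ext/health`.
- RPCChainVM protocol mismatch between avalanchego and the subnet-evm plugin: the L1 RPC returns 404 for the full 3-minute timeout, while `error creating chain ... handshake failed` sits in `myl1-rpc-*.log`.
- Booting over a half-dead previous run: stale nodes hold ports (`bind: address already in use` minutes in), and a late-running reaper from the old `up` can kill the new network's nodes.

Changes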
- Preflight checks at the start of `up()` (`internal/preflight.ts`):
  - Refuse when the pid file records live processes or node ports are taken.
  - Read the RPCChainVM protocol version from avalanchego (`--version-json`) and the subnet-evm plugin (`--version`), best-effort, and refuse on mismatch with both versions and paths in the message.
- `resolveStakingKeysDir` throws (listing dirs tried + the `AVALANCHEGO_STAKING_KEYS_DIR` fix) instead of silently falling back to ephemeral certs that can never bootstrap; a partial key set errors per-node instead of downgrading one node to a non-validator.
- `PreflightError` skips the reap-on-failure path: nothing was spawned, and reaping would kill the previous (possibly healthy) network the error is telling the user about. (Found live: the first version of the stale-network check reaped the very network it refused to boot over.)
- On an L1-RPC timeout, scan the node logs for the `error creating chain` line for that blockchain ID and append it to the error (`internal/diagnose.ts`).
- `waitForBootstrap`/`waitForNodeID` report the last RPC response instead of `undefined`, with a pointer to the `/ext/health` signal; `startPrimaryNetwork` logs the resolved binary + staking-keys paths up front.

Verification
All on a real network (macOS, avalanchego v1.14.0 + subnet-evm v0.8.0):
- `up` over a live network → instant refusal listing live pids, previous network untouched
- … `undefined`)
- `up` boots clean through L1 + ICM + relayer, `[up] network ready`