Post-audit cleanup: lifecycle bugs, harness fidelity, examples, CLI, installer#2
Merged
Conversation
Initial pre-public-release versions. Adds Keep-a-Changelog-format CHANGELOG.md summarizing what ships in 0.1.0.
…ruption-safe
Previously two writers (spawnTracked vs the orchestrator's inline
writePidRecord) clobbered each other on pids.json, and down() only
SIGTERM'd parent PIDs — subnet-evm plugin children were left orphaned
under PPID=1 every cycle. Confirmed in the wild: 5 plugins from
4-day-old runs accumulated on a dev machine.
This commit:
- Consolidates on the richer internal/process.ts schema:
{ hash?, processes: [{ name, pid, pgid, kind, binary, logFile, startedAt }] }.
Records PIDs incrementally via spawnTracked instead of batching at the
end, so SIGINT during boot can reap what was spawned so far.
- Spawns every long-running process detached so it becomes a process
group leader; kills via process.kill(-pgid, ...) with SIGTERM → 3s
grace → SIGKILL. Adds a defensive ps-based sweep filtered by VM ID
AND workDir for any leaked plugin children.
- Installs SIGINT/SIGTERM handlers in up() and wraps the boot in
try/catch so a half-finished up still tears down cleanly.
- Makes installFeeStatePatch reversible — exports uninstallFeeStatePatch
and calls it from every teardown path so embedders (Vitest setups
etc.) don't end up with a permanently wrapped globalThis.fetch.
- Adds stage prints throughout up() so the previously-silent 2-minute
cold-boot phase ([primary] booting, P-Chain bootstrap, CreateSubnet,
CreateChain, ConvertSubnetToL1Tx) is visible.
- Strips $HOME/code/avalanchego and $HOME/code/avalanche-benchmark
fallback paths from spawn.ts and l1/create.ts — those leaked a
developer's personal layout into a public repo.
- Makes deployIcmStack idempotent: checks the canonical
(deployer, nonce=0) address first and reuses if Teleporter is already
there (covers the snapshot-restore path, where C-Chain state from a
prior run already has Teleporter at the universal-deployer address).
Defensive guard: if no code at the canonical address but deployer
nonce > 0, abort with a "run pnpm run clean" hint instead of landing
Teleporter at a non-canonical address.
- On stale-snapshot rejection (validateSnapshot returns ok=false), wipe
the data dir too. Otherwise residual C-Chain state from the rejected
run leaves the Teleporter deployer with a non-zero nonce and breaks
the universal-deployer invariant the relayer depends on.
- cli.ts: implement \`tmpnetjs status\` (was returning "not yet
implemented"). Wire \`--fresh\` through to up({fresh:true}) (was
parsed then discarded). Replace the misleading \`--keep-snapshot\`
flag (default behavior, no-op) with the actually-destructive
\`--delete-snapshot\`. Rewrote USAGE to match.
- snapshot.ts: write a snapshot.meta.json sidecar alongside the
tarball (configHash, avalanchegoBinary fingerprint via mtime+size,
captured-at timestamp). validateSnapshot returns {ok, reason} so
callers can fall through to cold boot on mismatch instead of
silently restoring stale state across an avalanchego upgrade.
- types.ts + internal/config.ts: introduce NetworkTimeouts so slow
CI runners can override the magic values in validator-set.ts
(postAdvance, epoch, sigagg health, l1 RPC). Defaults preserved.
- internal/config.ts: paths(workDir) now exposes relayerConfigPath,
sigaggConfigPath, and the sibling icmRelayerStorageDir — \`clean()\`
wipes all three, fixing the leftover-config-on-clean issue.
- Root README: drop the "17 harness tests" claim (actual is 18 after
this PR adds fee + universal-deployer tests).
…validator teardown - transfer-token.ts: the DemoERC20 was deployed with decimals()=6 but ERC20TokenHome was told tokenHomeDecimals=18 and parseUnits used 18. A user copying the scaffold with a real 6-decimal token (USDC) would silently send 10^12× their intended amount. Fix: deploy DemoERC20 with 18 decimals explicitly, rename the token to "Demo Token"/"DEMO" so the symbol stops lying. - send-message.ts + transfer-token.ts: accept process.env.DESTINATION and AMOUNT as fallbacks (flag > env > default). The examples/README documented env-var invocations that didn't actually work. - examples/README.md: fix the `ExampleERC20` -> `DemoERC20` reference and document the flag/env precedence. - add-validator.ts: the script previously told users to clean up with `pkill -9 avalanchego`, which nukes the entire tmpnet. The script already knows the spawned child's PID — print it and the log path instead.
…e admin entrypoints Three harness fidelity gaps a previous audit flagged. Each gap let a test pass dishonestly while a real chain would behave differently. 1. Universal-deployer assumption was structurally bypassed. MockWarpPrecompile.getVerifiedWarpMessage used to return originSenderAddress=msg.sender unconditionally, so Teleporter's "require(originSenderAddress == address(this))" check at TeleporterMessenger:259-262 always passed. A user with a custom factory deploying messengers at different addresses got false greens. Fix: Inflight tracks the source messenger; the harness records canonicalMessenger[] for whatever it deployed via deployChain, and for any source outside that set the mock reports the actual sender so the real Teleporter check fires. New test_rogueSourceMessenger_universalDeployerCheck_reverts confirms. 2. Fees > 0 path was untested. Every example used feeInfo.amount=0, missing safeTransferFrom + redeemRelayerRewards entirely. Adds contracts/src/examples/icm-basics/FeeMessage.sol (reuses DemoERC20) and a test that asserts the fee actually moves and redeemRelayerRewards credits the harness as relayer. 3. _harness_* admin entrypoints on MockWarpPrecompile were unguarded. Any contract under test could forge messages mid-test. Gated via a one-shot _harness_init(harness_) + onlyHarness modifier; FoundryWarpHarness sets itself as the harness at construction. test_unauthorisedHarnessAdmin_reverts proves the gate. forge test: 14 → 18 passed.
…atch
icm-services-installer used to swallow every checksum-related error
("if checksums.txt 404s, continue without verification — per spec").
The releases this installer pins (icm-relayer v1.7.5,
signature-aggregator v0.5.4) do publish checksums.txt, so make
verification required:
- checksums.txt 404, parse-miss, missing-entry, and hash mismatch all
throw with actionable errors that name the override env var.
- INTERCHAIN_KIT_SKIP_CHECKSUM=1 reverts to the prior best-effort
behavior, emitting a loud console.warn each time it's used.
- Refactored InstallOptions to accept an optional download injector so
the new tests can exercise 404, missing-entry, and bypass paths
without network. Exported Downloader type for those tests.
Test coverage: 11 → 21. New cases cover findChecksum
(present/absent/case-mismatch/malformed), verifySha256
(happy/mismatch), and three installer paths through the injector.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
End-to-end cleanup pass driven by a production-readiness audit before flipping the repo public. Six commits, partitioned by domain:
v0.1.0, addCHANGELOG.mdNetworkTimeoutsAPItransfer-tokendecimals mismatch, env-var fallbacks, saferadd-validatorteardown_harness_*adminWhat this fixes
Process lifecycle (was the biggest sharp edge)
The previous orchestrator had two PID writers with incompatible schemas clobbering each other, and
down()only SIGTERM'd parent PIDs. The subnet-evm plugin children were orphaned under PID 1 every cycle — a dev machine accumulated 5 plugins from 4-day-old runs.{ hash?, processes: [{ name, pid, pgid, kind, ... }] }detached: true, kills the entire process group (kill(-pgid, ...)) withSIGTERM → 3s → SIGKILLps-based sweep keyed on the subnet-evm VM ID and the workDir (so a sweep in one workspace doesn't kill another)up()records PIDs incrementally and installsSIGINT/SIGTERMhandlers — Ctrl-C during boot now tears down cleanlyUniversal-deployer correctness on re-up
deployIcmStackpreviously assumed the Teleporter deployer (Anvil#1) had nonce=0 on every chain at deploy time. Afterdown → up(snapshot path), C-Chain state was restored with the deployer at nonce 2 from the previous deploy, but L1 was recreated fresh at nonce 0. Result: Teleporter at different addresses on the two chains, breaking the universal-deployer invariant the relayer requires. Now:(deployer, nonce=0)address — reuse if Teleporter is already therepnpm run clean" hint instead of silently producing a broken networkp.dataso the fresh cold boot starts truly cleanCLI no longer lies
tmpnetjs statuswas advertised in--helpbut returned "not yet implemented" — now implements it, tolerant of both old and new pid schemastmpnetjs up --freshwas parsed then explicitly discarded (void fresh;) — now wired toup({ fresh: true })tmpnetjs down --keep-snapshotwas misleading (default behavior, no-op); the actual destructive flag is now--delete-snapshotup()— the previously-silent first 2 minutes of cold boot is now visibleHarness fidelity gaps closed
Three places where a test could pass dishonestly while a real chain would behave differently:
MockWarpPrecompile.getVerifiedWarpMessage— fixed by tracking per-messenger origin;test_rogueSourceMessenger_universalDeployerCheck_revertsconfirmsFeeMessage.sol+ test exercising realsafeTransferFrom+redeemRelayerRewards_harness_*admin entrypoints were unguarded — any contract under test could forge messages mid-test; now gated by a one-shot_harness_init(harness_)+onlyHarnessmodifierforge test: 14 → 18 passed.Examples bugs
transfer-token.tsdeployed adecimals()=6token but parsed amounts at 18 decimals — a user copying the scaffold with real 6-decimal USDC would have shipped a 10^12× over-send. Token is now consistently 18 decimals and renamed toDEMO/Demo Token.examples/README.mddocumented env vars that didn't work and aExampleERC20contract name that doesn't exist — now matches the code (flag > env > default precedence).add-validator.tstold users to clean up withpkill -9 avalanchego, which nukes the whole tmpnet. Now prints the actual spawned PID + log path.Supply chain (installer)
INTERCHAIN_KIT_SKIP_CHECKSUM=1is the explicit escape hatch, with a loudconsole.warnwhen useddownloadinjector that lets unit tests exercise 404, missing-entry, and bypass paths without networkVerification
pnpm install --frozen-lockfile: cleanpnpm -r --filter './packages/**' build: cleanpnpm --filter @interchain-kit/examples typecheck: cleanpnpm --filter @interchain-kit/icm-services-installer test: 21/21 passforge test --root contracts: 18/18 passforge fmt --check --root contracts: cleanpnpm run down: zero orphaned plugin processes (previously: 5+ orphans every cycle)Test plan
pnpm run cleanfirst should be fine post-this-PR)pnpm run downleaves no avalanchego/srEXi/icm-relayer/signature-aggregator processes alive