Skip to content

fix(browse): per-process state-file temp path to fix concurrent-write ENOENT#1310

Open
yashkot007 wants to merge 1 commit intogarrytan:mainfrom
yashkot007:fix/concurrent-state-write-tempfile-collision
Open

fix(browse): per-process state-file temp path to fix concurrent-write ENOENT#1310
yashkot007 wants to merge 1 commit intogarrytan:mainfrom
yashkot007:fix/concurrent-state-write-tempfile-collision

Conversation

@yashkot007
Copy link
Copy Markdown

@yashkot007 yashkot007 commented May 4, 2026

Summary

The daemon writes .gstack/browse.json via the standard atomic-rename pattern:

writeFileSync(tmp, …) → renameSync(tmp, stateFile)

Four sites in server.ts use this pattern, and all four hard-code the same temp filename ${stateFile}.tmp:

Line Context
2002 initial daemon-startup state write
1479 /tunnel/start handler state update
2083 BROWSE_TUNNEL=1 inline tunnel update
2113 BROWSE_TUNNEL_LOCAL_ONLY=1 update

Under concurrent writers the shared filename races on the rename:

t0  Writer A: writeFileSync(stateFile + '.tmp', payloadA)
t1  Writer B: writeFileSync(stateFile + '.tmp', payloadB)   // overwrites A
t2  Writer A: renameSync(stateFile + '.tmp', stateFile)    // moves B's payload
t3  Writer B: renameSync(stateFile + '.tmp', stateFile)    // ENOENT — file gone

The atomic rename was meant to protect the write, but it doesn't protect against two writers using the same temp filename. The second writer's renameSync throws ENOENT mid-air, killing one of the spawning daemons during startup.

How it surfaced

15 concurrent CLIs against a fresh .gstack/:

[browse] Failed to start: ENOENT: no such file or directory,
rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'
Pre-fix Post-fix
Successful Status: healthy 0 / 15 15 / 15
ENOENT errors 1 0
Stale-state daemon (state file references dead PID) yes no

Repro: for i in 1..15; do ./browse/dist/browse status & done; wait

Fix

New helper tmpStatePath() returns a per-process unique temp path:

function tmpStatePath(): string {
  return `${config.stateFile}.tmp.${process.pid}.${crypto.randomBytes(4).toString('hex')}`;
}

All 4 call sites swap config.stateFile + '.tmp' for tmpStatePath(). The atomic rename still gives last-writer-wins semantics on the final state file content; the only behavior change is that concurrent writers no longer kill each other on the rename step.

crypto and fs are already imported. No new dependencies.

Regression test

browse/test/server-tmp-state-path.test.ts — source-level guard following terminal-agent.test.ts and dual-listener.test.ts (read source as text, assert invariant, no daemon required).

Locks three invariants:

  1. No remaining stateFile + '.tmp' literals in server.ts (regression catch — a future copy-paste or revert reintroducing the bug fails this)
  2. Every state-file writeFileSync call uses tmpStatePath() (positive coverage)
  3. tmpStatePath() body references both process.pid AND crypto.randomBytes (locks the uniqueness shape so a future "simplification" doesn't strip uniqueness back out)

Test plan

  • Targeted source-level guard test: 3 pass / 0 fail
  • bun run build — clean rebuild
  • Live regression: 15 concurrent CLIs against cold state → 15 / 15 healthy, 0 ENOENT (vs 0 / 15 on main)
  • No orphan .tmp.* files left behind after the rename completes
  • Related cluster (server-auth, dual-listener, cdp-mutex, findport): same pre-existing flakes as main — no new regressions introduced by this change

Related

Independent of #1309 (also from this stress-test run — flushBuffers undeclared variable). Both touch server.ts but at different sites; either can land first, the other rebases trivially.

🤖 Generated with Claude Code


View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

… ENOENT

The daemon writes `.gstack/browse.json` via the standard atomic-rename
pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four
sites in server.ts use this pattern (initial daemon-startup state at
:2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel
update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all
four hard-code the same temp filename `${stateFile}.tmp`.

Under concurrent writers the shared filename races on the rename:

    t0  Writer A: writeFileSync(stateFile + '.tmp', payloadA)
    t1  Writer B: writeFileSync(stateFile + '.tmp', payloadB)   // overwrites A
    t2  Writer A: renameSync(stateFile + '.tmp', stateFile)    // moves B's payload
    t3  Writer B: renameSync(stateFile + '.tmp', stateFile)    // ENOENT — file gone

Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`:

    [browse] Failed to start: ENOENT: no such file or directory,
    rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'

Pre-fix success rate: **0 / 15** under cold-start race.
Post-fix success rate: **15 / 15**, zero ENOENT.

Fix:
  - New `tmpStatePath()` helper (server.ts:333) returns
    `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}`
  - All 4 call sites use `tmpStatePath()` instead of the shared literal
  - Atomic rename still gives last-writer-wins semantics on the final
    state.json content; only behavior change is that concurrent writers
    no longer kill each other on the rename step

Source-level guard test (browse/test/server-tmp-state-path.test.ts)
locks two invariants: (1) no remaining `stateFile + '.tmp'` literals,
(2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same
read-source-as-text pattern as terminal-agent.test.ts and
dual-listener.test.ts — no daemon required, runs in tier-1 free.

Test plan:
  - [x] Targeted source-level guard test passes (3 / 0)
  - [x] `bun run build` clean
  - [x] Live regression: 15 concurrent CLIs against cold state →
        15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix)
  - [x] No `.tmp.*` orphans left behind after rename succeeds
  - [x] Related test cluster (server-auth, dual-listener, cdp-mutex,
        findport) — same pre-existing flakes as `main`, no new
        regressions introduced

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant