Skip to content

fix(core): preserve uid/gid in atomicWriteFile to avoid breaking shared-write files#4431

Open
doudouOUC wants to merge 9 commits into
QwenLM:mainfrom
doudouOUC:worktree-fix-atomic-write-ownership
Open

fix(core): preserve uid/gid in atomicWriteFile to avoid breaking shared-write files#4431
doudouOUC wants to merge 9 commits into
QwenLM:mainfrom
doudouOUC:worktree-fix-atomic-write-ownership

Conversation

@doudouOUC
Copy link
Copy Markdown
Collaborator

@doudouOUC doudouOUC commented May 22, 2026

Summary

atomicWriteFile uses write-to-tmp + rename for crash atomicity. POSIX rename creates a new inode owned by the calling process's euid/egid, so every Write/Edit/NotebookEdit through qwen-code silently strips the original uid/gid of any file it touches.

This breaks two real workflows reported against v0.16.0:

  1. Shared-write files. A group-writable file owned by another user in a shared workspace becomes owned by the qwen-code user; collaborators lose write access.
  2. Docker + host IDE. User runs qwen-code as root inside a Docker container against bind-mounted source. Host IDE runs as the host user. Every file qwen touches becomes root:root on the host, so the IDE outside the container can no longer write to it (reported with screenshot: "ide里面cmakefile和涉及到改动的源码,就无写权限了").

Both Write and Edit go through StandardFileSystemService.writeTextFile()atomicWriteFile, so switching tools doesn't dodge the bug. Introduced by #4096 in v0.16.0.

Fix

One uniform code path: when the existing file's uid/gid differs from the calling process's euid/egid, fall back to in-place writeFile (truncate the existing inode) instead of rename. This preserves uid/gid at the cost of crash atomicity for that specific case. Same path for root and non-root.

The root scenario originally had a "rename + chown back" optimization, but chown silently fails inside user-namespaced or CAP_CHOWN-stripped containers (i.e. exactly the Docker scenario above), which would re-introduce the bug. Routing root through the in-place fallback eliminates this failure mode and removes an untestable branch.

Windows is unaffected (no POSIX ownership).

Trade-offs (documented in the JSDoc)

The in-place fallback trades three properties for ownership preservation:

  • Crash atomicity — a crash mid-write can leave a partially-written file.
  • Concurrent reader isolation — readers can observe a zero-length or partial file during the write.
  • Watcher semantics — emits MODIFY rather than MOVED_TO/CREATE. Most watchers (chokidar, VSCode) handle both.
  • EACCES on unwritable files — if the file's mode forbids the current user from writing, the fallback now surfaces EACCES. Atomic rename used to silently replace files the user had no write permission on, which was arguably a privilege issue.

Test plan

  • In-place fallback on uid mismatch — verifies content updates, mode preserved (0o664), and inode unchanged (the signal that the fallback path ran instead of rename).
  • Same scenario triggered via gid mismatch.
  • Positive case — ownership matches → atomic rename → inode changes (existing semantics preserved).
  • EACCES path — read-only file + mocked mismatched uid → fallback hits EACCES cleanly, original content preserved.
  • All existing atomicFileWrite / write-file / edit / fileSystemService tests pass (190 tests).
  • Manual repro on a shared-write file (group-writable, owned by another user) — verify write succeeds and ownership/group preserved.
  • Manual repro inside Docker (qwen-code as root, host file owned by uid 1000) — verify host-side ls -l shows ownership preserved as uid:1000 gid:1000 after a Write.

🤖 Generated with Qwen Code

…ed-write files

atomicWriteFile uses write-to-tmp + rename for crash atomicity. POSIX
rename creates a new inode owned by the calling process's euid/egid, so
the rename silently strips the original uid/gid. On shared-write setups
(e.g. a group-writable file owned by another user in a shared workspace
where the current user has group-write access), every Write/Edit/
NotebookEdit through qwen-code would reset ownership to the running
user and effectively revoke write access for the original collaborators.

The fix:

1. If the target exists and is owned by a different uid/gid than the
   process's effective uid/gid (and we are not root), fall back to
   in-place writeFile. This truncates the existing inode in place,
   preserving uid/gid. The trade-off is loss of crash atomicity for
   this specific case — an acceptable trade for not silently breaking
   shared-write file ownership.

2. If running as root, atomic rename is still used, and ownership is
   restored via chown(uid, gid) after the rename. Root can chown back;
   non-root cannot, hence the in-place fallback for non-root.

3. Windows is unaffected (no POSIX ownership semantics).

Tests:

- New: in-place fallback on uid mismatch — verify content updates, mode
  preserved, and inode unchanged (the inode is the signal that the
  fallback path ran rather than rename).
- New: same scenario triggered via gid mismatch.
- New: positive case — ownership matches → atomic rename → inode changes.

Regression: a v0.16.0 user reported "every write turns a world-writable
file into one other users can no longer write." Bisected to QwenLM#4096 which
introduced atomicWriteFile + write-to-tmp + rename.
Copilot AI review requested due to automatic review settings May 22, 2026 07:15
@github-actions
Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR fixes a critical regression introduced in v0.16.0 (#4096) where atomicWriteFile silently strips file ownership (uid/gid) when writing to shared-workspace files owned by other users. The fix implements a smart fallback: non-root processes use in-place write for ownership-mismatched files (preserving uid/gid at the cost of crash atomicity), while root uses atomic rename + chown to restore ownership. The implementation is well-tested with comprehensive unit tests covering all scenarios.

🔍 General Feedback

  • Strong problem description: The PR body clearly explains the root cause (POSIX rename creates new inode owned by process euid/egid), the real-world impact (shared-workspace collaboration broken), and the trade-offs involved.
  • Defensive programming: Excellent use of feature detection (process.geteuid, process.getegid, platform checks) rather than assumptions about the runtime environment.
  • Test coverage: Three new test cases cover uid mismatch, gid mismatch, and the positive case (ownership matches → atomic rename). Tests correctly verify inode preservation as the signal for fallback path execution.
  • Documentation: Updated function JSDoc with detailed step 6 explaining the ownership preservation logic and trade-offs.
  • Root handling: Smart distinction between root (rename + chown back) and non-root (in-place fallback) scenarios.

🎯 Specific Feedback

🟢 Medium

  • packages/core/src/utils/atomicFileWrite.ts:147-150 — The ownershipWouldChange function checks euid === 0 to skip the fallback for root, but the root-specific chown logic at lines 215-225 runs after the rename. Consider adding a comment at line 148 explaining why root skips the fallback (because the post-rename chown will restore ownership, which is better than losing crash atomicity).

  • packages/core/src/utils/atomicFileWrite.ts:185 — The fallback path calls fs.writeFile(targetPath, data, writeOptions) directly. Consider extracting this into a named helper function (e.g., writeInPlace) to mirror the structure of the main atomic path and make the code more maintainable.

  • packages/core/src/utils/atomicFileWrite.ts:215-225 — The root chown restoration is wrapped in a try-catch that silently ignores failures with "Best-effort. Not all filesystems support chown." Consider adding a debug-level log here (if logging infrastructure exists) so operators can detect when ownership restoration fails on root invocations.

🔵 Low

  • packages/core/src/utils/atomicFileWrite.ts:143-163 — The comment block before ownershipWouldChange is excellent but quite long. Consider moving the detailed explanation into the JSDoc at the top of the function (step 6) and keeping just a one-liner here pointing to that documentation.

  • packages/core/src/utils/atomicFileWrite.test.ts:289 — The test comment says "Simulate a file owned by a different user by replacing process.geteuid" — consider adding a brief note explaining why this mocking approach is valid (the code checks ownership at runtime via process.geteuid(), so monkey-patching it correctly simulates the ownership mismatch scenario).

  • packages/core/src/utils/atomicFileWrite.test.ts:291 — Consider using vi.spyOn(process, 'geteuid') instead of direct monkey-patching if the project uses vitest's spying utilities consistently elsewhere. This would be more idiomatic and automatically restored after tests.

✅ Highlights

  • Excellent bug fix discipline: The fix correctly identifies that crash atomicity loss is an acceptable trade-off for not breaking shared-write file ownership — this is clearly documented and justified.
  • Comprehensive test strategy: Testing via inode comparison is clever and robust — it proves which code path executed without needing to mock fs operations.
  • Platform awareness: Proper handling of Windows (no POSIX ownership) and feature detection for Unix-specific functions.
  • No breaking changes: Existing semantics are preserved for the common case (ownership matches), and the fallback only activates when needed.
  • Clear regression tracking: PR correctly attributes the issue to feat(core,cli): add generic atomicWriteFile, wire into Write/Edit tools, upgrade @types/node #4096 in v0.16.0, making it easy to understand the historical context.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a regression in atomicWriteFile where write-to-tmp + rename can unintentionally change a file’s uid/gid (breaking shared/group-writable workflows) by introducing an ownership-preserving fallback path and validating it with new tests.

Changes:

  • Detect existing-file uid/gid mismatch (non-root) and fall back to in-place writeFile to preserve ownership/inode.
  • For root runs, keep atomic rename and restore original ownership via chown after rename.
  • Add unit tests asserting inode-change vs inode-preservation behavior for the new ownership logic.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
packages/core/src/utils/atomicFileWrite.ts Adds ownership-aware logic (non-root in-place fallback; root chown-back) while preserving existing permission behavior.
packages/core/src/utils/atomicFileWrite.test.ts Adds tests covering inode changes on atomic rename and inode preservation on ownership-mismatch fallback (uid and gid cases).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +222 to +226
try {
await fs.chown(targetPath, existingStat.uid, existingStat.gid);
} catch {
// Best-effort. Not all filesystems support chown.
}
Review follow-ups on the atomic-write ownership fix:

1. Remove the root-special-case (rename + post-rename chown). chown
   silently fails inside user-namespaced or CAP_CHOWN-stripped Docker
   containers, which re-triggers the original bug for root-in-Docker
   users — exactly the scenario this fix was reported against. Routing
   root through the same in-place fallback as non-root eliminates this
   failure mode and drops an untestable branch (chown-back can't be
   exercised under non-root CI).

2. Document the three properties traded away by the in-place fallback:
   crash atomicity, concurrent-reader isolation, inotify watcher
   semantics (MODIFY vs MOVED_TO).

3. Document that the in-place fallback surfaces EACCES when the file's
   mode forbids the current user from writing — this is correct
   behavior (atomic rename used to silently replace files the user had
   no permission on, which was arguably a privilege issue).

4. Replace the brittle "see step 6 in the function doc" comment with a
   step-number-independent reference.

5. New test covering the EACCES path: chmod 0o444 + mocked geteuid
   triggers the fallback, fallback hits the read-only file, EACCES
   propagates cleanly, original content is preserved.
}

const statAfter = await fs.stat(filePath);
expect(statAfter.ino).toBe(inoBefore);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The uid-mismatch test asserts three invariants (inode preserved, mode preserved, no leftover temp file) but this gid-mismatch test only checks two (inode + content). Add the same mode and temp-file assertions for consistency — if the gid-fallback path ever fails to preserve permissions, this test would not catch it:

Suggested change
expect(statAfter.ino).toBe(inoBefore);
expect(statAfter.ino).toBe(inoBefore);
expect(await fs.readFile(filePath, 'utf-8')).toBe('updated');
// Permissions preserved.
expect(statAfter.mode & 0o777).toBe(realStat.mode & 0o777);
// No leftover temp file.
expect(await fs.readdir(tmpDir)).toEqual(['shared-group.txt']);

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the ownership-preservation fix. The core design — fall back to in-place write when uid/gid mismatches, with rename in the matching case — is sound and the JSDoc trade-off enumeration is well written. Two suggestions inline. Additional terminal-only notes (not blocking): (a) atomicWriteJSON's own JSDoc at line ~235 still claims unconditional atomic write-to-temp + rename — slightly stale now; (b) runtimeStatus.ts:34 and worktreeSessionService.ts:106-109 both advertise crash-atomic writes via tmp+rename, which is no longer strictly true under ownership mismatch; consider updating those caller comments in a follow-up. Tests look good (66/66 passing locally including sibling fileSystemService suite).

— claude-opus-4-7 via Claude Code /qreview

};

if (ownershipWouldChange()) {
await fs.writeFile(targetPath, data, writeOptions);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Three edge cases the path-based fs.stat()fs.writeFile(targetPath, ...) pair introduces that the prior rename(tmp, target) path didn't have:

  1. Non-regular files: if targetPath resolves to a FIFO, fs.writeFile calls open(path, O_WRONLY|O_CREAT|O_TRUNC) which blocks forever waiting for a reader. The rename path replaced the FIFO with a regular file. Sockets / character devices behave similarly oddly. A user file is rarely a FIFO, but existingStat.isFile() would cheaply rule this class of footgun out.
  2. Symlink-swap TOCTOU: between fs.stat(targetPath) (line ~152) and this line, an attacker who can write the parent directory can swap targetPath for a symlink. fs.writeFile follows symlinks at the destination; POSIX rename does not. In the very "shared-write workspace / Docker bind-mount" scenarios this PR targets, this lets a directory-writable attacker redirect agent writes elsewhere.
  3. Unlink race: if targetPath is unlinked between stat and write, O_CREAT recreates it owned by the calling user — the exact ownership change the fallback was designed to prevent. Silent regression to the pre-fix bug under this race.

The defensive shape that closes all three at once: open once with fs.open(targetPath, fs.constants.O_WRONLY | fs.constants.O_TRUNC | fs.constants.O_NOFOLLOW) (no O_CREAT), fstat the fd to confirm uid/gid still match existingStat, then write/chmod through the fd. Inode is bound; symlinks rejected; missing files surface ENOENT instead of silent recreate.

— claude-opus-4-7 via Claude Code /qreview

* - **Watcher semantics** — emits an inotify `MODIFY` event rather
* than `MOVED_TO` / `CREATE`. Most watchers (chokidar, VSCode)
* handle both, but consumers that only watch for one will miss
* these writes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The trade-off list omits hardlink propagation, which is a directly observable semantic shift between the two paths: rename produces a fresh inode so sibling hardlink names retain the previous content; in-place truncate+write keeps the inode so every hardlink to it sees the new content. Worth a fourth bullet so backup/snapshot/dedup workflows that watch hardlink siblings aren't surprised.

Suggested change
* these writes.
* these writes.
* - **Hardlink propagation** writes are visible through every
* hardlink to the same inode, rather than detaching this name's
* content from siblings as rename would.

— claude-opus-4-7 via Claude Code /qreview

…s + doc/test follow-ups

Review follow-ups on QwenLM#4431 ownership-preservation fix:

CRITICAL — in-place fallback security hardening (wenshao review):

The path-based `fs.writeFile(targetPath, ...)` fallback introduced
three races that the prior `rename(tmp, target)` form did not have:

1. Non-regular files (FIFO/socket/device): fs.writeFile calls
   open(O_WRONLY|O_CREAT|O_TRUNC). On a FIFO this blocks forever
   waiting for a reader. On a character/block device it writes to
   the actual device. The rename path replaced these with a
   regular file.

2. Symlink-swap TOCTOU: an attacker with parent-dir write can swap
   targetPath for a symlink between our stat and our writeFile.
   fs.writeFile follows symlinks at the destination; POSIX rename
   does not. In the very "shared-write workspace / Docker bind-mount"
   scenarios this PR targets, this lets a directory-writable
   attacker redirect agent writes elsewhere (e.g. /etc/passwd if
   the agent runs as root).

3. Unlink race: if targetPath is unlinked between stat and write,
   O_CREAT silently recreates it owned by the calling user — the
   exact ownership change the fallback was designed to prevent.
   Silent regression to the pre-fix bug under this race.

Fix: extract the fallback into writeInPlaceWithFdGuards():

  - open(target, O_WRONLY | O_TRUNC | O_NOFOLLOW) — no O_CREAT, so
    unlink-race surfaces ENOENT instead of silently recreating; and
    O_NOFOLLOW rejects symlink-swaps with ELOOP.
  - fstat(fd) verifies the bound inode's uid/gid still match
    existingStat — refuses the write if an inode-swap happened
    between stat and open.
  - Write through the fd (locked to the verified inode), chmod
    through the fd, close.

Caller now gates the fallback on existingStat.isFile() — non-regular
targets fall through to the atomic path which has well-defined
"replace special-file with regular-file" semantics.

DOC / TEST follow-ups:

- Add hardlink-propagation as a 4th trade-off in the in-place
  fallback JSDoc (review comment QwenLM#4): rename creates a new inode so
  sibling hardlinks keep old content; in-place truncate+write keeps
  the inode so all hardlinks see new content.

- Update atomicWriteJSON JSDoc to note the write is now
  *conditionally* atomic (review comment QwenLM#5): atomic when uid/gid
  matches the process, in-place when ownership differs. Previously
  the JSDoc still claimed unconditional atomicity.

- Update caller comments at runtimeStatus.ts and
  worktreeSessionService.ts that advertised crash-atomic writes via
  tmp+rename — those guarantees are now conditional (review
  comment QwenLM#6).

- Add mode + tmp-leftover assertions to the gid-mismatch test to
  match the uid-mismatch test (review comment QwenLM#2 — test
  consistency). Without these, a gid-fallback regression that
  silently dropped permissions or left a tmp file would not be
  caught.

- New test: FIFO + ownership mismatch must take the atomic path,
  not in-place (verifies the existingStat.isFile() guard works;
  hang on in-place would trip vitest timeout).

- New test: writing through a symlink with ownership mismatch
  exercises the resolve-then-stat-then-open flow and verifies the
  symlink itself is preserved.

Tests: 192/192 pass (atomicFileWrite + write-file + edit +
fileSystemService).
Comment thread packages/core/src/utils/atomicFileWrite.ts Outdated
@doudouOUC doudouOUC requested review from LaZzyMan and tanzhenxin May 22, 2026 16:50
PR QwenLM#4431 review follow-up (wenshao critical):

The previous form opened with `O_WRONLY | O_TRUNC | O_NOFOLLOW`, which
truncated the bound file *before* the fd-bound fstat verification ran.
If an attacker swapped the path between the caller's stat and our
open, we would truncate the attacker's substituted inode (destroying
unrelated content) before detecting the swap.

Two fixes:

1. Open without O_TRUNC. Verify dev+ino+uid+gid+isFile match
   expectedStat through fh.stat(). Only then call fh.truncate(0)
   through the validated fd.

2. Expand the verification beyond uid+gid to include dev+ino+isFile.
   uid+gid alone misses a same-owner inode swap (attacker replaces
   the path with a different inode they own). dev+ino is the strong
   identity check; isFile catches a swap to FIFO/socket/device after
   the caller's existingStat.isFile() gate.

JSDoc updated to enumerate the four guards (NOFOLLOW, no CREAT, no
TRUNC at open, dev+ino+uid+gid+isFile via fstat) and explain why
truncation must wait until after verification.

192/192 tests pass.
Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical (test gap): The EOWNERSHIP_CHANGED error path in writeInPlaceWithFdGuards (fstat guard detecting inode swap between stat and open) is completely untested. This is the primary TOCTOU defense for the in-place write path — no test triggers the dev/ino/uid/gid/isFile mismatch detection. writeInPlaceWithFdGuards is module-private, but can be tested by mocking fs.open to return a handle whose .stat() reports different attributes.

Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
…ANGED path

PR QwenLM#4431 review follow-up (deepseek-v4-pro via /review):

CRITICAL — FIFO swap TOCTOU:

The caller's `existingStat.isFile()` gate uses stat data captured
earlier. An attacker with parent-dir write can swap the regular file
for a FIFO between the caller's stat and our open inside
`writeInPlaceWithFdGuards`. The previous `O_WRONLY | O_NOFOLLOW` open
would then block indefinitely waiting for a FIFO reader; O_NOFOLLOW
only catches symlinks.

Fix: add O_NONBLOCK to the open flags. Defense in depth:

- On a reader-less FIFO, `open(O_WRONLY | O_NONBLOCK)` returns ENXIO
  immediately — no hang.
- If the FIFO has a reader (open succeeds), the subsequent fstat
  isFile() check still refuses the write via EOWNERSHIP_CHANGED.
- For regular files, O_NONBLOCK is a no-op.

CRITICAL test gap — EOWNERSHIP_CHANGED branch untested:

The primary TOCTOU defense (fdStat dev/ino/uid/gid/isFile vs
expectedStat) had no coverage. Exported `writeInPlaceWithFdGuards` so
it can be unit-tested directly:

- New test: simulate post-stat inode swap (unlink + recreate at same
  path), call helper with stale stat, assert EOWNERSHIP_CHANGED and
  that the attacker's content survives.
- New test: simulate post-stat regular→FIFO swap, assert open fails
  fast (ENXIO) or fstat catches it — either way no hang, no write.

DOC fix:

JSDoc said "we open read-write without truncating" but the code uses
O_WRONLY. Wording corrected to "write-only".

194/194 tests pass.
@doudouOUC
Copy link
Copy Markdown
Collaborator Author

Re: review #4348277051 (critical test gap for EOWNERSHIP_CHANGED path): addressed in 0a095cf.

Exported writeInPlaceWithFdGuards so it can be unit-tested directly, and added two regression tests:

  1. Post-stat inode swap (unlink + recreate at same path): call helper with stale stat → asserts EOWNERSHIP_CHANGED is thrown and the attacker's content survives untouched.
  2. Post-stat regular→FIFO swap: asserts either ENXIO (open fails fast thanks to O_NONBLOCK) or EOWNERSHIP_CHANGED (fstat catches FIFO) — both prove the FIFO race window is closed without hang.

Copy link
Copy Markdown
Collaborator

@wenshao wenshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Downgraded from Approve to Comment: CI failing (Test jobs on macos/windows/ubuntu).

No Critical findings. The ownership-preservation design is sound — the TOCTOU hardening chain (O_NOFOLLOW / no O_CREAT / O_NONBLOCK / post-open fstat dev+ino+uid+gid verification) is thorough. 9 Suggestion-level findings and 4 Nice-to-have, primarily test coverage gaps and documentation accuracy.

Comment thread packages/core/src/utils/atomicFileWrite.ts Outdated
Comment thread packages/core/src/utils/atomicFileWrite.ts Outdated
Comment thread packages/core/src/utils/atomicFileWrite.ts Outdated
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.test.ts Outdated
Comment thread packages/core/src/utils/atomicFileWrite.test.ts
Comment thread packages/core/src/utils/atomicFileWrite.test.ts
PR QwenLM#4431 review follow-up (glm-5.1 via /review) — 7 suggestions adopted,
1 partially adopted, 0 rejected:

CI FIX (Ubuntu test failure on tmpfs inode reuse):

The EOWNERSHIP_CHANGED inode-swap test used unlink+create to simulate
a post-stat swap. On Linux tmpfs the freshly-freed inode number is
often reused by the immediately-following create, so dev+ino remained
identical and the guard didn't trip (intermittent on Ubuntu CI; macOS
APFS happened to allocate different inodes). Switched to rename(decoy,
target) which moves an existing distinct inode into place, guaranteed
to differ from the original.

CODE:

- Wrap fh.writeFile failure after fh.truncate(0) with
  EINPLACE_WRITE_FAILED + cause, so callers see explicitly that the
  file was truncated and the write didn't complete (otherwise they
  see raw ENOSPC/EIO and may wrongly assume the original is intact
  given this lives in atomicFileWrite.ts).
- Skip fh.chmod when euid is neither root nor expectedStat.uid —
  chmod is guaranteed to fail with EPERM in that case (POSIX requires
  owner or root). Avoids a guaranteed-failing syscall on every call.
- Caller catches ENOENT from writeInPlaceWithFdGuards and falls
  through to atomic rename path. If the file was deleted between
  caller's stat and our open there is no ownership to preserve; the
  rename path correctly creates a new file at targetPath.

DOC:

- Replaced "defends against four races" with "hardened against
  post-stat races" (the bullet list has 5 items, the count was wrong).
- Reworded "non-regular targets must not reach this function" to
  describe defense-in-depth — O_NONBLOCK + !fdStat.isFile() reject
  post-stat regular→FIFO/socket/device swaps. The old wording made
  it look like O_NONBLOCK was redundant.
- Documented the dual chmod behavior (root vs non-root with foreign
  uid) inline.

TESTS:

- Added happy-path test for writeInPlaceWithFdGuards (write succeeds,
  inode preserved, mode preserved).
- Added ENOENT regression test (verifies the missing-O_CREAT
  property — if file unlinked between stat and open, no silent
  recreate with caller's uid).
- Renamed the misleading "O_NOFOLLOW guard" test (it actually tests
  resolve-through-symlink, not O_NOFOLLOW) to reflect what it does,
  and added a direct ELOOP test that drives writeInPlaceWithFdGuards
  with a path whose final component is a symlink — that's the real
  O_NOFOLLOW exercise.
- Fixed the FIFO test to pass a stat captured from the FIFO itself
  (not a stale regular-file stat) so only the FIFO-specific defense
  fires, not the inode/dev mismatch from a different file.

NOT ADOPTED:

- Skip-when-non-root chmod optimization adopted (small, useful), but
  the larger "structured chmod error model" deferred — best-effort
  matches the existing tryChmod pattern at file scope.

197/197 tests pass.
@doudouOUC
Copy link
Copy Markdown
Collaborator Author

Re: review #4348845741 (CI failing + 9 suggestions): addressed in 8c1a651.

Root cause of the Ubuntu CI failure — the EOWNERSHIP_CHANGED inode-swap test used unlink(target) + writeFile(target) to simulate a post-stat swap. On Linux tmpfs the freshly-freed inode number is often reused by the immediately-following create, so dev+ino remained identical and the guard didn't trip (intermittent on Ubuntu; macOS APFS happened to allocate distinct inodes). Switched to rename(decoy, target) which moves an existing distinct inode into place, guaranteed to differ from the original.

Suggestions — adopted all 8 inline (see inline replies). Replies + code at 8c1a651.

197/197 tests pass locally.

Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts Outdated
Comment thread packages/core/src/utils/atomicFileWrite.ts Outdated
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts Outdated
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.ts
… chmod sync

PR QwenLM#4431 review follow-up (qwen-latest-series-invite-beta-v34 via /review)
— 7 of 10 suggestions adopted, 3 deferred:

CODE:

- **EINPLACE_TRUNCATE_FAILED wrap** (review #3291863048): symmetric to
  the existing EINPLACE_WRITE_FAILED — distinguishes "truncate failed,
  original intact" from "write failed post-truncate, original lost".

- **Post-write nlink === 0 check** (review #3291863059):
  EINODE_UNLINKED_DURING_WRITE detects the fstat-to-close window where
  a concurrent rename-over drops our bound inode's link count to zero
  and our write goes to an anonymous inode close will free. Silent
  data loss path now surfaces.

- **fh.close() guarded in finally** (review #3291863044): close failure
  on NFS/FUSE was masking the original try-body exception (including
  the meaningful EOWNERSHIP_CHANGED, EINPLACE_*, EINODE_*). flush:true
  already fsync'd, so close-after-flush is best-effort.

- **fdStat.uid in canChmod** (review #3291863055 part 1): use the
  fd-bound verified value instead of expectedStat.uid. Defense in depth
  — a future weakening of the fstat guard won't silently widen chmod
  privilege.

- **fh.sync() after chmod** (review #3291863053): chmod is metadata,
  not covered by writeFile({ flush: true }). A crash before lazy
  metadata flush would lose the mode restoration (matters for
  setuid/setgid). One extra syscall, best-effort.

- **@remarks freshness contract** (review #3291863051 partial): JSDoc
  now spells out that expectedStat MUST be a fresh stat captured
  immediately before the call. Stale stats nullify every guard.

- **Concurrent-writer limitation noted** (review #3291863061 partial):
  added a "Known limitation — no advisory locking" paragraph to JSDoc
  rather than adopting flock (Linux-specific, NFS issues, scope
  expansion). Callers needing multi-process coordination should layer
  their own lockfile.

- **@throws documentation** (review #3291863051 partial): four
  documented error codes (EOWNERSHIP_CHANGED, EINODE_UNLINKED_DURING_WRITE,
  EINPLACE_TRUNCATE_FAILED, EINPLACE_WRITE_FAILED).

TESTS:

- **EINPLACE_WRITE_FAILED via FileHandle.prototype.writeFile monkey-patch**
  (review #3291863040): triggers the data-loss path, asserts the wrapped
  code + message + cause, and verifies the file is empty (truncate ran).

- **canChmod=false actually skips chmod** (review #3291863055 part 2):
  prior uid-mismatch test had desiredMode === current mode, couldn't
  distinguish "skipped" from "no-op". New test uses desiredMode=0o755
  on a 0o644 file under canChmod=false → asserts mode stays 0o644.

NOT ADOPTED:

- ENOENT/ELOOP/ENXIO catch extension (review #3291863043): keeping the
  strict refusal for swap-to-special-file. Silent fallthrough-to-replace
  was pre-PR atomic-rename behavior, but in shared-write workspaces
  (this PR's target users) a special-file appearing at the target path
  is a signal worth surfacing, not papering over.

- Diagnostic logging (review #3291863049): the function has no logger
  dependency today; adding one is an architecture decision outside
  this PR's scope. The path taken is implied by the side effects
  (inode preserved vs new) but agreed: out-of-band telemetry would
  help ops. Defer to follow-up.

- flock advisory locking (review #3291863061 main): scope expansion;
  Linux-specific semantics, NFS edge cases. Documented as known
  limitation instead.

- Integration test for ENOENT fallthrough at atomicWriteFile level
  (review #3291863043 part 1): ESM module bindings prevent monkey-
  patching writeInPlaceWithFdGuards from outside. The unit test for
  the helper's ENOENT path covers the throwing behavior; the catch is
  3 lines and review-visible. Defer until a refactor opens an
  injection seam.

- Error code string constants export (review #3291863051 part 3): two
  codes don't merit a constant module. Magic strings are fine at this
  size.

199/199 tests pass.
@doudouOUC
Copy link
Copy Markdown
Collaborator Author

Re: review #4349538918 (10 inline, 2 Critical + 8 Suggestion): addressed in 4011ccc.

Adopted (7): EINPLACE_WRITE_FAILED test, EINPLACE_TRUNCATE_FAILED wrap, fh.close() guarded in finally, fdStat.uid in canChmod + skip-actually-effective test, fh.sync() after chmod, post-write nlink === 0 detection (EINODE_UNLINKED_DURING_WRITE), @remarks freshness contract + @throws docs.

Deferred / not adopted (3): ENOENT integration test (ESM mock infeasible — unit test covers helper, catch is 3-line review-visible), diagnostic logging (architecture decision, separate PR), flock (Linux/NFS edge cases — documented as known limitation with caller guidance instead), ELOOP/ENXIO catch extension (strict refusal is more defensible in shared-write scenarios than silent special-file replacement).

199/199 tests pass. See inline replies for each comment.

Comment thread packages/core/src/utils/runtimeStatus.ts
…tract

PR QwenLM#4431 review follow-up: function-level JSDoc still claimed
unconditional "Atomically write" and "never sees a partially written
file", inconsistent with the module-level docblock updated in earlier
commits. Updated to describe the conditional-atomic behavior (atomic
when uid/gid matches, in-place fallback when ownership differs) and
explicitly note the concurrent-reader visibility trade-off in the
fallback path. Links to atomicWriteJSON for the full contract.

Doc-only change. 199/199 tests pass.
Comment thread packages/core/src/utils/atomicFileWrite.ts
Comment thread packages/core/src/utils/atomicFileWrite.test.ts
…h option

PR QwenLM#4431 review follow-up (qwen3.7-max via /review):

CRITICAL — FileHandle.writeFile silently ignores flush:

Node.js FileHandle.writeFile takes an early-return path that bypasses
the flush option entirely (the option is only honored on the
path-based fs.writeFile form). Our previous code passed
{ flush: true } to fh.writeFile and relied on the implicit fsync.
The only explicit fh.sync() was nested in the chmod block guarded by
canChmod — which is FALSE precisely when a non-root group member
writes to a group-writable file they don't own (the exact shared-write
scenario this PR targets). Net effect: in that branch, zero fsync.
Data sits in the kernel page cache; a crash before lazy flush leaves
the file empty (truncate succeeded) or partially written.

Fix:
- Drop flush from the fhWriteOptions object (silently ignored anyway).
- Add an explicit `fh.sync()` after writeFile succeeds, gated on
  options.flush. Runs BEFORE the chmod block so the canChmod=false
  branch also fsyncs.
- The chmod-block fh.sync() becomes metadata-only (covers the mode
  change), as the data is already on disk.

Updated comments to reflect the actual semantics rather than the
incorrect "writeFile({ flush: true }) fsyncs" assumption.

TESTS (partial adoption of review #3293252349):

- EINPLACE_TRUNCATE_FAILED: sibling test to EINPLACE_WRITE_FAILED.
  Monkey-patches FileHandle.prototype.truncate to throw EIO; asserts
  err.code + cause + "original content is intact" message, and
  verifies the file's original bytes are unchanged (truncate didn't
  run).
- Buffer in in-place fallback: locks in binary fidelity (byte-exact
  comparison) so a future encoding-passthrough regression for Buffer
  data would be caught.

NOT ADOPTED in this commit:

- EINODE_UNLINKED_DURING_WRITE test: requires post-write fh.stat()
  mocking with call-count discrimination (first call: real stat for
  verification; second call: nlink=0). The monkey-patch pattern works
  but is fragile; deferred to a follow-up that may also refactor the
  helper to accept an injectable stat fn for cleaner testability.

201/201 tests pass.
}
} finally {
// close() can throw on NFS (COMMIT failure) or FUSE. Don't let it
// mask the original try-body exception — flush:true already fsync'd
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This comment says "flush:true already fsync'd the data" — but this commit's core fix is that flush:true on FileHandle.writeFile is silently ignored by Node.js. The data is actually fsync'd by the explicit fh.sync() at line 238.

A future maintainer reading this comment might conclude the explicit fh.sync() above is redundant and remove it, silently re-introducing the durability hole this commit fixes.

Suggested change
// mask the original try-body exception — flush:true already fsync'd
// close() can throw on NFS (COMMIT failure) or FUSE. Don't let it
// mask the original try-body exception — the explicit fh.sync()
// above already committed the data, so close failure here is
// best-effort.

— qwen3.7-max via Qwen Code /review

// branch is the exact shared-write scenario this PR targets.
if (options.flush) {
try {
await fh.sync();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This explicit fh.sync() — the core fix of this commit — has no regression test. The existing monkey-patching pattern (FileHandleProto.truncate, FileHandleProto.writeFile) could trivially spy on FileHandleProto.sync to assert it's called when flush: true and skipped when flush is absent.

Without this, a future refactor removing the sync call would pass all 34 tests silently — which is exactly how the original flush-ignored bug went undetected.

— qwen3.7-max via Qwen Code /review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants