fix(review,cso): SHA-pin PR diff to prevent worktree-flip-during-review#1317
Open
habassa5 wants to merge 5 commits intogarrytan:mainfrom
Open
fix(review,cso): SHA-pin PR diff to prevent worktree-flip-during-review#1317habassa5 wants to merge 5 commits intogarrytan:mainfrom
habassa5 wants to merge 5 commits intogarrytan:mainfrom
Conversation
…mid-review
Per gstack's bisect-commits style, this is the source-of-truth commit
(template + resolver + test). The regenerated SKILL.md files land in
the next commit. The CHANGELOG and VERSION bump land in the third.
Bug: a long-running review skill that reads git state through symbolic
refs (`HEAD`, `origin/<base>`, `origin/HEAD`) silently re-renders against
whatever the working tree is checked out to at the moment each `git diff`
runs — not against the branch the user asked to review. Inside Agent SDK
sessions where nested subagents share the worktree, a stray `git checkout`
anywhere in the call tree (e.g., to inspect a sibling branch's file) flips
HEAD and never restores it. Every later diff command then targets the
wrong branch, and the review reports findings on unrelated code.
Empirical evidence (claude-teams-bot, 2026-05-04): three consecutive
PR reviewers in one session observed the upstream Claude Code built-in
`/security-review` rendering against the wrong branch. The same class
of bug exists in gstack `/review` (uses `git diff origin/<base>`) and
`/cso` `--diff` mode (uses `git log <base>..HEAD`).
Fix: a new `{{PR_DIFF_PIN}}` template fragment (Step 0.5 of the skill)
resolves `BASE_SHA` and `HEAD_SHA` to immutable commit SHAs at the start
of the skill, via `gh pr view --json headRefOid` in PR context or
`git rev-parse origin/<base>` plus local `HEAD` otherwise. Every later
`git diff`, `git log`, and `git show` invocation is rewritten to use
those SHAs by value. SHAs are immutable across worktree flips.
Out of scope (left for follow-up PRs once per-skill verification is
done): `ship`, `codex`, and `document-release` have similar bare-ref
patterns but are reachable from outside multi-agent reviews where the
flip risk is lower. Adopting `{{PR_DIFF_PIN}}` there is one line each.
Not addressed: the upstream Claude Code `/security-review` built-in has
the same bug (`git diff --name-only origin/HEAD...`) but lives in
`cli.js` and is out of gstack's reach. After this change, gstack
`/cso` is a strictly safer alternative for security audits run from
inside multi-agent sessions.
Test: `test/pr-diff-pin-regression.test.ts` builds a real two-branch
fixture, simulates the worktree flip, and asserts (a) bare `git diff
main` reports the wrong branch (proving the failure mode is real),
(b) `git diff "$BASE_SHA" "$HEAD_SHA"` reports the right branch
(proving the fix works), (c) the same holds for `git log` and
`git show`, and (d) the templates use `{{PR_DIFF_PIN}}` and SHAs in
imperative bash blocks. 8 tests, ~6s, free tier.
Refs: claude-teams-bot CLAUDE.md §58 (stale-git-context bug),
§62-§68 (build-discipline gates), failure-mode name
`shared-checkout-branch-flip-during-review`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical regeneration via `bun run gen:skill-docs`. No manual edits. Mirrors the previous commit's template changes into the rendered files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps to v1.26.3.0 (PATCH on top of v1.26.2.0). Single-skill bug fix plus regression test plus reusable resolver. Net diff ~660 lines, no new user-facing capability beyond the safety guarantee, fits PATCH guidance per CLAUDE.md commit-style rules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent codex review (HIGH effort, adversarial mindset per CLAUDE.md garrytan#67) surfaced 7 substantive concerns against the initial fix. This commit addresses all of them. Findings: P1 — Resolver-emitted commands still used bare refs. /review imports several resolver-generated sections (SCOPE_DRIFT, PLAN_COMPLETION_AUDIT_REVIEW, ADVERSARIAL_STEP, REVIEW_ARMY) whose templates emitted `git diff origin/<base>` and `git log origin/<base>..HEAD` unchanged. Even with PR_DIFF_PIN at the top of the skill, those later sections would re-render against the working tree if a subagent flipped it. Fix: thread `ctx.skillName === 'review'` through the affected resolvers (`generateScopeDrift`, `generatePlanCompletionAuditInner`, `generateAdversarialStep`, `generateSpecialistSelection`, `generateSpecialistDispatch`, `generateRedTeam`) and emit pinned-SHA forms inside `/review` while leaving `/ship` on the bare-ref form for back-compat (ship doesn't yet adopt PR_DIFF_PIN). P2 — Uncommitted local edits dropped in the local-/review path. Pre-fix `/review` ran `git diff origin/<base>` against the working tree, which included uncommitted edits — a real workflow for users who run /review before /ship creates the PR. Fix: introduce `$REVIEW_DIRTY` env var (default 0 in PR context, 1 for local pre-PR). When 1, Step 3 instructs the agent to combine `git diff "$BASE_SHA"` (committed, pinned) with `git diff "$HEAD_SHA"` (uncommitted, by necessity working-tree-dependent — the user's caller knows when they're in dirty mode and reads it once at the start). P2 — Use baseRefOid from gh pr view directly (immutable) instead of re-resolving via fetch + rev-parse origin/<base>. `gh pr view` returns both `baseRefOid` and `headRefOid` — pinning to those is strictly more correct than `git rev-parse origin/<base>`, which can be stale if the local clone hasn't fetched recently. Fix: read baseRefOid from the same `gh pr view` JSON we already fetch. Fall back to `git rev-parse origin/<base>` only outside PR context. P2 — Stale base ref when fetch fails. In non-PR fallback, `git fetch origin "$BASE_BRANCH"` is wrapped in `|| true` and silently pins whatever local tracking ref exists. Fix: warn explicitly when the fetch fails, surface what was pinned. Also add a `git cat-file -e` probe for both pinned SHAs and warn when either is not present in the local object store (catches missing fetch-of-PR-head + missing-pull-ref scenarios). P2 — Don't bless `gh pr diff "$PR_NUMBER"` as immutable. Initial fix recommended `gh pr diff` as an alternative; codex pointed out that it re-resolves HEAD and BASE server-side at every call, so a force-pushed PR head or fast-forwarded base mid-review changes its output. Fix: replace the recommendation with an explicit warning. If the user genuinely wants the GitHub-rendered diff, point them at `gh api /repos/.../compare/$BASE_SHA...$HEAD_SHA` which IS SHA-pinned. P2 — Smell test only checks fenced bash blocks, missing inline imperatives. Step 3 of the original /review used inline backtick-wrapped commands ("Run `git diff origin/<base>` to ..."). The smell test in pr-diff-pin-regression.test.ts only inspected fenced bash blocks, so a future regression that puts a bare ref inside backticks in narrative prose would slip through. Fix: extend imperativeBashCommands() to also pull inline backtick-wrapped `git`/`gh`/`bun` commands. Skip don't/do comparison rows and explicit "**Don't**" labels via heuristic, with comments calling out each filtering rule. Verified empirically: a deliberately-injected `Run \`git diff origin/<base>\`` inline string now fails the test loudly. P2 — Don't require PR pinning for non-diff /cso modes. Original fix exited(1) when SHAs couldn't be resolved, which broke /cso --infra, /cso --supply-chain, /cso --owasp (these don't need a diff). Fix: soft-fail with WARN. /review re-asserts the requirement with an explicit `[ -n "$BASE_SHA" ] && [ -n "$HEAD_SHA" ]` check at Step 1; /cso scope-flag modes degrade gracefully. P3 — Pin slop scan head as well. Acknowledged but out of scope. `scripts/slop-diff.ts` internally uses `${base}...HEAD`, computes merge-base from working-tree HEAD, and walks the worktree directly. Properly fixing it requires changing the slop-diff.ts internals to accept a head-SHA argument. Tracked as follow-up; current change passes `$BASE_SHA` to slop-diff which still drifts on its head side but at least pins the base. Tests still pass (8/8 in pr-diff-pin-regression). The deliberate-bad- inline-pattern smoke test confirms the smell test catches the new class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical regeneration via `bun run gen:skill-docs`. No manual edits. Mirrors the previous commit's resolver+template changes into the rendered files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
/reviewand/csouse working-tree-dependent git refs (origin/<base>,<base>..HEAD) when computing the PR diff. Inside Agent SDK sessions where nested subagents share the worktree, a straygit checkoutanywhere in the call tree (e.g., to inspect a sibling branch's file) silently re-targets every latergit diffcommand, and the review reports findings on the wrong branch. Observed empirically three times in one session in a downstream project's PR reviewers.This PR introduces a
{{PR_DIFF_PIN}}template fragment (Step 0.5 of the affected skills) that resolvesBASE_SHAandHEAD_SHAto immutable commit SHAs at the start of the skill — viagh pr view --json baseRefOid,headRefOidin PR context, orgit rev-parse origin/<base>plus localHEADotherwise — and rewrites every subsequentgit diff/git log/git showto use those SHAs by value. Commit SHAs are immutable across worktree flips.The bug, by example
A
/reviewsession is in progress on branchfeature-B. A nested subagent runsgit checkout feature-Ato inspect a sibling branch's implementation, then doesn't switch back. From that point on, everygit diff origin/mainin the review skill renders againstfeature-A, and the review reports vulnerabilities and dead code fromfeature-Awhile purporting to reviewfeature-B.Empirical evidence (from the downstream project that surfaced this)
Three back-to-back PR reviewers in the same session observed wrong-branch diffs:
/security-reviewwhile on PR feat: contributor mode, session awareness, universal RECOMMENDATION format (v0.3.11) #82's branch but received the diff for an unrelated branch the worktree had been flipped to mid-session. Reviewer correctly detected and refused the false-positive findings./security-reviewskill not installed locally as a callable executable; performed manual security review per the rubric." Different surface symptom — same underlying class of bug./security-review... NOT RUN. Skipped per known stale-git-context bug guidance." Bug is now actively avoided rather than tolerated.What's NOT changed
/security-review(Claude Code built-in) has the exact same class of bug —cli.jsships the promptgit diff --name-only origin/HEAD...literally — but lives in Anthropic's binary and is out of gstack's reach. After this PR,/csois a strictly safer alternative for security audits run from inside multi-agent sessions. The Claude Code bug is being filed separately upstream with Anthropic./ship,/codex,/document-releasehave similar bare-ref patterns. They are reachable from outside multi-agent reviews where the flip risk is lower; this PR leaves them on bare refs for back-compat. Adopting{{PR_DIFF_PIN}}there is one-line each in their.tmpl. Tracked as follow-up.scripts/slop-diff.tsinternally uses${base}...HEADand walks the working tree. Properly fixing it requires changing the script's interface to accept a head-SHA argument. Tracked as follow-up.Test plan
bun test test/pr-diff-pin-regression.test.ts— 8 tests, ~6s, free tierreview/SKILL.md.tmplorcso/SKILL.md.tmpl— both fenced bash blocks AND inline backtick-quoted commandsRun \git diff origin/`` into review/SKILL.md.tmpl makes the smell test fail loudlybun test test/skill-collision-sentinel.test.ts— confirms no skill-name collisions introducedbun run gen:skill-docs --dry-run— confirms generated SKILL.md files are in sync with templates9404fa9a. Re-running the same adversarial framing on the fixed branch is recommended before merge.gh repo clone <fork> -- --branch fix/...to a clean directory,bun install, run regression test, all pass.Adversarial codex findings addressed
ctx.skillName === 'review'through 6 affected resolvers; pinned-SHA forms inside/review, bare refs preserved for/ship$REVIEW_DIRTYenv var (default 0 in PR context, 1 for local pre-PR), Step 3 instructs combininggit diff "$BASE_SHA"withgit diff "$HEAD_SHA"when setgh pr view --json baseRefOiddirectly (immutable) inside PR contextgit cat-file -eprobe for both pinned SHAsgh pr diffblessed as immutable but isn'tgh api /repos/.../compare/$BASE_SHA...$HEAD_SHAif user wants GitHub-rendered diffgit/gh/bunimperatives/csoexit(1) on missing SHAs broke non-diff scope flags/reviewre-asserts requirement with explicit checkFiles changed
Failure mode tracking
This change closes the failure mode named
shared-checkout-branch-flip-during-reviewin CLAUDE.md tracking. The same name appears inStep 0.5documentation and the regression test's docblock so future maintainers can grep for it.Compatibility / breaking changes
None. The bare-ref forms are preserved for
/shipand other skills that don't yet adopt{{PR_DIFF_PIN}}. Users running/reviewand/csowill see a new "Pinned review context:" log line at the start of the skill (Step 0.5), but the rest of the workflow is functionally identical except in the buggy worktree-flip case.🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need.