fix(review,cso): SHA-pin PR diff to prevent worktree-flip-during-review by habassa5 · Pull Request #1317 · garrytan/gstack

habassa5 · 2026-05-04T09:06:21Z

Summary

/review and /cso use working-tree-dependent git refs (origin/<base>, <base>..HEAD) when computing the PR diff. Inside Agent SDK sessions where nested subagents share the worktree, a stray git checkout anywhere in the call tree (e.g., to inspect a sibling branch's file) silently re-targets every later git diff command, and the review reports findings on the wrong branch. Observed empirically three times in one session in a downstream project's PR reviewers.

This PR introduces a {{PR_DIFF_PIN}} template fragment (Step 0.5 of the affected skills) that resolves BASE_SHA and HEAD_SHA to immutable commit SHAs at the start of the skill — via gh pr view --json baseRefOid,headRefOid in PR context, or git rev-parse origin/<base> plus local HEAD otherwise — and rewrites every subsequent git diff/git log/git show to use those SHAs by value. Commit SHAs are immutable across worktree flips.

The bug, by example

A /review session is in progress on branch feature-B. A nested subagent runs git checkout feature-A to inspect a sibling branch's implementation, then doesn't switch back. From that point on, every git diff origin/main in the review skill renders against feature-A, and the review reports vulnerabilities and dead code from feature-A while purporting to review feature-B.

# Before the fix (working-tree dependent — bug)
git diff origin/main           # whatever branch HEAD currently points to
git log origin/main..HEAD      # whatever branch HEAD currently points to
git diff origin/main...HEAD    # whatever branch HEAD currently points to
git diff --name-only origin/HEAD...   # same — Claude Code's built-in /security-review uses this exact pattern

# After the fix (SHA-pinned — correct)
git diff "$BASE_SHA" "$HEAD_SHA"
git log "$BASE_SHA..$HEAD_SHA"
git show "$HEAD_SHA:VERSION"

Empirical evidence (from the downstream project that surfaced this)

Three back-to-back PR reviewers in the same session observed wrong-branch diffs:

Hit 1 (PR feat: contributor mode, session awareness, universal RECOMMENDATION format (v0.3.11) #82 reviewer): invoked /security-review while on PR feat: contributor mode, session awareness, universal RECOMMENDATION format (v0.3.11) #82's branch but received the diff for an unrelated branch the worktree had been flipped to mid-session. Reviewer correctly detected and refused the false-positive findings.
Hit 2 (PR feat(qa): add browse subcommand reference table to SKILL.md #88 reviewer): noted "/security-review skill not installed locally as a callable executable; performed manual security review per the rubric." Different surface symptom — same underlying class of bug.
Hit 3 (PR feat: contributor mode, session awareness, recommendation format #90 reviewer): flagged "/security-review ... NOT RUN. Skipped per known stale-git-context bug guidance." Bug is now actively avoided rather than tolerated.

What's NOT changed

/security-review (Claude Code built-in) has the exact same class of bug — cli.js ships the prompt git diff --name-only origin/HEAD... literally — but lives in Anthropic's binary and is out of gstack's reach. After this PR, /cso is a strictly safer alternative for security audits run from inside multi-agent sessions. The Claude Code bug is being filed separately upstream with Anthropic.
/ship, /codex, /document-release have similar bare-ref patterns. They are reachable from outside multi-agent reviews where the flip risk is lower; this PR leaves them on bare refs for back-compat. Adopting {{PR_DIFF_PIN}} there is one-line each in their .tmpl. Tracked as follow-up.
scripts/slop-diff.ts internally uses ${base}...HEAD and walks the working tree. Properly fixing it requires changing the script's interface to accept a head-SHA argument. Tracked as follow-up.

Test plan

bun test test/pr-diff-pin-regression.test.ts — 8 tests, ~6s, free tier
- Includes a real two-branch-fixture test that reproduces the failure mode end-to-end (proves the bug is real, then proves SHA-pinning fixes it)
- Smell tests catch regressions where someone re-introduces a bare-ref pattern into review/SKILL.md.tmpl or cso/SKILL.md.tmpl — both fenced bash blocks AND inline backtick-quoted commands
- Verified empirically: deliberately injecting Run \git diff origin/`` into review/SKILL.md.tmpl makes the smell test fail loudly
bun test test/skill-collision-sentinel.test.ts — confirms no skill-name collisions introduced
bun run gen:skill-docs --dry-run — confirms generated SKILL.md files are in sync with templates
Adversarial review by independent codex CLI at HIGH reasoning effort (per CLAUDE.md 全ファイル日本語化: スコープ整理と実施計画 #67) surfaced 7 substantive concerns; all addressed in commit 9404fa9a. Re-running the same adversarial framing on the fixed branch is recommended before merge.
End-to-end fresh-clone validation (per CLAUDE.md /plan-ceo-review stack in the planning mode and started implementing the code #66): gh repo clone <fork> -- --branch fix/... to a clean directory, bun install, run regression test, all pass.

Adversarial codex findings addressed

Severity	Finding	Resolution
P1	Resolver-emitted commands still used bare refs	Threaded `ctx.skillName === 'review'` through 6 affected resolvers; pinned-SHA forms inside `/review`, bare refs preserved for `/ship`
P2	Uncommitted local edits dropped in local-/review path	Added `$REVIEW_DIRTY` env var (default 0 in PR context, 1 for local pre-PR), Step 3 instructs combining `git diff "$BASE_SHA"` with `git diff "$HEAD_SHA"` when set
P2	Stale base ref via fetch + rev-parse	Now uses `gh pr view --json baseRefOid` directly (immutable) inside PR context
P2	Stale base ref when fetch fails	Warn explicitly + add `git cat-file -e` probe for both pinned SHAs
P2	`gh pr diff` blessed as immutable but isn't	Replaced with explicit warning; recommend `gh api /repos/.../compare/$BASE_SHA...$HEAD_SHA` if user wants GitHub-rendered diff
P2	Smell test only checked fenced bash blocks	Extended to also catch inline backtick-quoted `git`/`gh`/`bun` imperatives
P2	`/cso` exit(1) on missing SHAs broke non-diff scope flags	Soft-fail with WARN; `/review` re-asserts requirement with explicit check

Files changed

CHANGELOG.md                        |  55 ++
VERSION                             |   2 +-
package.json                        |   2 +-
scripts/resolvers/utility.ts        |  ~140 (+ generatePrDiffPin)
scripts/resolvers/index.ts          |   3
scripts/resolvers/review.ts         |  51 ± (skillName-conditional pinned vs bare)
scripts/resolvers/review-army.ts    |  18 ± (skillName-conditional pinned vs bare)
review/SKILL.md.tmpl                |  29 ± (Step 0.5 inclusion + Step 1/3 rewrite)
review/SKILL.md                     | 108 ± (regenerated)
review/checklist.md                 |   4 (doc reference to pinned form)
review/greptile-triage.md           |   2 (doc reference to pinned form)
cso/SKILL.md.tmpl                   |  13 ± (Step 0.5 inclusion + Phase 2 diff-mode rewrite)
cso/SKILL.md                        | 129 ± (regenerated)
test/pr-diff-pin-regression.test.ts | 280 (new — 8 tests, ~6s, free tier)

Failure mode tracking

This change closes the failure mode named shared-checkout-branch-flip-during-review in CLAUDE.md tracking. The same name appears in Step 0.5 documentation and the regression test's docblock so future maintainers can grep for it.

Compatibility / breaking changes

None. The bare-ref forms are preserved for /ship and other skills that don't yet adopt {{PR_DIFF_PIN}}. Users running /review and /cso will see a new "Pinned review context:" log line at the start of the skill (Step 0.5), but the rest of the workflow is functionally identical except in the buggy worktree-flip case.

🤖 Generated with Claude Code

^{Need help on this PR? Tag @codesmith with what you need.}

Let Codesmith autofix CI failures and bot reviews

…mid-review Per gstack's bisect-commits style, this is the source-of-truth commit (template + resolver + test). The regenerated SKILL.md files land in the next commit. The CHANGELOG and VERSION bump land in the third. Bug: a long-running review skill that reads git state through symbolic refs (`HEAD`, `origin/<base>`, `origin/HEAD`) silently re-renders against whatever the working tree is checked out to at the moment each `git diff` runs — not against the branch the user asked to review. Inside Agent SDK sessions where nested subagents share the worktree, a stray `git checkout` anywhere in the call tree (e.g., to inspect a sibling branch's file) flips HEAD and never restores it. Every later diff command then targets the wrong branch, and the review reports findings on unrelated code. Empirical evidence (claude-teams-bot, 2026-05-04): three consecutive PR reviewers in one session observed the upstream Claude Code built-in `/security-review` rendering against the wrong branch. The same class of bug exists in gstack `/review` (uses `git diff origin/<base>`) and `/cso` `--diff` mode (uses `git log <base>..HEAD`). Fix: a new `{{PR_DIFF_PIN}}` template fragment (Step 0.5 of the skill) resolves `BASE_SHA` and `HEAD_SHA` to immutable commit SHAs at the start of the skill, via `gh pr view --json headRefOid` in PR context or `git rev-parse origin/<base>` plus local `HEAD` otherwise. Every later `git diff`, `git log`, and `git show` invocation is rewritten to use those SHAs by value. SHAs are immutable across worktree flips. Out of scope (left for follow-up PRs once per-skill verification is done): `ship`, `codex`, and `document-release` have similar bare-ref patterns but are reachable from outside multi-agent reviews where the flip risk is lower. Adopting `{{PR_DIFF_PIN}}` there is one line each. Not addressed: the upstream Claude Code `/security-review` built-in has the same bug (`git diff --name-only origin/HEAD...`) but lives in `cli.js` and is out of gstack's reach. After this change, gstack `/cso` is a strictly safer alternative for security audits run from inside multi-agent sessions. Test: `test/pr-diff-pin-regression.test.ts` builds a real two-branch fixture, simulates the worktree flip, and asserts (a) bare `git diff main` reports the wrong branch (proving the failure mode is real), (b) `git diff "$BASE_SHA" "$HEAD_SHA"` reports the right branch (proving the fix works), (c) the same holds for `git log` and `git show`, and (d) the templates use `{{PR_DIFF_PIN}}` and SHAs in imperative bash blocks. 8 tests, ~6s, free tier. Refs: claude-teams-bot CLAUDE.md §58 (stale-git-context bug), §62-§68 (build-discipline gates), failure-mode name `shared-checkout-branch-flip-during-review`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mechanical regeneration via `bun run gen:skill-docs`. No manual edits. Mirrors the previous commit's template changes into the rendered files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bumps to v1.26.3.0 (PATCH on top of v1.26.2.0). Single-skill bug fix plus regression test plus reusable resolver. Net diff ~660 lines, no new user-facing capability beyond the safety guarantee, fits PATCH guidance per CLAUDE.md commit-style rules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Independent codex review (HIGH effort, adversarial mindset per CLAUDE.md garrytan#67) surfaced 7 substantive concerns against the initial fix. This commit addresses all of them. Findings: P1 — Resolver-emitted commands still used bare refs. /review imports several resolver-generated sections (SCOPE_DRIFT, PLAN_COMPLETION_AUDIT_REVIEW, ADVERSARIAL_STEP, REVIEW_ARMY) whose templates emitted `git diff origin/<base>` and `git log origin/<base>..HEAD` unchanged. Even with PR_DIFF_PIN at the top of the skill, those later sections would re-render against the working tree if a subagent flipped it. Fix: thread `ctx.skillName === 'review'` through the affected resolvers (`generateScopeDrift`, `generatePlanCompletionAuditInner`, `generateAdversarialStep`, `generateSpecialistSelection`, `generateSpecialistDispatch`, `generateRedTeam`) and emit pinned-SHA forms inside `/review` while leaving `/ship` on the bare-ref form for back-compat (ship doesn't yet adopt PR_DIFF_PIN). P2 — Uncommitted local edits dropped in the local-/review path. Pre-fix `/review` ran `git diff origin/<base>` against the working tree, which included uncommitted edits — a real workflow for users who run /review before /ship creates the PR. Fix: introduce `$REVIEW_DIRTY` env var (default 0 in PR context, 1 for local pre-PR). When 1, Step 3 instructs the agent to combine `git diff "$BASE_SHA"` (committed, pinned) with `git diff "$HEAD_SHA"` (uncommitted, by necessity working-tree-dependent — the user's caller knows when they're in dirty mode and reads it once at the start). P2 — Use baseRefOid from gh pr view directly (immutable) instead of re-resolving via fetch + rev-parse origin/<base>. `gh pr view` returns both `baseRefOid` and `headRefOid` — pinning to those is strictly more correct than `git rev-parse origin/<base>`, which can be stale if the local clone hasn't fetched recently. Fix: read baseRefOid from the same `gh pr view` JSON we already fetch. Fall back to `git rev-parse origin/<base>` only outside PR context. P2 — Stale base ref when fetch fails. In non-PR fallback, `git fetch origin "$BASE_BRANCH"` is wrapped in `|| true` and silently pins whatever local tracking ref exists. Fix: warn explicitly when the fetch fails, surface what was pinned. Also add a `git cat-file -e` probe for both pinned SHAs and warn when either is not present in the local object store (catches missing fetch-of-PR-head + missing-pull-ref scenarios). P2 — Don't bless `gh pr diff "$PR_NUMBER"` as immutable. Initial fix recommended `gh pr diff` as an alternative; codex pointed out that it re-resolves HEAD and BASE server-side at every call, so a force-pushed PR head or fast-forwarded base mid-review changes its output. Fix: replace the recommendation with an explicit warning. If the user genuinely wants the GitHub-rendered diff, point them at `gh api /repos/.../compare/$BASE_SHA...$HEAD_SHA` which IS SHA-pinned. P2 — Smell test only checks fenced bash blocks, missing inline imperatives. Step 3 of the original /review used inline backtick-wrapped commands ("Run `git diff origin/<base>` to ..."). The smell test in pr-diff-pin-regression.test.ts only inspected fenced bash blocks, so a future regression that puts a bare ref inside backticks in narrative prose would slip through. Fix: extend imperativeBashCommands() to also pull inline backtick-wrapped `git`/`gh`/`bun` commands. Skip don't/do comparison rows and explicit "**Don't**" labels via heuristic, with comments calling out each filtering rule. Verified empirically: a deliberately-injected `Run \`git diff origin/<base>\`` inline string now fails the test loudly. P2 — Don't require PR pinning for non-diff /cso modes. Original fix exited(1) when SHAs couldn't be resolved, which broke /cso --infra, /cso --supply-chain, /cso --owasp (these don't need a diff). Fix: soft-fail with WARN. /review re-asserts the requirement with an explicit `[ -n "$BASE_SHA" ] && [ -n "$HEAD_SHA" ]` check at Step 1; /cso scope-flag modes degrade gracefully. P3 — Pin slop scan head as well. Acknowledged but out of scope. `scripts/slop-diff.ts` internally uses `${base}...HEAD`, computes merge-base from working-tree HEAD, and walks the worktree directly. Properly fixing it requires changing the slop-diff.ts internals to accept a head-SHA argument. Tracked as follow-up; current change passes `$BASE_SHA` to slop-diff which still drifts on its head side but at least pins the base. Tests still pass (8/8 in pr-diff-pin-regression). The deliberate-bad- inline-pattern smoke test confirms the smell test catches the new class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mechanical regeneration via `bun run gen:skill-docs`. No manual edits. Mirrors the previous commit's resolver+template changes into the rendered files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

habassa5 and others added 5 commits May 4, 2026 04:48

chore(skill-docs): regenerate review and cso SKILL.md from templates

c28685d

Mechanical regeneration via `bun run gen:skill-docs`. No manual edits. Mirrors the previous commit's template changes into the rendered files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

habassa5 mentioned this pull request May 4, 2026

[BUG] /security-review skill renders stale git context (uses local HEAD instead of PR refs) anthropics/claude-code#56124

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(review,cso): SHA-pin PR diff to prevent worktree-flip-during-review#1317

fix(review,cso): SHA-pin PR diff to prevent worktree-flip-during-review#1317
habassa5 wants to merge 5 commits intogarrytan:mainfrom
habassa5:fix/security-review-stale-git-context

habassa5 commented May 4, 2026 •

edited by blacksmith-sh Bot

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

habassa5 commented May 4, 2026 • edited by blacksmith-sh Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The bug, by example

Empirical evidence (from the downstream project that surfaced this)

What's NOT changed

Test plan

Adversarial codex findings addressed

Files changed

Failure mode tracking

Compatibility / breaking changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

habassa5 commented May 4, 2026 •

edited by blacksmith-sh Bot

Loading