Skip to content

fix(deploy-bump): scope git add to edited files, drop soft reset#290

Open
markusrubey wants to merge 2 commits into
mainfrom
fix/deploy-bump-scope-helm-add
Open

fix(deploy-bump): scope git add to edited files, drop soft reset#290
markusrubey wants to merge 2 commits into
mainfrom
fix/deploy-bump-scope-helm-add

Conversation

@markusrubey
Copy link
Copy Markdown
Member

@markusrubey markusrubey commented May 20, 2026

Summary

The Commit and push with automatic retry step in both component-deploy.yml (V1) and component-deploy-v2.yml had two defects that let unrelated working-tree state land on a target repo's main, branded as a docker-tag bump.

Defect 1 — repo-wide git add .

git add . from the repo root stages every dirty file on the runner. The sed step only edits one or two specific files, but the add scoops up everything: stale state on a self-hosted runner, side effects from earlier steps, anything left over in the workspace. The commit message says "Bump docker tag for X" but the commit can contain arbitrary diffs.

Defect 2 — rebase-conflict recovery does a soft reset + git add .

When the retry loop's git rebase origin/main conflicts, the recovery path did:

git rebase --abort
git reset --soft origin/main
# re-apply edits
git add .
git commit -m "Bump docker tag …"

git reset --soft moves HEAD to the new base but leaves the index pointing at the old tree. The subsequent git add . reads the working tree (still at the old base + our edit) and stages it. The resulting commit's diff vs the new base is:

diff(new base → old tree) + the edits

i.e. it silently reverts every change that landed between the old and new base.

Fix

Applied to both component-deploy.yml (V1) and component-deploy-v2.yml:

  • Replace git add . with an explicit add of the files the earlier steps edit:
    • V1: git add "$APP_PATH/values.yaml" "$CLUSTER_PATH/config.yaml"
    • V2: git add "$HELM_VALUES_PATH/values.yaml"
  • Replace git reset --soft origin/main with git reset --hard origin/main. We re-apply the edits from scratch anyway, so a hard reset is what was meant; it also wipes any leftover state from the failed rebase.
  • Hoist the duplicated commit message into a COMMIT_MSG env var so the two call sites can't drift.

Why both files

V1 (component-deploy.yml) is consumed by deploy-kotlin.yml, deploy-python.yml, and deploy-generic.yml. That covers all the Monta Kotlin services (server, service-api-gateways, service-core-api, service-search, service-feature, service-wallet, service-identity, etc.) and Python services. V2 covers the monorepo-typescript apps. Both had the identical bug.

Incident

monta-app/monorepo-typescript@716f5f7e3 — a Bump docker tag for react-storybook on staging commit also contained two single-character TS edits in apps/hub/src/domain/dashboard/feature/control-tower/…, breaking apps/hub typecheck on main. TS revert: monta-app/monorepo-typescript#2990.

Audited all 176 GitHub Action-authored commits on monorepo-typescript's main since 2025-09 — that's the only one with non-helm files. This fix prevents recurrence regardless of how the runner's working tree gets dirtied.

Test plan

  • Diff inspected — only touches the two bump steps
  • V1 workflow runs on next Kotlin/Python deploy without regression
  • V2 workflow runs on next monorepo-typescript deploy without regression
  • No follow-up corrupted-bump commits appear on consumer repos / kube-manifests

… reset

The "Bump docker tag" commit step had two defects that allowed unrelated
working-tree state to land on main, branded as a docker-tag bump:

1. `git add .` from the repo root stages every dirty file on the runner,
   not just the helm `values.yaml` we sed-edit. Anything that ends up in
   the workspace (stale state on a self-hosted runner, side effects from
   earlier steps) gets swept into the commit.
2. The rebase-conflict recovery did `git reset --soft origin/main` and
   then `git add .` again. Soft reset leaves the index pointing at the
   old tree, so the resulting commit's diff vs the new base is
   `diff(new base → old tree) + sed edits`, which silently reverts any
   commits that landed between the bases.

Replace both `git add .` with `git add "$HELM_VALUES_PATH/values.yaml"`
and the soft reset with a hard reset, then re-apply the seds on top of
fresh main. Result: bump commits can only ever touch the single helm
file the step is supposed to touch.

Incident: monta-app/monorepo-typescript@716f5f7e3 committed two
single-character TS edits alongside a react-storybook bump, leaving
`apps/hub` failing to typecheck on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@markusrubey markusrubey requested a review from a team as a code owner May 20, 2026 13:03
@markusrubey markusrubey requested review from sudheer-monta and removed request for a team May 20, 2026 13:03
….yml

`component-deploy.yml` (the V1 reusable workflow consumed by
`deploy-kotlin.yml`, `deploy-python.yml`, and `deploy-generic.yml`)
contains the same bug as V2: repo-wide `git add .` plus a `git reset --soft`
conflict-recovery path. V1 deploys against `kube-manifests` so the blast
radius differs, but the structural defect is the same.

Mirror the V2 fix:
- Scope both `git add` calls to the two files the earlier steps edit
  (`$APP_PATH/values.yaml` and `$CLUSTER_PATH/config.yaml`).
- Replace `git reset --soft origin/main` with `git reset --hard origin/main`.

Also hoist the duplicated commit message into a `COMMIT_MSG` env var in
both V1 and V2 to avoid drift across the four interpolated call sites,
and trim the comments to the invariant (drop the incident-SHA narration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@markusrubey markusrubey changed the title fix(deploy-generic-v2): scope helm bump add to values.yaml, drop soft reset fix(deploy-bump): scope git add to edited files, drop soft reset (v1 + v2) May 20, 2026
@markusrubey markusrubey changed the title fix(deploy-bump): scope git add to edited files, drop soft reset (v1 + v2) fix(deploy-bump): scope git add to edited files, drop soft reset May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant