Merge-queue deploys leave prod running unmerged code when the merge group fails

> 🤖 Generated by the Daily AI Assistant

## What happened today (2026-06-11)

PR #1991 entered the merge queue. `ci.yaml`'s `deploy-prod` job (which runs on `merge_group`, i.e. on the **speculative merge ref, before the PR actually merges**) pushed the OCI artifact and reconciled prod at ~16:08. The merge group then failed → the PR was ejected, **but prod kept reconciling the unmerged artifact**: the kyverno HelmRelease picked up #1991's then-broken PDB values, the upgrade failed every interval (`Cannot set both .minAvailable and .maxUnavailable`), and the whole `infrastructure-controllers → infrastructure → apps` chain sat blocked for hours. Nothing on `main` reflected the state prod was running.

## The general flaw

`deploy-prod` on `merge_group` is a deliberate and useful gate (a deploy that fails prevents the merge). The gap is the **failure path**: any merge-group failure *after* `workload push` leaves `latest` pointing at code that never landed on `main`, until some future merge overwrites it. Prod state silently diverges from Git — the core thing GitOps is supposed to prevent.

## Options (maintainer decision)

1. **Heal on failure** (keeps the gate): add an `if: failure()` job/step to the merge-group run that re-checkouts `main` and re-runs `ksail --config ksail.prod.yaml workload push` + `reconcile`, so an ejected PR's artifact is immediately replaced by main's.
2. **Tag/push artifacts by SHA + promote on merge**: push the merge-group artifact under a non-`latest` tag, and only re-tag to `latest` after the merge actually completes (needs a small post-merge workflow).
3. **Accept the risk** (status quo): the next successful merge self-heals; document the failure mode in AGENTS.md / the DR runbook so the on-call knows that a red merge queue can mean prod is running unmerged code.

Option 1 is the smallest change that closes the gap without giving up the pre-merge deploy gate. Happy to PR whichever direction is preferred.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge-queue deploys leave prod running unmerged code when the merge group fails #2026

What happened today (2026-06-11)

The general flaw

Options (maintainer decision)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Merge-queue deploys leave prod running unmerged code when the merge group fails #2026

Description

What happened today (2026-06-11)

The general flaw

Options (maintainer decision)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions