fix(cluster-policies): stop flagging Longhorn's operator-managed PDBs as drain-unsafe #2025

Merged
devantler merged 2 commits into main from claude/repo-assist-pdb-policy-exclude-longhorn on Jun 11, 2026


@devantler

Contributor

🤖 Generated by the Daily AI Assistant

Symptom

validate-pdb-drain-safe reports 15 permanent Audit failures in longhorn-system (13× instance-manager-*, csi-attacher, csi-provisioner) — contradicting the policy's own design note that the cluster reports zero violations at steady state, and burying any real future violation in noise.

Root cause

These PDBs are created by longhorn-manager itself: a minAvailable: 1 PDB per instance-manager that still hosts replicas/engines (deleted once the node is safe to drain), plus PDBs for its CSI sidecars. The minAvailable shape is the operator's deliberate eviction interlock — there is no chart value to flip, and rewriting them would fight the operator (same category as the existing CNPG exclusion).
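To see why this shape blocks a drain, here is a minimal sketch (Python, integer values only, percentages ignored) of how the Kubernetes disruption controller derives a PDB's disruptionsAllowed. It assumes each instance-manager PDB selects a single healthy pod; the function name disruptions_allowed is hypothetical and is not part of any Kubernetes or Longhorn API.

```python
from typing import Optional


def disruptions_allowed(expected: int, healthy: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Integer-only model of a PDB's status.disruptionsAllowed.

    Sketch only: percentages and unhealthy-pod eviction policies are ignored.
    """
    # A PDB sets exactly one of the two fields; this sketch enforces that.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected - max_unavailable
    # Voluntary evictions may only consume the surplus above desired_healthy.
    return max(0, healthy - desired_healthy)


# One instance-manager pod guarded by minAvailable: 1 leaves no headroom.
print(disruptions_allowed(expected=1, healthy=1, min_available=1))    # → 0
# The same single pod under maxUnavailable: 1 could be evicted.
print(disruptions_allowed(expected=1, healthy=1, max_unavailable=1))  # → 1
```

With zero disruptions allowed, the eviction API rejects evictions for that pod, so kubectl drain keeps retrying until longhorn-manager deletes the PDB. That is the interlock working as intended rather than a misconfiguration.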

Fix

Exclude them in the policy, the same way CNPG-managed PDBs already are. Longhorn's PDBs carry no labels or ownerReferences (verified live), so the exclusion keys on the longhorn-system namespace plus the operator's fixed names (instance-manager-?*, csi-attacher, csi-provisioner). Because the namespace and name conditions sit in a single exclude block, they are ANDed, so PDBs with the same names in other namespaces are still validated.
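The matching rule can be modelled in a few lines. This is a minimal sketch in Python, not Kyverno's implementation: the helper is_excluded is hypothetical, and fnmatchcase stands in for Kyverno's wildcard matching ('*' matches any run of characters, '?' exactly one). fnmatchcase also understands [...] character classes, which Kyverno only documents '*' and '?' for, but none of the names here use brackets.

```python
from fnmatch import fnmatchcase

# Values taken from the exclusion described above.
EXCLUDED_NAMESPACE = "longhorn-system"
EXCLUDED_NAMES = ("instance-manager-?*", "csi-attacher", "csi-provisioner")


def is_excluded(namespace: str, name: str) -> bool:
    """Hypothetical model: namespace AND name must both match one filter."""
    return namespace == EXCLUDED_NAMESPACE and any(
        fnmatchcase(name, pattern) for pattern in EXCLUDED_NAMES
    )


print(is_excluded("longhorn-system", "instance-manager-abc123"))  # → True
print(is_excluded("kube-system", "csi-attacher"))                 # → False
```

Note that instance-manager-?* needs at least one character after the prefix, so a PDB named exactly instance-manager- would still be validated.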

After this change, the only PDBs still flagged are the 3 kyverno controller PDBs, which #1991 switches to maxUnavailable. Together, the two changes return the policy to its documented zero-violation steady state.

Validation

  • kubectl kustomize k8s/clusters/local/ + k8s/clusters/prod/ — ✅ both build
  • Server-side dry-run of the ClusterPolicy against prod — ✅ accepted

… as drain-unsafe

longhorn-manager creates a minAvailable: 1 PDB per instance-manager that
still hosts replicas or engines — and deletes it once the node is safe
to drain — plus PDBs for csi-attacher/csi-provisioner. That minAvailable
shape IS the operator's eviction interlock, not a chart value anyone can
flip, yet validate-pdb-drain-safe Audit-flags all 15 of them on prod,
permanently. The policy header even promises 'zero violations' at steady
state. Exclude them the same way CNPG-managed PDBs already are; Longhorn's
PDBs carry no labels or ownerReferences, so the exclusion keys on the
longhorn-system namespace plus the operator's fixed names.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
devantler added this pull request to the merge queue Jun 11, 2026
Merged via the queue into main with commit 67a095f Jun 11, 2026
10 checks passed
devantler deleted the claude/repo-assist-pdb-policy-exclude-longhorn branch June 11, 2026 19:37
github-project-automation bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jun 11, 2026
@botantler

botantler Bot commented Jun 11, 2026

Contributor

🎉 This PR is included in version 1.52.3 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Projects

Status: ✅ Done

1 participant