fix(cluster-policies): stop flagging Longhorn's operator-managed PDBs as drain-unsafe #2025
Merged
… as drain-unsafe longhorn-manager creates a minAvailable: 1 PDB per instance-manager that still hosts replicas or engines — and deletes it once the node is safe to drain — plus PDBs for csi-attacher/csi-provisioner. That minAvailable shape IS the operator's eviction interlock, not a chart value anyone can flip, yet validate-pdb-drain-safe Audit-flags all 15 of them on prod, permanently. The policy header even promises 'zero violations' at steady state. Exclude them the same way CNPG-managed PDBs already are; Longhorn's PDBs carry no labels or ownerReferences, so the exclusion keys on the longhorn-system namespace plus the operator's fixed names. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
🎉 This PR is included in version 1.52.3 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
This was referenced Jun 11, 2026
Symptom
validate-pdb-drain-safe reports 15 permanent Audit failures in longhorn-system (13× instance-manager-*, csi-attacher, csi-provisioner). This contradicts the policy's own design note that the cluster reports zero violations at steady state, and it buries any real future violation in noise.
Root cause
These PDBs are created by longhorn-manager itself: a minAvailable: 1 PDB per instance-manager that still hosts replicas/engines (deleted once the node is safe to drain), plus PDBs for its CSI sidecars. The minAvailable shape is the operator's deliberate eviction interlock: there is no chart value to flip, and rewriting them would fight the operator (same category as the existing CNPG exclusion).
Fix
Exclude them in the policy, the same way CNPG-managed PDBs already are. Longhorn's PDBs carry no labels or ownerReferences (verified live), so the exclusion keys on the longhorn-system namespace plus the operator's fixed names (instance-manager-?*, csi-attacher, csi-provisioner). Namespace and names within one exclude block are ANDed together, so PDBs with the same names in other namespaces are still validated.
Remaining flagged PDBs after this: the 3 kyverno controller PDBs, which #1991 flips to maxUnavailable. Together these return the policy to its documented zero-violation steady state.
Validation
kubectl kustomize k8s/clusters/local/ + k8s/clusters/prod/: ✅ both build
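For readers unfamiliar with Kyverno's exclude syntax, the exclusion described under Fix could look roughly like the fragment below. This is a sketch, not the merged diff: the placement inside the rule and the use of an `any` list are assumptions, and the rule's match and validate sections are omitted because this description does not show them. Only the namespace and the three name patterns come from the text above.

```yaml
# Hypothetical exclude fragment for a rule in validate-pdb-drain-safe.
# Assumption: the real rule may structure this differently (e.g. alongside
# the existing CNPG exclusion, which is not reproduced here).
exclude:
  any:
    # Fields inside a single resources block are ANDed:
    # kind AND namespace AND one of the names must all match.
    - resources:
        kinds:
          - PodDisruptionBudget
        namespaces:
          - longhorn-system
        names:
          # '?' matches exactly one character, '*' matches zero or more,
          # so '?*' requires at least one character after the prefix.
          - "instance-manager-?*"
          - csi-attacher
          - csi-provisioner
```

Because the namespace and names sit in the same resources block, a PDB named csi-attacher in any other namespace does not match the exclusion and is still validated.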