refactor(secrets): stop SOPS-seeding non-bootstrap secrets, feed them via OpenBao #2034
Merged
… via OpenBao

SOPS now seeds bootstrap-critical values only. Upstream credentials whose consumers can safely wait are user-fed into OpenBao once and persist via the raft snapshot mirror:

- Remove the seed-cloudflare and seed-fleetdm PushSecrets; document the bao kv put paths in the push-secrets.yaml header and DR runbook.
- Prune cloudflare_api_token plus the dead r2_* placeholder keys from variables-base, and fleetdm_license_key from both variables-cluster files (sops unset; values never decrypted).
- Move external-dns from the infrastructure-controllers layer to the hetzner infrastructure layer and source its Cloudflare token from an OpenBao ExternalSecret (the same KV entry cert-manager DNS01 reads) instead of Flux substitution; in the wait-gated controllers layer, the ExternalSecret would deadlock before the ClusterSecretStore exists.
- Drop the now-unused variables-base Secret substituteFrom entries; the Secret remains seed-only (ghcr_dockerconfigjson stays SOPS-seeded: Kyverno image verification of the private ksail-operator image must work in ephemeral CI clusters where no operator can feed the vault).
- Truth up the R2 rotation docs: the live creds are per-environment in variables-cluster, not the (now removed) stale base placeholders.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
🎉 This PR is included in version 1.56.0 🎉

The release is available on GitHub release.

Your semantic-release bot 📦🚀
Summary
Implements the SOPS bootstrap-only policy: SOPS seeds only what the cluster needs to boot and recover; everything else is fed into OpenBao by an operator (upstream tokens) or by ESO generators (random values), persisting via the raft snapshot mirror (#1996).
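For the generator path (random values), one possible shape is sketched below. It assumes External Secrets Operator's `Password` generator feeding a PushSecret that writes into OpenBao through the `openbao` ClusterSecretStore. The resource names, namespace, and KV path `apps/example/credentials` are hypothetical, this is not necessarily how the repository wires it, and the fields should be checked against the ESO release the cluster runs:

```yaml
# Hypothetical example: generate a random value and persist it in OpenBao.
apiVersion: generators.external-secrets.io/v1alpha1
kind: Password
metadata:
  name: example-password          # hypothetical name
  namespace: example              # hypothetical namespace
spec:
  length: 32
  digits: 6
  symbols: 0
---
apiVersion: external-secrets.io/v1alpha1
kind: PushSecret
metadata:
  name: example-password          # hypothetical name
  namespace: example
spec:
  refreshInterval: 1h
  updatePolicy: IfNotExists       # never overwrite an existing remote value, so the first generated value sticks
  deletionPolicy: None            # the default: deleting this PushSecret leaves the KV entry in place
  secretStoreRefs:
    - name: openbao
      kind: ClusterSecretStore
  selector:
    generatorRef:
      apiVersion: generators.external-secrets.io/v1alpha1
      kind: Password
      name: example-password
  data:
    - match:
        secretKey: password       # key emitted by the Password generator
        remoteRef:
          remoteKey: apps/example/credentials   # hypothetical KV path
          property: password
```

Because the value lives only in OpenBao, it survives a rebuild through the raft snapshot mirror rather than through SOPS.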
Removed from SOPS → user-fed OpenBao writes
| Removed key | User-fed replacement |
| --- | --- |
| `cloudflare_api_token` | `bao kv put secret/infrastructure/dns/cloudflare api_token=…` |
| `fleetdm_license_key` | `bao kv put secret/apps/fleetdm/license license-key=…` |
| `r2_*` pair (`variables-base`) | none: dead placeholders (live creds are per-environment in `variables-cluster`) |

Keys were pruned with `sops unset` (values never decrypted). The `seed-cloudflare`/`seed-fleetdm` PushSecrets are removed. Because PushSecret's default `deletionPolicy: None` leaves pushed values in place, the existing prod KV entries persist, so nothing breaks on merge; the values just stop being repo-managed.

external-dns: controllers → infrastructure layer
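The ExternalSecret described in this section could look roughly like the sketch below. It assumes the `openbao` ClusterSecretStore uses ESO's Vault provider with the KV v2 mount `secret`, so the remote key is the path below that mount. The resource names, namespace, and Secret key are hypothetical:

```yaml
apiVersion: external-secrets.io/v1          # older ESO releases serve external-secrets.io/v1beta1 instead
kind: ExternalSecret
metadata:
  name: external-dns-cloudflare             # hypothetical
  namespace: external-dns                   # hypothetical
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: openbao                           # infrastructure-layer store
  target:
    name: external-dns-cloudflare           # Kubernetes Secret that ESO creates and keeps in sync
  data:
    - secretKey: api-token
      remoteRef:
        key: infrastructure/dns/cloudflare  # i.e. secret/infrastructure/dns/cloudflare, assuming mount "secret"
        property: api_token                 # same entry the cert-manager DNS01 solver reads
# The external-dns Cloudflare provider reads its token from the CF_API_TOKEN
# environment variable; hypothetical chart values mapping it from this Secret:
#   env:
#     - name: CF_API_TOKEN
#       valueFrom:
#         secretKeyRef:
#           name: external-dns-cloudflare
#           key: api-token
```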
Its token now arrives via an OpenBao ExternalSecret (the same KV entry the cert-manager DNS01 solver reads) instead of Flux substitution. In the wait-gated `infrastructure-controllers` layer, that ExternalSecret would deadlock bootstrap: the `openbao` ClusterSecretStore is an `infrastructure`-layer resource, and that layer only reconciles after the controllers layer is ready. So the whole directory moves, following the OpenCost precedent. external-dns is not bootstrap-critical: existing DNS records keep serving while it waits.

Deliberately KEPT in SOPS (bootstrap seeds)
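The substitution-fed seeds kept here, and the deadlock described above, both follow from how the layers are chained. A minimal sketch, assuming Flux `Kustomization`s with SOPS decryption and a `dependsOn` chain; the paths, source name, and decryption Secret name are hypothetical:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-controllers
  namespace: flux-system
spec:
  interval: 10m
  path: ./k8s/infrastructure/controllers   # hypothetical
  prune: true
  wait: true          # Ready only once every applied resource is healthy
  timeout: 10m
  sourceRef:
    kind: OCIRepository
    name: flux-system                       # hypothetical source
  decryption:
    provider: sops
    secretRef:
      name: sops-age                        # hypothetical
  postBuild:
    substituteFrom:
      - kind: Secret
        name: variables-cluster             # SOPS-seeded values, e.g. client secrets
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure-controllers      # waits for the layer above to be Ready
  interval: 10m
  path: ./k8s/infrastructure                # hypothetical; holds the openbao ClusterSecretStore
  prune: true
  sourceRef:
    kind: OCIRepository
    name: flux-system
```

An ExternalSecret placed in the first layer would need the store from the second layer to become Ready, while the second layer waits on the first: that is the cycle the external-dns move avoids.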
- `hcloud_token`, per-env `r2_*` (DR chicken-and-egg: they make the Velero BackupStorageLocation (BSL) available before any restore can run)
- `dex`/`flux-web`/`oauth2-proxy`/`github` client secrets (consumed via Flux substitution by controllers-layer workloads; post-2026-06-10 single-source design)
- `alertmanager_heartbeat_url` (the one ungated external dead man's switch) and `alertmanager_webhook_url` (inline Coroot CR field with no `secretRef` support)
- `ghcr_dockerconfigjson`: reclassified as bootstrap during implementation. The `verify-image-signatures` ClusterPolicy needs the Kyverno `ghcr-auth` Secret to fetch signature manifests of the private `ksail-operator` image in every cluster, including ephemeral CI clusters where no operator exists to feed the vault (verified: the anonymous GHCR token grant for `devantler-tech/ksail-operator` is DENIED). Its Secret, `variables-base`, loses its now-unused `substituteFrom` entries and is now seed-only.

Docs truth-up
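For reference, the per-environment `r2_*` credentials kept above back Velero's object storage. A minimal sketch of an R2-backed `BackupStorageLocation`, assuming the Velero AWS object-store plugin and a credentials Secret rendered from those values; the bucket, account ID placeholder, and Secret name are hypothetical:

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws                        # R2 is S3-compatible, so the AWS plugin is used
  default: true
  objectStorage:
    bucket: example-velero-backups     # hypothetical bucket
  config:
    region: auto                       # R2 accepts "auto" as its region
    s3Url: https://<ACCOUNT_ID>.r2.cloudflarestorage.com   # placeholder account ID
    s3ForcePathStyle: "true"
  credential:
    name: velero-r2-credentials        # hypothetical Secret holding an AWS-style credentials file
    key: cloud
```

If those credentials lived only in OpenBao, a cluster restoring OpenBao itself could never read them, which is why they stay SOPS-seeded.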
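The R2 rotation docs now point at the live per-environment creds in the `variables-cluster` files rather than the removed base placeholders (+ fully-qualified `backups.velero.io`).

Rollout notes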
- external-dns: … (`policy: sync`, TXT registry); worst case, a few minutes of paused reconciliation.
- … `bao kv put` writes before cert-manager DNS01, external-dns, and fleetdm go green (documented in runbook 4b). With the #1996 raft snapshot mirror restored, no manual steps are needed.
- `seed-github-app` pushes `infrastructure/oidc/github`, which nothing currently reads back; it is kept as a durability mirror of a bootstrap-seeded value.

Validation
- `ksail workload validate`: ✅ 305 files (includes all moved external-dns manifests)
- `ksail --config ksail.prod.yaml workload validate`: the sole failure is the known, pre-existing Coroot `notificationIntegrations` schema gap (upstream datreeio/CRDs-catalog#896, "Update coroot.com/coroot_v1 schema from coroot-operator 0.9.7"), unrelated to this change
- `kubectl kustomize`: local, prod, hetzner infra, hetzner controllers, and docker infra all build
- … `cloudflare_api_token`/`fleetdm_license_key`/`seed-cloudflare`/`seed-fleetdm`

🤖 Generated with Claude Code