diff --git a/docs/dr/runbook.md b/docs/dr/runbook.md index 9743d4ae7..0b05e4eb3 100644 --- a/docs/dr/runbook.md +++ b/docs/dr/runbook.md @@ -130,6 +130,16 @@ ksail --config ksail.prod.yaml workload reconcile # Flux pulls and applies flux get kustomizations -A # Re-run if any are NotReady; expect convergence in 10-15 minutes +# 4b. ONLY if the OpenBao raft-snapshot recovery was impossible (no snapshot +# in R2 — the vault came up fresh): re-feed the user-fed secrets that +# SOPS deliberately does not seed (see the header of +# k8s/bases/infrastructure/vault-seed/push-secrets.yaml). Until then, +# cert-manager DNS01, external-dns, and fleetdm stay pending: +kubectl -n openbao exec openbao-0 -- \ + bao kv put secret/infrastructure/dns/cloudflare api_token= +kubectl -n openbao exec openbao-0 -- \ + bao kv put secret/apps/fleetdm/license license-key= + # 5. DNS — normally NO manual step: external-dns (hetzner overlay, # policy: sync, gateway-httproute source) repoints the Cloudflare # records at the new load balancer automatically once the HTTPRoutes @@ -268,22 +278,35 @@ find . -name '*.enc.yaml' -print0 | xargs -0 -n1 sops updatekeys --yes # bucket only). DO NOT revoke the old one # yet -- there is a window where both must work. -# 2. Update the encrypted secret in-place +# 2. Update the encrypted secret in-place. The R2 keys are per-environment +# and live in the CLUSTER secret (variables-cluster), not the shared base. sops --set '["stringData"]["r2_access_key_id"] ""' \ - k8s/bases/bootstrap/variables-base-secret.enc.yaml + k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml sops --set '["stringData"]["r2_secret_access_key"] ""' \ - k8s/bases/bootstrap/variables-base-secret.enc.yaml + k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml +# (repeat for k8s/clusters/local/bootstrap/ if rotating the local creds) -# 3. PR + merge. Flux propagates within one reconciliation cycle. +# 3. PR + merge. Flux propagates within one reconciliation cycle, and the +# hourly seed-r2-credentials PushSecret refreshes infrastructure/backup/r2 +# in OpenBao, from where the Velero/CNPG ExternalSecrets re-sync. # 4. Wait one Velero schedule + one CNPG WAL archive cycle to confirm # the new credentials work end-to-end. -kubectl -n velero get backups -w +kubectl -n velero get backups.velero.io -w kubectl logs -n cnpg-system -l app.kubernetes.io/name=cloudnative-pg --tail=50 # 5. Revoke the old token in Cloudflare. ``` +The **Cloudflare API token** (DNS01 + external-dns) is user-fed, not in SOPS — +rotate it with a single vault write; the consuming ExternalSecrets re-sync +within their 1h refresh interval: + +```bash +kubectl -n openbao exec openbao-0 -- \ + bao kv put secret/infrastructure/dns/cloudflare api_token= +``` + --- ## Encryption-at-rest verification diff --git a/docs/dr/velero-cnpg.md b/docs/dr/velero-cnpg.md index 12679c9cc..b9905b847 100644 --- a/docs/dr/velero-cnpg.md +++ b/docs/dr/velero-cnpg.md @@ -24,8 +24,11 @@ contents, and Postgres data. └──────────┘ └────────────────────┘ ``` -Credentials are SOPS-encrypted in `variables-base-secret.enc.yaml` and -substituted into both the Velero and CNPG secrets at Flux apply time. +Credentials are SOPS-encrypted per environment in the cluster secret +(`k8s/clusters//bootstrap/variables-cluster-secret.enc.yaml`), seeded +into OpenBao at `infrastructure/backup/r2` by the `seed-r2-credentials` +PushSecret, and materialised into the Velero and CNPG namespaces by +ExternalSecrets. ## Velero @@ -183,18 +186,20 @@ etc.) see [`runbook.md`](./runbook.md). ## Credential rotation -Stored in `k8s/bases/bootstrap/variables-base-secret.enc.yaml`. Rotation -flow: +Stored per environment in +`k8s/clusters//bootstrap/variables-cluster-secret.enc.yaml`. Rotation +flow (see also runbook.md Scenario 7): ```bash # 1. Mint a new R2 token in Cloudflare; revoke the old one only after step 4. # 2. Update both keys in-place with sops: sops --set '["stringData"]["r2_access_key_id"] ""' \ - k8s/bases/bootstrap/variables-base-secret.enc.yaml + k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml sops --set '["stringData"]["r2_secret_access_key"] ""' \ - k8s/bases/bootstrap/variables-base-secret.enc.yaml -# 3. PR + merge -> Flux reconciles the new Secret -> Velero/CNPG pick it up -# on next run (Velero re-reads the credentials secret per backup). + k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml +# 3. PR + merge -> Flux reconciles the new Secret -> the hourly +# seed-r2-credentials PushSecret refreshes OpenBao -> the Velero/CNPG +# ExternalSecrets re-sync within their refresh interval. # 4. Revoke the old token in Cloudflare. ``` diff --git a/k8s/bases/apps/fleetdm/README.md b/k8s/bases/apps/fleetdm/README.md index 18a864c95..20ad77448 100644 --- a/k8s/bases/apps/fleetdm/README.md +++ b/k8s/bases/apps/fleetdm/README.md @@ -31,25 +31,26 @@ allowing low-level access to OS-specific features through osquery. ## Secrets -Fleet's credentials reach the app through OpenBao + External Secrets. Only the -license key originates from SOPS; the database, cache, and MDM keys are -randomly generated by External Secrets `Password` generators -(`k8s/bases/infrastructure/vault-seed/`) and never stored in this repo. +Fleet's credentials reach the app through OpenBao + External Secrets. The +license key is user-fed straight into OpenBao (never stored in this repo); +the database, cache, and MDM keys are randomly generated by External Secrets +`Password` generators (`k8s/bases/infrastructure/vault-seed/`) and never +stored in this repo either. | Secret | Source | OpenBao path | | --- | --- | --- | -| `fleetdm_license_key` | **SOPS** — `variables-cluster-secret.enc.yaml`, seeded to OpenBao by the `seed-fleetdm` PushSecret. **Fleet premium license JWT**; the current value is a trial license expiring 2025-05-01 — replace before expiry or Fleet reverts to the free tier. | `apps/fleetdm/license` | +| License key | **User-fed** — an operator writes it to OpenBao (`bao kv put secret/apps/fleetdm/license license-key=`); it persists via the raft snapshot mirror. **Fleet premium license JWT** — replace before expiry or Fleet reverts to the free tier. | `apps/fleetdm/license` | | MDM server private key | **Auto-generated**, consumed via `server-key-external-secret.yaml`. **Must stay stable** — losing it invalidates every enrolled device, so its durability comes from OpenBao persistence (the Velero backup of the `openbao` PVC). | `apps/fleetdm/server` | | MySQL root / user / replication passwords | **Auto-generated**, consumed via `external-secrets.yaml`. | `apps/fleetdm/mysql` | | Redis password | **Auto-generated**. | `apps/fleetdm/redis` | ### Rotating -- **License key** (SOPS) — update and re-push: +- **License key** (user-fed) — write the new JWT to OpenBao; the consuming + ExternalSecret picks it up on its next refresh: ```sh - sops set k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml \ - '["stringData"]["fleetdm_license_key"]' "\"$NEW_JWT\"" - # then re-apply the seed-fleetdm PushSecret to update OpenBao + kubectl -n openbao exec openbao-0 -- \ + bao kv put secret/apps/fleetdm/license license-key="$NEW_JWT" ``` - **Generated secrets** (MySQL / Redis / MDM key) — rotate by deleting the generated source Secret (or its OpenBao entry) so External Secrets diff --git a/k8s/bases/bootstrap/variables-base-secret.enc.yaml b/k8s/bases/bootstrap/variables-base-secret.enc.yaml index 67c8938e2..83d79c436 100644 --- a/k8s/bases/bootstrap/variables-base-secret.enc.yaml +++ b/k8s/bases/bootstrap/variables-base-secret.enc.yaml @@ -4,9 +4,6 @@ metadata: name: variables-base namespace: flux-system stringData: - cloudflare_api_token: ENC[AES256_GCM,data:ADfRMMzGdOp9oT0wAMLCvlP4QA/3I2S7Ts5LcMF0VZnb4+7nTTFTMLWSwn1k27Fem87bhLA=,iv:narD4cOgsbo8VyEeT4X00wcPFigDSDSHUwAQKjhZABc=,tag:uFKv6ZiBOk03c0YUI2wTbQ==,type:str] - r2_access_key_id: ENC[AES256_GCM,data:SbBQvC8ex2M10JHV6ogBFkvN+13ZNQc5hS7U,iv:5k7bOd/q75c146+HvEUZGsVLPDNc5qDE/u2KFGfs/3Y=,tag:DKrOyTVC6R0AMfBhiCZFAg==,type:str] - r2_secret_access_key: ENC[AES256_GCM,data:PdhgQ1d1pDW1lLcXasYG2roEG8qapUHpHM/goE+V1w==,iv:g3QCozVlZMEBRfMhVvfSf5i8lYK+3hnFPJ+fEjSJz4s=,tag:UJL+z0bQ2C7f71XJSdWCvQ==,type:str] ghcr_dockerconfigjson: ENC[AES256_GCM,data:LBiwiL5q1+OBu633T/xz3ELEqbkqIBNKTnKlJ6q4ROD/8lkYSJWEt4J2brn+Kyd5O6XQy03J2G5k3VIVgC3KEWGSEW/HljEh+GVPGGZymO+SQeD37fsADFWI1HNGv4vAVikcQQ==,iv:TuG5peH/B6rpLy0nFhMK+o0LvgRpGn7rHq3oeCaotUw=,tag:wAyFHu1fWg+dQ//0PyBH0w==,type:str] sops: age: @@ -29,6 +26,6 @@ sops: -----END AGE ENCRYPTED FILE----- recipient: age1rk6fs67kly3h4zux5za429z3grjtvs8vcunav4sa28pxk738najsk8wgs6 encrypted_regex: ^(data|stringData)$ - lastmodified: "2026-06-08T19:26:46Z" - mac: ENC[AES256_GCM,data:qStAjbBUpp0D1Bm1gDFurL+fQHfAEVrPPLRirwJdrj67NN3jTIKlpO2c3e8AoH/GPtq9LymqFH9EpP0PoalGUd1FZYhUIY4AFse1b3UGvknsnuw3haeL1GvVTg44gr2ZNwyBPfbPVIRASwB9MvH4xkyckyd2ZEH/Dz5G2itxrGA=,iv:2cQ7oqx/jD8YSUsTnlqdtAKuECrioMxyG3H8TvogMes=,tag:OEZ4JFuTFLCGEI3tJWE8ig==,type:str] + lastmodified: "2026-06-11T21:23:41Z" + mac: ENC[AES256_GCM,data:EAV7VkgbouKR3uRi6NAlWXIT+JnV3iBb0DTZ7T2zaxh6cbKuBxJX4YifO/lL+VtTq1PYOYXP1P9ellbQ7SdMkvVOI0AiXdwwXiePDu7lj7xXjhem7+PW+knXO+YcP5r4Q1V4YWjzZOc7Uw+YlEsVHdPe5uB6wBFYyhJmUlD/xv4=,iv:gnfPx4Hu0Hb4Sln4XL9sxY2W4iy9HTN1EuaU+XidvRw=,tag:hfVnv42/ueiPqs8chdzGBw==,type:str] version: 3.12.2 diff --git a/k8s/bases/infrastructure/vault-seed/push-secrets.yaml b/k8s/bases/infrastructure/vault-seed/push-secrets.yaml index 6ce05a63c..0cdda617a 100644 --- a/k8s/bases/infrastructure/vault-seed/push-secrets.yaml +++ b/k8s/bases/infrastructure/vault-seed/push-secrets.yaml @@ -2,6 +2,16 @@ # These push externally-sourced credentials that cannot be randomly generated. # Randomly-generatable secrets are handled by push-generated-secrets.yaml. # +# SOPS seeds BOOTSTRAP-CRITICAL values only. Upstream credentials whose +# consumers can safely wait (non-bootstrap workloads) are NOT seeded here — +# an operator writes them to OpenBao once, and they persist via the raft +# snapshot mirror (vault-backup/): +# bao kv put secret/infrastructure/dns/cloudflare api_token= +# bao kv put secret/apps/fleetdm/license license-key= +# On a from-zero rebuild WITHOUT a vault restore, cert-manager DNS01, +# external-dns, and fleetdm stay pending until those writes happen — see +# docs/dr/runbook.md scenario 4. +# # refreshInterval 1h (not "0" = push-once): the source Secrets are durable # (SOPS-decrypted by Flux, or generated in-cluster) and authoritative, so a # periodic re-push is always a no-op while OpenBao is healthy — and it is @@ -22,10 +32,11 @@ spec: kind: ClusterSecretStore selector: secret: - # R2 backup credentials are per-environment and maintained in the - # cluster-scoped secret (variables-cluster), not the shared base. The - # base copy is a stale placeholder (a truncated 27-char access_key_id) - # that the BSL rejects with "access key has length 27, should be 32". + # R2 backup credentials are per-environment and live in the + # cluster-scoped secret (variables-cluster) only. They are + # bootstrap-critical: on a rebuild they are what makes Velero's + # BackupStorageLocation available before any restore can run, so they + # cannot themselves be user-fed via OpenBao. name: variables-cluster data: - match: @@ -41,46 +52,6 @@ spec: --- apiVersion: external-secrets.io/v1alpha1 kind: PushSecret -metadata: - name: seed-cloudflare - namespace: flux-system -spec: - refreshInterval: 1h - secretStoreRefs: - - name: openbao - kind: ClusterSecretStore - selector: - secret: - name: variables-base - data: - - match: - secretKey: cloudflare_api_token - remoteRef: - remoteKey: infrastructure/dns/cloudflare - property: api_token ---- -apiVersion: external-secrets.io/v1alpha1 -kind: PushSecret -metadata: - name: seed-fleetdm - namespace: flux-system -spec: - refreshInterval: 1h - secretStoreRefs: - - name: openbao - kind: ClusterSecretStore - selector: - secret: - name: variables-cluster - data: - - match: - secretKey: fleetdm_license_key - remoteRef: - remoteKey: apps/fleetdm/license - property: license-key ---- -apiVersion: external-secrets.io/v1alpha1 -kind: PushSecret metadata: name: seed-github-app namespace: flux-system @@ -111,6 +82,13 @@ spec: # cluster-independent — the same org token both clusters use) and holds the full # `{"auths":{"ghcr.io":{...}}}` document. Re-pushed hourly, so a rotated # token (or a re-initialized vault) converges without manual re-apply. +# +# This one is deliberately SOPS-seeded, NOT user-fed, even though it is an +# upstream token: the verify-image-signatures ClusterPolicy needs the kyverno +# `ghcr-auth` Secret to fetch signature manifests of PRIVATE first-party +# images (ksail-operator runs in EVERY cluster, including ephemeral CI ones +# where no operator exists to feed the vault) — without it Kyverno denies +# those pods at admission. That makes it bootstrap machinery. apiVersion: external-secrets.io/v1alpha1 kind: PushSecret metadata: diff --git a/k8s/clusters/base/apps-flux-kustomization.yaml b/k8s/clusters/base/apps-flux-kustomization.yaml index 32b8623e5..937f325f8 100644 --- a/k8s/clusters/base/apps-flux-kustomization.yaml +++ b/k8s/clusters/base/apps-flux-kustomization.yaml @@ -22,10 +22,11 @@ spec: substituteFrom: # Per-Flux docs, later entries override earlier ones, so the base # (defaults) come first and the per-cluster overrides come last. + # The variables-base SECRET is deliberately absent: its only remaining + # key (ghcr_dockerconfigjson) is seed-only — no manifest substitutes + # from it. - kind: ConfigMap name: variables-base - - kind: Secret - name: variables-base - kind: ConfigMap name: variables-cluster - kind: Secret diff --git a/k8s/clusters/base/infrastructure-controllers-flux-kustomization.yaml b/k8s/clusters/base/infrastructure-controllers-flux-kustomization.yaml index 47456396c..814838873 100644 --- a/k8s/clusters/base/infrastructure-controllers-flux-kustomization.yaml +++ b/k8s/clusters/base/infrastructure-controllers-flux-kustomization.yaml @@ -26,10 +26,11 @@ spec: substituteFrom: # Per-Flux docs, later entries override earlier ones, so the base # (defaults) come first and the per-cluster overrides come last. + # The variables-base SECRET is deliberately absent: its only remaining + # key (ghcr_dockerconfigjson) is seed-only — no manifest substitutes + # from it. - kind: ConfigMap name: variables-base - - kind: Secret - name: variables-base - kind: ConfigMap name: variables-cluster - kind: Secret diff --git a/k8s/clusters/base/infrastructure-flux-kustomization.yaml b/k8s/clusters/base/infrastructure-flux-kustomization.yaml index 1b54df73b..23b0af032 100644 --- a/k8s/clusters/base/infrastructure-flux-kustomization.yaml +++ b/k8s/clusters/base/infrastructure-flux-kustomization.yaml @@ -22,10 +22,11 @@ spec: substituteFrom: # Per-Flux docs, later entries override earlier ones, so the base # (defaults) come first and the per-cluster overrides come last. + # The variables-base SECRET is deliberately absent: its only remaining + # key (ghcr_dockerconfigjson) is seed-only (vault-seed PushSecrets read + # the Secret object directly) — no manifest substitutes from it. - kind: ConfigMap name: variables-base - - kind: Secret - name: variables-base - kind: ConfigMap name: variables-cluster - kind: Secret diff --git a/k8s/clusters/local/bootstrap/variables-cluster-secret.enc.yaml b/k8s/clusters/local/bootstrap/variables-cluster-secret.enc.yaml index 5d9e2a869..f28b8c2ac 100644 --- a/k8s/clusters/local/bootstrap/variables-cluster-secret.enc.yaml +++ b/k8s/clusters/local/bootstrap/variables-cluster-secret.enc.yaml @@ -11,7 +11,6 @@ stringData: r2_access_key_id: ENC[AES256_GCM,data:pOvUgG4=,iv:dS/Ow1HUaI+zBoYVqiRHkNR55lXAd/8XtWYeYng3VQk=,tag:q1/tfyxZhbSj9/HWyaLelA==,type:str] r2_secret_access_key: ENC[AES256_GCM,data:OGQNek55/7wegBsPoEURPUDEJbibs645P8awiw==,iv:bhUMGzO0KH0BiZbJzbEsLM3r4koWL1aUSnrF7BW31zk=,tag:a068U05Pw4YvOupTVx0KcA==,type:str] alertmanager_webhook_url: ENC[AES256_GCM,data:WH2X/wjvJ47DPGkTkvbCadWAFDelXyKxd9Zqqy9i1cR3nIoS2K7dzwjR,iv:Jl6vnrBg0pandwv0CmHASDM4JRLfXFrL7EN78SxWHMA=,tag:Nr0loeKoXEeRuxqIAsXEuQ==,type:str] - fleetdm_license_key: ENC[AES256_GCM,data:+1E5ngW9DcFmes2WurDE4Lod0Hc+NxrLYo4aR43n92sNtPTexCjJIdp1NUZEcPA4q/bnC+Lv4JnXO78FBdYFHIcHXpoBIk8yBsSf/ACI9DfCSe+UGXv34ZNFa4Wq4rz358Oy64soB8m2YCNpK+jVzl2Zzr56wGbUo92tPou1Au1xE17zUN+oiFCxfBFWiNFU91wvgnn+ZI+nIgr8y/PB9dsX/cp+FRKHmuysXS4tN6atdQvhZ2TVvGrKpr0D/J52V5K4O0PcpMv2/HZKtKYAtq/3aTHJ5D6jmDPThKwIJooGbwl1Oj53Rn1nCzSA5pTvrJygnbKNvfkJQh3egGPtNbhQQ5WUhgqsPr0seKpoyTJhZcZrsejz/hX3WIi7cakrnCfeqFCICv2RAZACJ595f21Z7LpsJl0Ea/ZNEBOlN6MfZWlIBBLvvhvg5KUvr6uURXn4Q2gVq3qee3TtLg/zLWEVbJmgfg==,iv:NQNsTyKQy0LWumLchB7auN3/ru6qZ8bvry43vvdxAT4=,tag:Eu2niEcF3mw8z8yxUXbtyw==,type:str] sops: age: - enc: | @@ -24,6 +23,6 @@ sops: -----END AGE ENCRYPTED FILE----- recipient: age14skmde00w0ps6u8asm4rdqwpktr5m5pq84070yzwvjjmcyfnl4ushfn5uq encrypted_regex: ^(data|stringData)$ - lastmodified: "2026-06-08T19:51:48Z" - mac: ENC[AES256_GCM,data:/JDV9VMJklIU6KBR2TettlcyiOKcVO89nCXeXZ5UHapGr9kB+ah1sI1T8sd6HNwdojuM50nvBclAgiBeb5bBj7QdgW2rCT8IFmZcSO/kTP4/TQ16vu0qZxDhLv79nSMWB8vqR+Ge7LyVbwJ5vORzv1z6hDWhrfldg1T5g2yV734=,iv:PV6w4V/whj8W2vX3NjK9a1PBauCpfmIsy/8fM3XbGAU=,tag:Dz0FgFLZoEgAM4lEoVACoQ==,type:str] + lastmodified: "2026-06-11T21:23:41Z" + mac: ENC[AES256_GCM,data:TaLK4BGkHN+bzxwrxXWe63IEL/2Utq4ySpDXeMUV+NeLk4wKdY5Rxv/c3J5bC4jonuGvcAcxIRrTQntgdNFbtHmyJ8nREz4S1RBesDHWYzndrWanBYh5i9s1NbZ3v+rDN8sE/gehjjPXXZPxP11jyr0HlsJ4z6BkIH+DM60HimI=,iv:nQduidNYyNWP1Eb9tJCN+NgrL9Saz2Psm8FVT6VAlR0=,tag:WwcWBx4hVti/a4rwLBQ3Yw==,type:str] version: 3.12.2 diff --git a/k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml b/k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml index 270066086..dce1f31b9 100644 --- a/k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml +++ b/k8s/clusters/prod/bootstrap/variables-cluster-secret.enc.yaml @@ -6,7 +6,6 @@ metadata: stringData: alertmanager_webhook_url: ENC[AES256_GCM,data:drl/4hxpqwgmNQblAK3M6jaYNo4MNxhLT03IG300szKKg2dRU5+/peDu34/0IFndSxblTpP2omu+hijTRrQdDnCTKRTmM6IekFcuZEJ4LcaF,iv:Fsu1HcojhTA7sgxoE6eBGBF+HuwGemuU6PovlldiVfY=,tag:eByBYe+hojK7QZwue157yg==,type:str] dex_client_secret: ENC[AES256_GCM,data:itgiIfh1JgHp4PpFOq5Wv/OFzes=,iv:iA1CU0etaen6q0R5c7e785JpYYPXFw8PaaO9w7gFz/k=,tag:pHfORf7ZAv7D6sr+Jcx8mQ==,type:str] - fleetdm_license_key: ENC[AES256_GCM,data:3Hgszrs7ctkUl/z9w/robWLRIzGUl9Q2Oj8TQWX/Vzi+tHTA95fZT6Xbj3rmS0sUBT/TM33zPPelkxdR25ArWYw6/02dl/ABRyDoYrCi3X56goH8C5tdcDzVNDSqKuNZiH+G4WTmsIGM8oODlLltpQ6IA0Y1hIk2X1Ige9OZ9/LzhvZ6AOQw2Ti9VTCCS91oqoYJ0klYlLoE0C28tQnBzbNPXpW+fuGI6Fqux53fxpoLs9yV1JwsSauePf1xVdcyLxSezwp55L7ZSdNFBn9KPFOGS8W7xKQoeqHZdAj5R7nOuEcTotV7Cf/3fWVt/ngP3XofsZyOjfAD57FSKM9jF1lxcDGmPJxaEKsUBYF57Uzp8lqyzJ57m7XswaGn2ES1sI13J4jmQWWoK2HISczvGY2EnEb0+vGGwELZ3Oq0qO32V7pQJyUEQy+The2bgfzFtaw9QB8DczVr/ZNgOPC+MguSVuEQvg==,iv:p+UiZpSojLyqATGm9S3aFMBHmDAkTgF4Jcg3WJ48h2Y=,tag:RVYnSyyCg0rvdEHbSKs7FA==,type:str] flux_web_client_secret: ENC[AES256_GCM,data:Hc1yMBdyzLzk408jAkdW8nv9T0wWwTHdXmmFWH+IqM0=,iv:MPwHVJrP2Ls0tQiEfiafE7BxAUw9s5SYgiQ4YJncgp4=,tag:gH1ZM/WhhDyylegAY6iwTw==,type:str] github_app_client_secret: ENC[AES256_GCM,data:los+HdSkTCGLpSGIX5+xabAdGr0pqdDry01SLXQ9bW7AJBwtEiIueQ==,iv:R89bwVRu793CMobnU0zsv36X3AFx489l3P7CjU3tjag=,tag:Qg5vVOHqQJq3K1jJlJQDZw==,type:str] hcloud_token: ENC[AES256_GCM,data:n/i/NzWeV5suvbqZxTUSvQRX6Sw8QFD2rQKwD/yws74mnZT9LjlcJzrCp1JybNC6DlgM63HB5iFq/W1JOfycdQ==,iv:c+iB2Q7/oRTNYz44AQvCb4gzm56eFLovFebTSKV1fV8=,tag:2py9ltwClmXouDgesG5zIA==,type:str] @@ -26,6 +25,6 @@ sops: -----END AGE ENCRYPTED FILE----- recipient: age1rk6fs67kly3h4zux5za429z3grjtvs8vcunav4sa28pxk738najsk8wgs6 encrypted_regex: ^(data|stringData)$ - lastmodified: "2026-06-08T19:51:48Z" - mac: ENC[AES256_GCM,data:TdLWRzAuNW82M2AgKPhxi+5US+IyxOJQkuv9qgc1TYQdxPUzig7LlYFnvfeYmJVyhgUloEEhOg6C/vgKpbQAcrn6flsjxDYU4VuCTaNNW/jzbLcoQn/o4zDrNS+T/cCrJed4hHb8tUEqt9PBOFv4sb/MnqlSYTOW4MAkLyhIXEs=,iv:ieum85WMNdxX+HMRdQ4PR7PSHHT7euQmTBjra3HrTb4=,tag:VkC19chAwec1rYwiSoT8nQ==,type:str] + lastmodified: "2026-06-11T21:23:41Z" + mac: ENC[AES256_GCM,data:YoZEVwgLiKOxak8XpwztabXwb7VqLm4Kik/Otv+6CHTYNHx67OYsAvsfhDk+31vwxl29eNWKCg+FtX4Y1NKPfZozpUenM0fuhdHmbMagnzCQYtaqH0kKvOzThlxtF+dLqEXGc408vUquiqcasyRgpW5xMpM4dycSyBApsFtYNNA=,iv:I+qLNTRkeXwVR1Qpd+VtNzRkdvaOzawz7KlN4ZSK+xM=,tag:ylGynT2M0u16+ezaV4QVAg==,type:str] version: 3.13.1 diff --git a/k8s/providers/hetzner/infrastructure/controllers/external-dns/cloudflare-api-token-secret.yaml b/k8s/providers/hetzner/infrastructure/controllers/external-dns/cloudflare-api-token-secret.yaml deleted file mode 100644 index 8169b3165..000000000 --- a/k8s/providers/hetzner/infrastructure/controllers/external-dns/cloudflare-api-token-secret.yaml +++ /dev/null @@ -1,8 +0,0 @@ -apiVersion: v1 -kind: Secret -metadata: - name: external-dns-cloudflare - namespace: external-dns -type: Opaque -stringData: - api-token: ${cloudflare_api_token} diff --git a/k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml b/k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml index e67b35204..e144db730 100644 --- a/k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml +++ b/k8s/providers/hetzner/infrastructure/controllers/kustomization.yaml @@ -13,7 +13,10 @@ resources: # so the policy lives here, not in the base — see cilium/ for the rationale. - cilium/ - coredns/ - - external-dns/ + # NB: external-dns moved to ../external-dns/ (the `infrastructure` layer): + # its Cloudflare token now arrives via an OpenBao ExternalSecret, and the + # ClusterSecretStore only exists once `infrastructure` reconciles — in this + # wait-gated layer the missing Secret would deadlock the dependency chain. - flux-instance/ - hcloud-ccm/ - hcloud-csi/ diff --git a/k8s/providers/hetzner/infrastructure/external-dns/cloudflare-api-token-external-secret.yaml b/k8s/providers/hetzner/infrastructure/external-dns/cloudflare-api-token-external-secret.yaml new file mode 100644 index 000000000..a521c4464 --- /dev/null +++ b/k8s/providers/hetzner/infrastructure/external-dns/cloudflare-api-token-external-secret.yaml @@ -0,0 +1,27 @@ +--- +# Cloudflare API token from OpenBao — same KV entry the cert-manager DNS01 +# solver reads (../cluster-issuers/cloudflare-api-token-external-secret.yaml). +# The token is user-fed (`bao kv put secret/infrastructure/dns/cloudflare +# api_token=`), NOT SOPS-seeded: external-dns is not bootstrap-critical +# (existing DNS records keep serving while it waits), which is why this dir +# lives in the `infrastructure` layer — in `infrastructure/controllers` the +# wait-gated layer would deadlock on this ExternalSecret before the +# ClusterSecretStore (an `infrastructure`-layer resource) exists. +apiVersion: external-secrets.io/v1 +kind: ExternalSecret +metadata: + name: external-dns-cloudflare + namespace: external-dns +spec: + refreshInterval: 1h + secretStoreRef: + name: openbao + kind: ClusterSecretStore + target: + name: external-dns-cloudflare + creationPolicy: Owner + data: + - secretKey: api-token + remoteRef: + key: infrastructure/dns/cloudflare + property: api_token diff --git a/k8s/providers/hetzner/infrastructure/controllers/external-dns/helm-release.yaml b/k8s/providers/hetzner/infrastructure/external-dns/helm-release.yaml similarity index 100% rename from k8s/providers/hetzner/infrastructure/controllers/external-dns/helm-release.yaml rename to k8s/providers/hetzner/infrastructure/external-dns/helm-release.yaml diff --git a/k8s/providers/hetzner/infrastructure/controllers/external-dns/helm-repository.yaml b/k8s/providers/hetzner/infrastructure/external-dns/helm-repository.yaml similarity index 100% rename from k8s/providers/hetzner/infrastructure/controllers/external-dns/helm-repository.yaml rename to k8s/providers/hetzner/infrastructure/external-dns/helm-repository.yaml diff --git a/k8s/providers/hetzner/infrastructure/controllers/external-dns/kustomization.yaml b/k8s/providers/hetzner/infrastructure/external-dns/kustomization.yaml similarity index 78% rename from k8s/providers/hetzner/infrastructure/controllers/external-dns/kustomization.yaml rename to k8s/providers/hetzner/infrastructure/external-dns/kustomization.yaml index 651404464..2322c52ef 100644 --- a/k8s/providers/hetzner/infrastructure/controllers/external-dns/kustomization.yaml +++ b/k8s/providers/hetzner/infrastructure/external-dns/kustomization.yaml @@ -3,7 +3,7 @@ apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization resources: - namespace.yaml - - cloudflare-api-token-secret.yaml + - cloudflare-api-token-external-secret.yaml - helm-release.yaml - helm-repository.yaml - networkpolicy.yaml diff --git a/k8s/providers/hetzner/infrastructure/controllers/external-dns/namespace.yaml b/k8s/providers/hetzner/infrastructure/external-dns/namespace.yaml similarity index 100% rename from k8s/providers/hetzner/infrastructure/controllers/external-dns/namespace.yaml rename to k8s/providers/hetzner/infrastructure/external-dns/namespace.yaml diff --git a/k8s/providers/hetzner/infrastructure/controllers/external-dns/networkpolicy.yaml b/k8s/providers/hetzner/infrastructure/external-dns/networkpolicy.yaml similarity index 100% rename from k8s/providers/hetzner/infrastructure/controllers/external-dns/networkpolicy.yaml rename to k8s/providers/hetzner/infrastructure/external-dns/networkpolicy.yaml diff --git a/k8s/providers/hetzner/infrastructure/kustomization.yaml b/k8s/providers/hetzner/infrastructure/kustomization.yaml index 06d5e18c2..88d1f21d8 100644 --- a/k8s/providers/hetzner/infrastructure/kustomization.yaml +++ b/k8s/providers/hetzner/infrastructure/kustomization.yaml @@ -4,6 +4,14 @@ kind: Kustomization resources: - ../../../bases/infrastructure/ - cluster-issuers/ + # external-dns lives in this `infrastructure` layer (not + # `infrastructure/controllers/`) because its Cloudflare token is a user-fed + # OpenBao secret materialised by an ExternalSecret, and the openbao + # ClusterSecretStore is itself an `infrastructure`-layer resource — in the + # wait-gated controllers layer the never-syncing Secret would deadlock the + # bootstrap dependency chain. Not bootstrap-critical: existing DNS records + # keep serving while it waits for the token. + - external-dns/ - vault-seed/ # Prod-only FinOps: a Kyverno ClusterPolicy that biases pod scheduling toward # the static baseline workers and away from the autoscaler nodes, so the