devantler-tech · devantler · Jun 12, 2026 · Jun 10, 2026 · Jun 11, 2026 · Jun 11, 2026
@@ -2,10 +2,27 @@
 
 ## Overview
 
-OpenBao stores secrets in file-based storage. The Velero daily backup captures
-the openbao namespace PVCs and the `openbao-unseal` Secret (which contains the
-unseal key and root token). The `vault-config` Job auto-initializes OpenBao on
-fresh clusters and auto-unseals on restarts.
+OpenBao runs as a raft (Integrated Storage) cluster. Three artifacts make it
+recoverable:
+
+1. **Raft snapshots** — the `vault-snapshot` CronJob saves a raft snapshot
+   (`bao operator raft snapshot save`) to the `vault-snapshots` PVC daily
+   (newest 14 kept) and mirrors them to the S3 backup target under
+   `openbao-snapshots/`
+   (Cloudflare R2 in prod, MinIO locally). The mirror exists because Velero's
+   file-system backup only captures volumes mounted by *running* pods —
+   nothing mounts this PVC outside the CronJob's brief run, so Velero alone
+   would never carry the snapshots off-cluster.
+2. **`openbao-unseal` Secret** — unseal key + root token, captured by Velero's
+   daily resource backup. A snapshot is only usable together with the keys
+   that were current when it was taken.
+3. **The `vault-config` Job** — auto-initializes OpenBao on fresh clusters,
+   auto-unseals on restarts, and **auto-restores**: when no pod reports an
+   initialized barrier but `openbao-unseal` still holds keys AND a snapshot
+   exists on the PVC, it temp-initializes, runs
+   `bao operator raft snapshot restore -force` with the newest snapshot, and
+   unseals with the stored key — no operator action. Only when no snapshot is
+   available does it abort and demand explicit data-loss acknowledgement
+   (the #1982 guard, unchanged).
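The decision logic in item 3 can be modelled as a small function. This is a hypothetical sketch of the branch selection as the text describes it, not the `vault-config` Job's real script. The function name `vault_config_action` and its yes/no inputs are illustrative assumptions. One combination (initialized but no stored keys) is not described here, so the sketch reports it as `unknown`.

```shell
#!/bin/sh
# Hypothetical model of the vault-config Job's branch selection, derived
# only from the conditions described in this document. NOT the Job's
# real script; the function name and yes/no inputs are illustrative.
#   $1: "yes" if any OpenBao pod reports an initialized barrier
#   $2: "yes" if the openbao-unseal Secret still holds keys
#   $3: "yes" if a snapshot exists on the vault-snapshots PVC
vault_config_action() {
  initialized=$1 keys=$2 snapshot=$3
  if [ "$initialized" = yes ]; then
    if [ "$keys" = yes ]; then
      echo unseal        # restart: unseal with the stored key
    else
      echo unknown       # combination not described in this document
    fi
  elif [ "$keys" != yes ]; then
    echo init            # fresh cluster: initialize and store new keys
  elif [ "$snapshot" = yes ]; then
    echo restore         # temp-init, raft snapshot restore -force, unseal
  else
    echo abort           # #1982 guard: demand data-loss acknowledgement
  fi
}
```

The ordering encodes the Scenario 3 caveat. With no `openbao-unseal` Secret, the model always chooses `init`, even when a snapshot is present. That is why a rebuilt cluster comes up with a fresh vault first.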
 
 ## Recovery Scenarios
 
@@ -21,50 +38,62 @@ No manual action needed. Verify:
 kubectl exec -n openbao openbao-0 -- bao status
 ```
 
-### Scenario 2: PVC data corruption
+### Scenario 2: Raft data corruption or loss (Secret + snapshots intact)
 
-**Symptom**: OpenBao fails to start, storage errors in logs.
+**Symptom**: OpenBao fails to start with storage errors, or all pods report
+`initialized: false` while `openbao-unseal` still exists (the 2026-06-10
+incident shape).
 
-**Resolution**:
+**Resolution** — automated. Reset the data volumes and let the `vault-config`
+Job restore the newest snapshot:
 
 1. Scale down the StatefulSet:
    ```bash
    kubectl scale statefulset -n openbao openbao --replicas=0
    ```
-2. Delete the corrupted PVC:
+2. Delete the corrupted data PVCs (NOT `vault-snapshots`, and do NOT delete
+   the `openbao-unseal` Secret — both are the restore inputs):
    ```bash
-   kubectl delete pvc -n openbao data-openbao-0
-   ```
-3. Scale back up:
-   ```bash
-   kubectl scale statefulset -n openbao openbao --replicas=1
-   ```
-4. Delete the stale `openbao-unseal` Secret (old keys are for the old storage):
-   ```bash
-   kubectl delete secret -n openbao openbao-unseal
+   kubectl delete pvc -n openbao data-openbao-0 data-openbao-1 data-openbao-2
    ```
-5. Trigger Flux reconciliation (`ksail workload reconcile`) — the `vault-config`
-   Job re-runs, auto-initializes with fresh keys, and configures policies/roles.
-6. PushSecrets re-seed the vault from SOPS variables on next reconciliation.
-
-### Scenario 3: Full cluster rebuild (Velero restore)
-
-**Symptom**: Entire cluster is lost (DR scenario), Velero backup available.
-
-**Resolution**:
-
-1. `ksail cluster create` — provisions infrastructure
-2. Deploy Velero and restore from backup — this restores:
-   - OpenBao PVC (vault data)
-   - `openbao-unseal` Secret (unseal key + root token)
-3. Flux deploys `infrastructure-controllers` → OpenBao starts
-4. The `postStart` hook reads the restored `openbao-unseal` Secret → auto-unseals
-5. `vault-config` Job runs → detects vault is already initialized → skips init →
-   converges policies/roles
-6. ExternalSecrets resume syncing from the restored vault data
-
-**Key point**: Velero backs up both the PVC (vault data) and the `openbao-unseal`
-Secret (unseal credentials). Both are needed for a complete restore.
+3. Trigger Flux reconciliation (`ksail workload reconcile`): the StatefulSet
+   scales back up with empty volumes, and the `vault-config` Job finds
+   uninitialized pods, a surviving `openbao-unseal` Secret and a snapshot on
+   the PVC, so it restores automatically (worst-case RPO: 24 h, the snapshot
+   cadence).
+4. ExternalSecrets resume syncing; PushSecrets top up anything newer than the
+   snapshot on their next refresh.
+
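+The Job only looks for snapshots on the `vault-snapshots` PVC. If that PVC was
+also lost but the R2 `openbao-snapshots/` mirror still holds snapshots, copy
+the newest one back onto the PVC first (as in Scenario 3, step 2). Only if
+**no snapshot exists** anywhere does the Job abort with the data-loss guard;
+acknowledge the loss explicitly with
+`kubectl delete secret openbao-unseal -n openbao` and the next run
+re-initializes from scratch (Scenario 4 then re-seeds the KV).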
+
+### Scenario 3: Full cluster rebuild (backups available)
+
+**Symptom**: Entire cluster is lost (DR scenario); the R2 snapshot mirror
+and/or Velero backups are available.
+
+**Sequencing caveat**: on a rebuilt cluster Flux stands OpenBao up *before*
+any restore can run, so the `vault-config` Job auto-initializes a **fresh**
+vault first (no `openbao-unseal` exists yet → the guard does not trigger).
+Recovering the old data means deliberately resetting that fresh vault into
+the Scenario 2 shape:
+
+1. `ksail cluster create`, then `ksail workload push`/`reconcile`: the
+   platform converges with a fresh, empty vault (PushSecrets re-seed
+   SOPS-sourced values, so the cluster is functional but generated secrets
+   are new).
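+2. Restore the old `openbao-unseal` Secret (from the Velero backup) over the
+   fresh one. Velero skips resources that already exist unless the restore
+   uses `--existing-resource-policy=update`, so delete the fresh Secret first
+   or pass that flag. Then copy the newest *pre-incident* snapshot from the R2
+   `openbao-snapshots/` mirror onto the `vault-snapshots` PVC. Snapshots
+   written after the rebuild (such as the deploy-time baseline) belong to the
+   fresh vault and do not pair with the restored keys; remove them from the
+   PVC so the Job does not restore one of them as the newest.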
+3. Scale OpenBao to 0, delete the fresh `data-openbao-*` PVCs, reconcile —
+   the `vault-config` Job now hits the automated restore path (Scenario 2)
+   and brings back the pre-incident vault.
+4. ExternalSecrets resume syncing from the restored data; consumers pick up
+   the old (matching) credentials.
+
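+**Key point**: a snapshot and the `openbao-unseal` Secret must come from the
+same generation. The Secret has to hold the keys that were current when the
+snapshot was taken; restoring both from the same backup day is the safe
+choice. Keys from one era cannot unseal a snapshot from another.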
 
 ### Scenario 4: Full cluster rebuild (no Velero backup)
 

@@ -5,8 +5,21 @@
 # openbao-unseal Secret: restore via
 #   bao operator raft snapshot restore <file>
 # then unseal with the key that was current when the snapshot was taken
-# (docs/dr/openbao-raft-ha-migration.md).
+# (docs/dr/openbao-raft-ha-migration.md). The vault-config Job performs
+# this restore automatically when it finds an uninitialized cluster, a
+# surviving openbao-unseal Secret AND a snapshot on the PVC (docs/dr/
+# openbao.md).
 # Authenticates via Kubernetes auth (vault-snapshot role).
+#
+# Pod layout: the snapshot runs as an initContainer and the off-cluster
+# mirror as the main container, so the upload only ever runs after a NEW
+# snapshot succeeded (containers in one pod run in parallel; init -> main
+# gives strict ordering). The mirror copies the PVC's snapshots to the
+# S3 backup target (Cloudflare R2 in prod, in-cluster MinIO locally) —
+# Velero's file-system backup CANNOT carry this PVC off-cluster, because
+# FSB only backs up volumes mounted by RUNNING pods and nothing mounts
+# this PVC outside the CronJob's own brief run. Without the mirror, a
+# full-cluster loss would also lose every raft snapshot.
 apiVersion: batch/v1
 kind: CronJob
 metadata:
@@ -52,7 +65,16 @@ spec:
             - name: snapshots
               persistentVolumeClaim:
                 claimName: vault-snapshots
-          containers:
+            # R2 credentials synced by ESO (external-secret.yaml). Optional:
+            # on a fresh cluster the mirror container skips with a log line
+            # until the sync lands; the local snapshot is unaffected.
+            - name: r2-credentials
+              secret:
+                secretName: vault-snapshot-r2
+                optional: true
+            - name: mc-config
+              emptyDir: {}
+          initContainers:
             - name: snapshot
               image: quay.io/openbao/openbao:2.5.3@sha256:fdc6da21ca6963560c32336fd7feb9cf2d5e52668f1a1647205a4b41171f0806
               securityContext:
@@ -107,3 +129,52 @@ spec:
                     rm -f "$OLD"
                   done
                   echo "Snapshot complete."
+          containers:
+            - name: mirror
+              image: quay.io/minio/mc:RELEASE.2025-04-08T15-39-49Z@sha256:7e3efb09c22c0882fbf341b9d99f61f94ae6c4c20a06f2f1a2b20ea8993d8952
+              securityContext:
+                runAsNonRoot: true
+                allowPrivilegeEscalation: false
+                capabilities:
+                  drop: ["ALL"]
+                readOnlyRootFilesystem: true
+              env:
+                - name: MC_CONFIG_DIR
+                  value: /tmp/.mc
+              resources:
+                requests:
+                  cpu: 10m
+                  memory: 32Mi
+                limits:
+                  memory: 128Mi
+              volumeMounts:
+                - name: snapshots
+                  mountPath: /snapshots
+                  readOnly: true
+                - name: r2-credentials
+                  mountPath: /r2
+                  readOnly: true
+                - name: mc-config
+                  mountPath: /tmp/.mc
+              command:
+                - /bin/sh
+                - -ec
+                - |
+                  if [ ! -f /r2/access_key_id ]; then
+                    echo "vault-snapshot-r2 Secret not synced yet — skipping the"
+                    echo "off-cluster mirror. The local PVC snapshot above still ran."
+                    exit 0
+                  fi
+                  mc alias set backup "${r2_endpoint}" \
+                    "$(cat /r2/access_key_id)" "$(cat /r2/secret_access_key)"
+                  # Copy-only mirror: NO --remove, so an empty/recreated PVC can
+                  # never wipe the off-cluster history. Remote retention is
+                  # pruned by age instead, and only after a successful upload in
+                  # the same run (this container only starts when the snapshot
+                  # initContainer succeeded), so the newest snapshot always
+                  # survives the prune.
+                  mc mirror --overwrite /snapshots/ "backup/${r2_bucket}/openbao-snapshots/"
+                  mc rm --recursive --force --older-than 14d \
+                    "backup/${r2_bucket}/openbao-snapshots/" || true
+                  echo "Mirrored snapshots:"
+                  mc ls "backup/${r2_bucket}/openbao-snapshots/"
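The retention reasoning in the mirror script's comments (copy-only upload plus age-based prune) can be checked with a local model. This is a sketch under stated assumptions, not the manifest's script. A plain directory stands in for `backup/${r2_bucket}/openbao-snapshots/`. `cp` stands in for `mc mirror --overwrite`, which is assumed equivalent to "copy if absent" because each snapshot file is written once. `find -mmin` stands in for `mc rm --older-than 14d`. The function names are hypothetical.

```shell
#!/bin/sh
# Local model of the mirror container's retention policy. Hypothetical:
# a directory plays the S3 prefix, and a file's mtime in DST plays the
# object's upload time (LastModified).
set -eu

# Copy-only mirror: upload snapshots the remote lacks, never delete there.
mirror_copy_only() {
  src=$1 dst=$2
  mkdir -p "$dst"
  for f in "$src"/*; do
    [ -f "$f" ] || continue          # empty SRC: the glob stays literal
    name=$(basename "$f")
    # cp without -p stamps the copy with "now", like a fresh upload
    [ -e "$dst/$name" ] || cp "$f" "$dst/$name"
  done
}

# Age-based prune: drop remote copies uploaded more than 14 days ago
# (14 * 24 * 60 = 20160 minutes).
prune_older_than_14d() {
  find "$1" -type f -mmin +20160 -exec rm -f {} +
}
```

In the model, mirroring an empty source (a recreated PVC) leaves the remote untouched. The prune only removes copies uploaded more than 14 days earlier, so the snapshot uploaded in the same run always survives it.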
@@ -0,0 +1,29 @@
+---
+# R2 credentials for the vault-snapshot CronJob's off-cluster mirror
+# container — the same infrastructure/backup/r2 KV entry Velero and CNPG
+# use, synced into a plain two-key Secret for `mc`. Deliberately optional
+# on the consumer side (the CronJob mounts it `optional: true`): on a
+# fresh cluster the mirror is skipped with a log line until ESO has
+# synced, while the local PVC snapshot still runs.
+apiVersion: external-secrets.io/v1
+kind: ExternalSecret
+metadata:
+  name: vault-snapshot-r2
+  namespace: openbao
+spec:
+  refreshInterval: 1h
+  secretStoreRef:
+    name: openbao
+    kind: ClusterSecretStore
+  target:
+    name: vault-snapshot-r2
+    creationPolicy: Owner
+  data:
+    - secretKey: access_key_id
+      remoteRef:
+        key: infrastructure/backup/r2
+        property: access_key_id
+    - secretKey: secret_access_key
+      remoteRef:
+        key: infrastructure/backup/r2
+        property: secret_access_key
@@ -14,6 +14,12 @@
 # Same retry pattern as the vault-config Job: the script fails until
 # vault-config has created the vault-snapshot auth role and unsealed the
 # vault, and OnFailure + backoffLimit keep retrying until then.
+#
+# Pod layout mirrors cronjob.yaml: snapshot as initContainer, off-cluster
+# mirror as the main container — the baseline snapshot reaches the S3
+# backup target immediately instead of waiting for the nightly mirror,
+# which matters because the deploy-time baseline is often the ONLY
+# snapshot during a change's first 24 hours.
 apiVersion: batch/v1
 kind: Job
 metadata:
@@ -47,7 +53,16 @@ spec:
         - name: snapshots
           persistentVolumeClaim:
             claimName: vault-snapshots
-      containers:
+        # R2 credentials synced by ESO (external-secret.yaml). Optional: on a
+        # fresh cluster the mirror container skips with a log line until the
+        # sync lands; the local snapshot is unaffected.
+        - name: r2-credentials
+          secret:
+            secretName: vault-snapshot-r2
+            optional: true
+        - name: mc-config
+          emptyDir: {}
+      initContainers:
         - name: snapshot
           image: quay.io/openbao/openbao:2.5.3@sha256:fdc6da21ca6963560c32336fd7feb9cf2d5e52668f1a1647205a4b41171f0806
           securityContext:
@@ -100,3 +115,50 @@ spec:
               bao operator raft snapshot save "$SNAP"
               ls -l "$SNAP"
               echo "Initial snapshot complete."
+      containers:
+        - name: mirror
+          image: quay.io/minio/mc:RELEASE.2025-04-08T15-39-49Z@sha256:7e3efb09c22c0882fbf341b9d99f61f94ae6c4c20a06f2f1a2b20ea8993d8952
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            readOnlyRootFilesystem: true
+          env:
+            - name: MC_CONFIG_DIR
+              value: /tmp/.mc
+          resources:
+            requests:
+              cpu: 10m
+              memory: 32Mi
+            limits:
+              memory: 128Mi
+          volumeMounts:
+            - name: snapshots
+              mountPath: /snapshots
+              readOnly: true
+            - name: r2-credentials
+              mountPath: /r2
+              readOnly: true
+            - name: mc-config
+              mountPath: /tmp/.mc
+          command:
+            - /bin/sh
+            - -ec
+            - |
+              if [ ! -f /r2/access_key_id ]; then
+                echo "vault-snapshot-r2 Secret not synced yet — skipping the"
+                echo "off-cluster mirror. The local PVC snapshot above still ran."
+                exit 0
+              fi
+              mc alias set backup "${r2_endpoint}" \
+                "$(cat /r2/access_key_id)" "$(cat /r2/secret_access_key)"
+              # Copy-only mirror (no --remove) + age-based prune, same
+              # rationale as cronjob.yaml: an empty/recreated PVC can never
+              # wipe the off-cluster history, and the prune only runs after a
+              # successful upload in the same pod.
+              mc mirror --overwrite /snapshots/ "backup/${r2_bucket}/openbao-snapshots/"
+              mc rm --recursive --force --older-than 14d \
+                "backup/${r2_bucket}/openbao-snapshots/" || true
+              echo "Mirrored snapshots:"
+              mc ls "backup/${r2_bucket}/openbao-snapshots/"
@@ -4,6 +4,7 @@ kind: Kustomization
 resources:
   - serviceaccount.yaml
   - pvc.yaml
+  - external-secret.yaml
   - init-job.yaml
   - cronjob.yaml
   - networkpolicy.yaml
@@ -20,6 +20,21 @@ spec:
     # Kube API for ServiceAccount token authentication
     - toEntities:
         - kube-apiserver
+    # S3-compatible snapshot mirror target (Cloudflare R2 in prod)
+    - toEntities:
+        - world
+      toPorts:
+        - ports:
+            - port: "443"
+              protocol: TCP
+    # In-cluster MinIO mirror target for local/CI
+    - toEndpoints:
+        - matchLabels:
+            k8s:io.kubernetes.pod.namespace: minio
+      toPorts:
+        - ports:
+            - port: "9000"
+              protocol: TCP
     # DNS resolution
     - toEndpoints:
         - matchLabels:

@@ -1,8 +1,12 @@
 # Dedicated volume for OpenBao raft snapshots (written by the
 # vault-snapshot CronJob, newest 14 retained). Uses the cluster's default
 # StorageClass. Snapshots on this PVC are the first-line restore source
-# after OpenBao data loss; Velero's daily namespace backup carries them
-# off-cluster alongside the openbao-unseal Secret they pair with.
+# after OpenBao data loss: the vault-config Job restores from the newest
+# one automatically when it finds an uninitialized cluster alongside a
+# surviving openbao-unseal Secret. Off-cluster durability comes from the
+# CronJob's mirror container (S3/R2), NOT from Velero — Velero's
+# file-system backup only captures volumes mounted by running pods, and
+# nothing mounts this PVC outside the CronJob's brief daily run.
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata: