107 changes: 68 additions & 39 deletions docs/dr/openbao.md
@@ -2,10 +2,27 @@

## Overview

OpenBao stores secrets in file-based storage. The Velero daily backup captures
the openbao namespace PVCs and the `openbao-unseal` Secret (which contains the
unseal key and root token). The `vault-config` Job auto-initializes OpenBao on
fresh clusters and auto-unseals on restarts.
OpenBao runs as a raft (Integrated Storage) cluster. Three artifacts make it
recoverable:

1. **Raft snapshots**: the `vault-snapshot` CronJob saves a
`bao operator raft snapshot` to the `vault-snapshots` PVC daily (newest 14
kept) and mirrors them to the S3 backup target under `openbao-snapshots/`
(Cloudflare R2 in prod, MinIO locally). The mirror exists because Velero's
file-system backup only captures volumes mounted by *running* pods;
nothing mounts this PVC outside the CronJob's brief run, so Velero alone
would never carry the snapshots off-cluster.
2. **`openbao-unseal` Secret**: unseal key + root token, captured by Velero's
daily resource backup. A snapshot is only usable together with the keys
that were current when it was taken.
3. **The `vault-config` Job**: auto-initializes OpenBao on fresh clusters,
auto-unseals on restarts, and **auto-restores**. When no pod reports an
initialized barrier but `openbao-unseal` still holds keys AND a snapshot
exists on the PVC, it temp-initializes, runs
`bao operator raft snapshot restore -force` with the newest snapshot, and
unseals with the stored key, with no operator action. Only when no snapshot is
available does it abort and demand explicit data-loss acknowledgement
(the #1982 guard, unchanged).
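
The branching described for the `vault-config` Job can be read as a small truth table. The following is a minimal sketch of that decision only, not the Job's actual script: `decide_action` and its three `true`/`false` arguments are hypothetical names introduced here for illustration.

```bash
#!/bin/sh
# Hypothetical helper (not part of the repository): decide which path a
# vault-config-style Job should take. All arguments are "true" or "false".
#   $1  any pod reports an initialized barrier
#   $2  the openbao-unseal Secret still holds keys
#   $3  at least one snapshot exists on the vault-snapshots PVC
decide_action() {
  initialized=$1
  has_keys=$2
  has_snapshot=$3
  if [ "$initialized" = true ]; then
    echo unseal      # normal restart: storage is intact
  elif [ "$has_keys" = false ]; then
    echo init        # fresh cluster: no old keys, nothing to protect
  elif [ "$has_snapshot" = true ]; then
    echo restore     # keys survived and a snapshot is available
  else
    echo abort       # keys survived but no snapshot: data-loss guard
  fi
}
```

For example, `decide_action false true true` prints `restore`, while `decide_action false true false` prints `abort`, which corresponds to the case where an operator has to acknowledge the data loss explicitly.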

## Recovery Scenarios

@@ -21,50 +38,62 @@ No manual action needed. Verify:
kubectl exec -n openbao openbao-0 -- bao status
```

### Scenario 2: PVC data corruption
### Scenario 2: Raft data corruption or loss (Secret + snapshots intact)

**Symptom**: OpenBao fails to start, storage errors in logs.
**Symptom**: OpenBao fails to start with storage errors, or all pods report
`initialized: false` while `openbao-unseal` still exists (the 2026-06-10
incident shape).

**Resolution**:
**Resolution** (automated): reset the data volumes and let the `vault-config`
Job restore the newest snapshot:

1. Scale down the StatefulSet:
```bash
kubectl scale statefulset -n openbao openbao --replicas=0
```
2. Delete the corrupted PVC:
2. Delete the corrupted data PVCs (NOT `vault-snapshots`, and do NOT delete
the `openbao-unseal` Secret; both are the restore inputs):
```bash
kubectl delete pvc -n openbao data-openbao-0
```
3. Scale back up:
```bash
kubectl scale statefulset -n openbao openbao --replicas=1
```
4. Delete the stale `openbao-unseal` Secret (old keys are for the old storage):
```bash
kubectl delete secret -n openbao openbao-unseal
kubectl delete pvc -n openbao data-openbao-0 data-openbao-1 data-openbao-2
```
5. Trigger Flux reconciliation (`ksail workload reconcile`): the `vault-config`
Job re-runs, auto-initializes with fresh keys, and configures policies/roles.
6. PushSecrets re-seed the vault from SOPS variables on next reconciliation.

### Scenario 3: Full cluster rebuild (Velero restore)

**Symptom**: Entire cluster is lost (DR scenario), Velero backup available.

**Resolution**:

1. `ksail cluster create`: provisions infrastructure
2. Deploy Velero and restore from backup. This restores:
- OpenBao PVC (vault data)
- `openbao-unseal` Secret (unseal key + root token)
3. Flux deploys `infrastructure-controllers` → OpenBao starts
4. The `postStart` hook reads the restored `openbao-unseal` Secret → auto-unseals
5. `vault-config` Job runs → detects vault is already initialized → skips init →
converges policies/roles
6. ExternalSecrets resume syncing from the restored vault data

**Key point**: Velero backs up both the PVC (vault data) and the `openbao-unseal`
Secret (unseal credentials). Both are needed for a complete restore.
3. Trigger Flux reconciliation (`ksail workload reconcile`): the StatefulSet
scales back up with empty volumes, and the `vault-config` Job detects
uninitialized-pods + surviving-keys + available-snapshot and restores
automatically (worst-case RPO: 24 h, the snapshot cadence).
4. ExternalSecrets resume syncing; PushSecrets top up anything newer than the
snapshot on their next refresh.

Only if **no snapshot exists** (PVC also lost and the R2 mirror is empty) does
the Job abort with the data-loss guard; acknowledge the loss explicitly with
`kubectl delete secret openbao-unseal -n openbao` and the next run
re-initializes from scratch (Scenario 4 then re-seeds the KV).
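
Because the automated path depends on a snapshot being present, it may be worth confirming one exists before deleting the data PVCs. A minimal sketch, assuming snapshot files end in `.snap` and embed a sortable timestamp in their names; both are assumptions, since the real naming is set by the CronJob script, and `newest_snapshot` is a hypothetical helper:

```bash
#!/bin/sh
# Hypothetical pre-flight helper: print the newest snapshot in a directory,
# or return non-zero if the directory holds none.
# Assumes *.snap files whose names sort chronologically (e.g. a YYYYMMDD stamp).
newest_snapshot() {
  dir=$1
  newest=""
  for f in "$dir"/*.snap; do
    [ -e "$f" ] || continue   # unmatched glob stays literal: skip it
    newest=$f                 # globs expand in sorted order, so the last match wins
  done
  if [ -z "$newest" ]; then
    return 1
  fi
  printf '%s\n' "$newest"
}
```

Run from any pod or debug container that mounts the `vault-snapshots` PVC (the `/snapshots` mount path used by the CronJob is one option), a non-zero exit suggests the Job would hit the data-loss guard rather than the restore path.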

### Scenario 3: Full cluster rebuild (backups available)

**Symptom**: Entire cluster is lost (DR scenario); the R2 snapshot mirror
and/or Velero backups are available.

**Sequencing caveat**: on a rebuilt cluster Flux stands OpenBao up *before*
any restore can run, so the `vault-config` Job auto-initializes a **fresh**
vault first (no `openbao-unseal` exists yet → the guard does not trigger).
Recovering the old data means deliberately resetting that fresh vault into
the Scenario 2 shape:

1. `ksail cluster create` + `workload push`/`reconcile`: the platform
converges with a fresh, empty vault (PushSecrets re-seed SOPS-sourced
values, so the cluster is functional but generated secrets are new).
2. Restore the old `openbao-unseal` Secret (from the Velero backup) over the
fresh one, and copy the newest snapshot from the R2 `openbao-snapshots/`
mirror onto the `vault-snapshots` PVC.
3. Scale OpenBao to 0, delete the fresh `data-openbao-*` PVCs, reconcile:
the `vault-config` Job now hits the automated restore path (Scenario 2)
and brings back the pre-incident vault.
4. ExternalSecrets resume syncing from the restored data; consumers pick up
the old (matching) credentials.

**Key point**: a snapshot and the `openbao-unseal` Secret must come from the
same generation (same backup day); keys from one era cannot unseal a
snapshot from another.
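
One way to guard against mixing generations is to compare the day stamped on the snapshot with the day of the Velero backup the Secret is restored from. This is a rough sketch under stated assumptions: it expects the first digits in the snapshot file name to be a `YYYYMMDD` date, which may not match the real naming, and `snapshot_day` and `same_generation` are hypothetical helpers. A matching day is a necessary check, not proof that the keys fit.

```bash
#!/bin/sh
# Hypothetical helpers for a same-generation sanity check.

# snapshot_day FILE: print the YYYYMMDD stamp from the file name.
# Prints nothing if the first digits in the name are not an 8-digit run.
snapshot_day() {
  basename "$1" | sed -n 's/^[^0-9]*\([0-9]\{8\}\).*$/\1/p'
}

# same_generation FILE BACKUP_DAY: succeed only if the snapshot's day
# stamp equals BACKUP_DAY (YYYYMMDD of the backup holding openbao-unseal).
same_generation() {
  snap_day=$(snapshot_day "$1")
  [ -n "$snap_day" ] && [ "$snap_day" = "$2" ]
}
```

For example, `same_generation /snapshots/openbao-20260610T020000.snap 20260610` succeeds, while passing `20260609` as the backup day fails and signals that a different snapshot or backup should be chosen.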

### Scenario 4: Full cluster rebuild (no Velero backup)

75 changes: 73 additions & 2 deletions k8s/bases/infrastructure/vault-backup/cronjob.yaml
@@ -5,8 +5,21 @@
# openbao-unseal Secret: restore via
# bao operator raft snapshot restore <file>
# then unseal with the key that was current when the snapshot was taken
# (docs/dr/openbao-raft-ha-migration.md).
# (docs/dr/openbao-raft-ha-migration.md). The vault-config Job performs
# this restore automatically when it finds an uninitialized cluster, a
# surviving openbao-unseal Secret AND a snapshot on the PVC (docs/dr/
# openbao.md).
# Authenticates via Kubernetes auth (vault-snapshot role).
#
# Pod layout: the snapshot runs as an initContainer and the off-cluster
# mirror as the main container, so the upload only ever runs after a NEW
# snapshot succeeded (containers in one pod run in parallel; init -> main
# gives strict ordering). The mirror copies the PVC's snapshots to the
# S3 backup target (Cloudflare R2 in prod, in-cluster MinIO locally).
# Velero's file-system backup CANNOT carry this PVC off-cluster, because
# FSB only backs up volumes mounted by RUNNING pods and nothing mounts
# this PVC outside the CronJob's own brief run. Without the mirror, a
# full-cluster loss would also lose every raft snapshot.
apiVersion: batch/v1
kind: CronJob
metadata:
@@ -52,7 +65,16 @@ spec:
- name: snapshots
persistentVolumeClaim:
claimName: vault-snapshots
containers:
# R2 credentials synced by ESO (external-secret.yaml). Optional:
# on a fresh cluster the mirror container skips with a log line
# until the sync lands; the local snapshot is unaffected.
- name: r2-credentials
secret:
secretName: vault-snapshot-r2
optional: true
- name: mc-config
emptyDir: {}
initContainers:
- name: snapshot
image: quay.io/openbao/openbao:2.5.3@sha256:fdc6da21ca6963560c32336fd7feb9cf2d5e52668f1a1647205a4b41171f0806
securityContext:
@@ -107,3 +129,52 @@ spec:
rm -f "$OLD"
done
echo "Snapshot complete."
containers:
- name: mirror
image: quay.io/minio/mc:RELEASE.2025-04-08T15-39-49Z@sha256:7e3efb09c22c0882fbf341b9d99f61f94ae6c4c20a06f2f1a2b20ea8993d8952
securityContext:
runAsNonRoot: true
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
readOnlyRootFilesystem: true
env:
- name: MC_CONFIG_DIR
value: /tmp/.mc
resources:
requests:
cpu: 10m
memory: 32Mi
limits:
memory: 128Mi
volumeMounts:
- name: snapshots
mountPath: /snapshots
readOnly: true
- name: r2-credentials
mountPath: /r2
readOnly: true
- name: mc-config
mountPath: /tmp/.mc
command:
- /bin/sh
- -ec
- |
if [ ! -f /r2/access_key_id ]; then
echo "vault-snapshot-r2 Secret not synced yet; skipping the"
echo "off-cluster mirror. The local PVC snapshot above still ran."
exit 0
fi
mc alias set backup "${r2_endpoint}" \
"$(cat /r2/access_key_id)" "$(cat /r2/secret_access_key)"
# Copy-only mirror: NO --remove, so an empty/recreated PVC can
# never wipe the off-cluster history. Remote retention is
# pruned by age instead, and only after a successful upload in
# the same run (this container only starts when the snapshot
# initContainer succeeded), so the newest snapshot always
# survives the prune.
mc mirror --overwrite /snapshots/ "backup/${r2_bucket}/openbao-snapshots/"
mc rm --recursive --force --older-than 14d \
"backup/${r2_bucket}/openbao-snapshots/" || true
echo "Mirrored snapshots:"
mc ls "backup/${r2_bucket}/openbao-snapshots/"
29 changes: 29 additions & 0 deletions k8s/bases/infrastructure/vault-backup/external-secret.yaml
@@ -0,0 +1,29 @@
---
# R2 credentials for the vault-snapshot CronJob's off-cluster mirror
# container: the same infrastructure/backup/r2 KV entry Velero and CNPG
# use, synced into a plain two-key Secret for `mc`. Deliberately optional
# on the consumer side (the CronJob mounts it `optional: true`): on a
# fresh cluster the mirror is skipped with a log line until ESO has
# synced, while the local PVC snapshot still runs.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: vault-snapshot-r2
namespace: openbao
spec:
refreshInterval: 1h
secretStoreRef:
name: openbao
kind: ClusterSecretStore
target:
name: vault-snapshot-r2
creationPolicy: Owner
data:
- secretKey: access_key_id
remoteRef:
key: infrastructure/backup/r2
property: access_key_id
- secretKey: secret_access_key
remoteRef:
key: infrastructure/backup/r2
property: secret_access_key
64 changes: 63 additions & 1 deletion k8s/bases/infrastructure/vault-backup/init-job.yaml
@@ -14,6 +14,12 @@
# Same retry pattern as the vault-config Job: the script fails until
# vault-config has created the vault-snapshot auth role and unsealed the
# vault, and OnFailure + backoffLimit keep retrying until then.
#
# Pod layout mirrors cronjob.yaml: snapshot as initContainer, off-cluster
# mirror as the main container, so the baseline snapshot reaches the S3
# backup target immediately instead of waiting for the nightly mirror,
# which matters because the deploy-time baseline is often the ONLY
# snapshot during a change's first 24 hours.
apiVersion: batch/v1
kind: Job
metadata:
@@ -47,7 +53,16 @@ spec:
- name: snapshots
persistentVolumeClaim:
claimName: vault-snapshots
containers:
# R2 credentials synced by ESO (external-secret.yaml). Optional: on a
# fresh cluster the mirror container skips with a log line until the
# sync lands; the local snapshot is unaffected.
- name: r2-credentials
secret:
secretName: vault-snapshot-r2
optional: true
- name: mc-config
emptyDir: {}
initContainers:
- name: snapshot
image: quay.io/openbao/openbao:2.5.3@sha256:fdc6da21ca6963560c32336fd7feb9cf2d5e52668f1a1647205a4b41171f0806
securityContext:
@@ -100,3 +115,50 @@ spec:
bao operator raft snapshot save "$SNAP"
ls -l "$SNAP"
echo "Initial snapshot complete."
containers:
- name: mirror
image: quay.io/minio/mc:RELEASE.2025-04-08T15-39-49Z@sha256:7e3efb09c22c0882fbf341b9d99f61f94ae6c4c20a06f2f1a2b20ea8993d8952
securityContext:
runAsNonRoot: true
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
readOnlyRootFilesystem: true
env:
- name: MC_CONFIG_DIR
value: /tmp/.mc
resources:
requests:
cpu: 10m
memory: 32Mi
limits:
memory: 128Mi
volumeMounts:
- name: snapshots
mountPath: /snapshots
readOnly: true
- name: r2-credentials
mountPath: /r2
readOnly: true
- name: mc-config
mountPath: /tmp/.mc
command:
- /bin/sh
- -ec
- |
if [ ! -f /r2/access_key_id ]; then
echo "vault-snapshot-r2 Secret not synced yet; skipping the"
echo "off-cluster mirror. The local PVC snapshot above still ran."
exit 0
fi
mc alias set backup "${r2_endpoint}" \
"$(cat /r2/access_key_id)" "$(cat /r2/secret_access_key)"
# Copy-only mirror (no --remove) + age-based prune, same
# rationale as cronjob.yaml: an empty/recreated PVC can never
# wipe the off-cluster history, and the prune only runs after a
# successful upload in the same pod.
mc mirror --overwrite /snapshots/ "backup/${r2_bucket}/openbao-snapshots/"
mc rm --recursive --force --older-than 14d \
"backup/${r2_bucket}/openbao-snapshots/" || true
echo "Mirrored snapshots:"
mc ls "backup/${r2_bucket}/openbao-snapshots/"
1 change: 1 addition & 0 deletions k8s/bases/infrastructure/vault-backup/kustomization.yaml
@@ -4,6 +4,7 @@ kind: Kustomization
resources:
- serviceaccount.yaml
- pvc.yaml
- external-secret.yaml
- init-job.yaml
- cronjob.yaml
- networkpolicy.yaml
15 changes: 15 additions & 0 deletions k8s/bases/infrastructure/vault-backup/networkpolicy.yaml
@@ -20,6 +20,21 @@ spec:
# Kube API for ServiceAccount token authentication
- toEntities:
- kube-apiserver
# S3-compatible snapshot mirror target (Cloudflare R2 in prod)
- toEntities:
- world
toPorts:
- ports:
- port: "443"
protocol: TCP
# In-cluster MinIO mirror target for local/CI
- toEndpoints:
- matchLabels:
k8s:io.kubernetes.pod.namespace: minio
toPorts:
- ports:
- port: "9000"
protocol: TCP
# DNS resolution
- toEndpoints:
- matchLabels:
8 changes: 6 additions & 2 deletions k8s/bases/infrastructure/vault-backup/pvc.yaml
@@ -1,8 +1,12 @@
# Dedicated volume for OpenBao raft snapshots (written by the
# vault-snapshot CronJob, newest 14 retained). Uses the cluster's default
# StorageClass. Snapshots on this PVC are the first-line restore source
# after OpenBao data loss; Velero's daily namespace backup carries them
# off-cluster alongside the openbao-unseal Secret they pair with.
# after OpenBao data loss: the vault-config Job restores from the newest
# one automatically when it finds an uninitialized cluster alongside a
# surviving openbao-unseal Secret. Off-cluster durability comes from the
# CronJob's mirror container (S3/R2), NOT from Velero: Velero's
# file-system backup only captures volumes mounted by running pods, and
# nothing mounts this PVC outside the CronJob's brief daily run.
apiVersion: v1
kind: PersistentVolumeClaim
metadata: