Summary
The DocumentDB controller pods shipped by the Helm chart (operator, sidecar-injector, wal-replica) were hardened in #382 to meet Pod Security Admission (PSA) restricted requirements. However, the runtime cluster pods that the operator creates via CNPG are not compliant, because the two sidecar containers we inject via the CNPG-I plugin (documentdb-gateway and otel-collector) ship with weak / missing securityContext fields.
On a namespace labeled pod-security.kubernetes.io/enforce=restricted, the Kubernetes API server will reject every DocumentDB cluster pod and the cluster will never come up. The operator log will show successful reconciliation of the Cluster CR; the failure surfaces only in CNPG's pod-creation events.
Reproduction
Install the chart into a namespace with pod-security.kubernetes.io/enforce=restricted, then kubectl apply a sample DocumentDB CR. Cluster pods will fail admission with a message similar to:
pods "documentdb-cluster-1" is forbidden: violates PodSecurity "restricted:v1.30":
allowPrivilegeEscalation != false (containers "documentdb-gateway", "otel-collector"),
unrestricted capabilities (containers "documentdb-gateway", "otel-collector" must set securityContext.capabilities.drop=["ALL"]),
seccompProfile (containers "otel-collector" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Audit (live cluster, documentdb-cluster-1 pod)
| Container | Source | runAsNonRoot | seccompProfile | allowPrivilegeEscalation | capabilities.drop | readOnlyRootFilesystem | PSA restricted? |
|---|---|---|---|---|---|---|---|
| postgres | CNPG built-in | ✅ | ✅ | ✅ false | ✅ ALL | ✅ true | ✅ |
| bootstrap-controller (init) | CNPG built-in | ✅ | ✅ | ✅ false | ✅ ALL | ✅ true | ✅ |
| documentdb-gateway | DocumentDB sidecar-injector | ⚠️ inherited from pod | ⚠️ inherited from pod | ❌ missing | ❌ missing | ❌ missing | ❌ |
| otel-collector | DocumentDB sidecar-injector | ⚠️ inherited from pod | ❌ not inherited (no container-level value, sidecar has no SecurityContext at all) | ❌ missing | ❌ missing | ❌ missing | ❌ |
PSA restricted requires allowPrivilegeEscalation: false and capabilities.drop: [ALL] to be set on every container. Both fields exist only in the container-level securityContext, so pod-level settings cannot satisfy these checks.
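As a rough illustration of the two per-container checks named above, the sketch below evaluates them against simplified stand-in types. The `SecurityContext` and `Container` structs and the `restrictedViolations` helper are hypothetical and exist only for this example; they are not the `corev1` types or the actual admission code, and the other restricted-profile checks (seccompProfile, runAsNonRoot, and so on) are deliberately left out.

```go
package main

import "fmt"

// SecurityContext is a simplified, hypothetical stand-in for the
// container-level security context. Only the two fields discussed
// above are modelled.
type SecurityContext struct {
	AllowPrivilegeEscalation *bool
	DropCapabilities         []string
}

// Container is a hypothetical stand-in for a pod container.
type Container struct {
	Name            string
	SecurityContext *SecurityContext
}

// restrictedViolations reports which of the two container-level
// requirements are not met. A nil SecurityContext or an unset field
// counts as a violation, because there is no pod-level value to fall
// back on for these fields.
func restrictedViolations(c Container) []string {
	var out []string
	sc := c.SecurityContext
	if sc == nil || sc.AllowPrivilegeEscalation == nil || *sc.AllowPrivilegeEscalation {
		out = append(out, "allowPrivilegeEscalation != false")
	}
	dropsAll := false
	if sc != nil {
		for _, name := range sc.DropCapabilities {
			if name == "ALL" {
				dropsAll = true
			}
		}
	}
	if !dropsAll {
		out = append(out, "capabilities.drop must include ALL")
	}
	return out
}

func main() {
	no := false
	hardened := Container{
		Name: "documentdb-gateway",
		SecurityContext: &SecurityContext{
			AllowPrivilegeEscalation: &no,
			DropCapabilities:         []string{"ALL"},
		},
	}
	// No SecurityContext at all, like the otel-collector row in the audit.
	bare := Container{Name: "otel-collector"}

	for _, c := range []Container{hardened, bare} {
		fmt.Printf("%s: %v\n", c.Name, restrictedViolations(c))
	}
}
```

In this model a container with no SecurityContext fails both checks regardless of what the pod declares, which matches the otel-collector row in the audit.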
Root cause
operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go:
- Lines 176–179: the gateway sidecar's SecurityContext sets only RunAsUser / RunAsGroup.
- Line 266+: the OTel collector sidecar is constructed with no SecurityContext field at all.
Suggested fix (rough sketch)
Apply a hardened SecurityContext to both injected containers:
SecurityContext: &corev1.SecurityContext{
	RunAsUser:                pointer.Int64(1000),
	RunAsGroup:               pointer.Int64(1000),
	RunAsNonRoot:             pointer.Bool(true),
	AllowPrivilegeEscalation: pointer.Bool(false),
	Capabilities:             &corev1.Capabilities{Drop: []corev1.Capability{"ALL"}},
	ReadOnlyRootFilesystem:   pointer.Bool(true),
	SeccompProfile:           &corev1.SeccompProfile{Type: corev1.SeccompProfileTypeRuntimeDefault},
},
Notes for the implementer:
- The OTel collector upstream image may need a writable scratch dir; if readOnlyRootFilesystem: true breaks it, mount an emptyDir at the writable path rather than dropping the flag.
- Consider exposing user-overridable values in the DocumentDB CR spec (spec.gatewaySecurityContext, spec.otelCollectorSecurityContext) so customers can adjust UID/GID for image variants without forking.
- Add a unit test in lifecycle_test.go asserting both injected containers carry the required fields, so this can't regress silently.
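One possible shape for the override note is sketched below: start from the hardened defaults and accept only UID/GID from the user-supplied value, so an override cannot switch the hardening flags back off. Everything here is an assumption for illustration. The `SecurityContext` struct is a simplified stand-in rather than `corev1.SecurityContext`, and `hardenedDefaults`, `withIdentityOverride`, and `ptr` are hypothetical helpers, not existing operator code. Restricting overrides to UID/GID is also just one design option.

```go
package main

import "fmt"

// SecurityContext is a simplified, hypothetical stand-in for the
// container security context used in the suggested fix.
type SecurityContext struct {
	RunAsUser                *int64
	RunAsGroup               *int64
	RunAsNonRoot             *bool
	AllowPrivilegeEscalation *bool
	DropCapabilities         []string
	ReadOnlyRootFilesystem   *bool
	SeccompProfileType       string
}

// ptr returns a pointer to v (hypothetical helper, requires Go 1.18+).
func ptr[T any](v T) *T { return &v }

// hardenedDefaults mirrors the values from the suggested fix.
func hardenedDefaults() SecurityContext {
	return SecurityContext{
		RunAsUser:                ptr(int64(1000)),
		RunAsGroup:               ptr(int64(1000)),
		RunAsNonRoot:             ptr(true),
		AllowPrivilegeEscalation: ptr(false),
		DropCapabilities:         []string{"ALL"},
		ReadOnlyRootFilesystem:   ptr(true),
		SeccompProfileType:       "RuntimeDefault",
	}
}

// withIdentityOverride copies only RunAsUser / RunAsGroup from the
// override; every hardening flag keeps its default value.
func withIdentityOverride(base SecurityContext, override *SecurityContext) SecurityContext {
	if override == nil {
		return base
	}
	if override.RunAsUser != nil {
		base.RunAsUser = override.RunAsUser
	}
	if override.RunAsGroup != nil {
		base.RunAsGroup = override.RunAsGroup
	}
	return base
}

func main() {
	// A user override that changes the UID and also tries to re-enable
	// privilege escalation; only the UID is honoured.
	override := &SecurityContext{
		RunAsUser:                ptr(int64(10001)),
		AllowPrivilegeEscalation: ptr(true),
	}
	sc := withIdentityOverride(hardenedDefaults(), override)
	fmt.Println(*sc.RunAsUser, *sc.RunAsGroup, *sc.AllowPrivilegeEscalation)
}
```

A regression test along the lines of the last note could then assert on the merged result for both injected containers instead of on hand-written literals.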
Impact
🔴 GA blocker for any customer running on a Kubernetes platform that defaults namespaces to PSA restricted, including:
- AKS with Azure Policy "Kubernetes cluster pods should only use approved security profiles"
- GKE Autopilot
- OpenShift (which is even stricter, via SCC)
- Any cluster following CIS Benchmark recommendations
These customers can install the operator successfully (PR #382 covers the controller pods) but cannot create a working DocumentDB cluster.
Out of scope
References