[GA blocker] Runtime cluster pods are not Pod Security Admission "restricted" compliant #387

@WentingWu666666

Summary

The DocumentDB controller pods shipped by the Helm chart (operator, sidecar-injector, wal-replica) were hardened in #382 to meet Pod Security Admission (PSA) restricted requirements. However, the runtime cluster pods that the operator creates via CNPG are not compliant, because the two sidecar containers we inject via the CNPG-I plugin (documentdb-gateway and otel-collector) ship with weak or missing securityContext fields.

On a namespace labeled pod-security.kubernetes.io/enforce=restricted, the Kubernetes API server will reject every DocumentDB cluster pod and the cluster will never come up. The operator log will show successful reconciliation of the Cluster CR; the failure surfaces only in CNPG's pod-creation events.

Reproduction

Install the chart into a namespace with pod-security.kubernetes.io/enforce=restricted, then kubectl apply a sample DocumentDB CR. Cluster pods will fail admission with a message similar to:

pods "documentdb-cluster-1" is forbidden: violates PodSecurity "restricted:v1.30":
allowPrivilegeEscalation != false (containers "documentdb-gateway", "otel-collector"),
unrestricted capabilities (containers "documentdb-gateway", "otel-collector" must set securityContext.capabilities.drop=["ALL"]),
seccompProfile (containers "otel-collector" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

Audit (live cluster, documentdb-cluster-1 pod)

Container | Source | runAsNonRoot | seccompProfile | allowPrivilegeEscalation | capabilities.drop | readOnlyRootFilesystem | PSA restricted?
--- | --- | --- | --- | --- | --- | --- | ---
postgres | CNPG built-in | | | ✅ false | ✅ ALL | ✅ true |
bootstrap-controller (init) | CNPG built-in | | | ✅ false | ✅ ALL | ✅ true |
documentdb-gateway | DocumentDB sidecar-injector | ⚠️ inherited from pod | ⚠️ inherited from pod | ❌ missing | ❌ missing | ❌ missing |
otel-collector | DocumentDB sidecar-injector | ⚠️ inherited from pod | not inherited (no container-level value, sidecar has no SecurityContext at all) | ❌ missing | ❌ missing | ❌ missing |

PSA restricted requires allowPrivilegeEscalation: false and capabilities.drop: [ALL] on every container. Both fields exist only in the container-level securityContext, so pod-level settings cannot satisfy these checks.
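
The per-container requirement can be illustrated with a small checker. This is a sketch only, not the PSA admission logic. The `Capabilities`, `SecurityContext` and `Container` structs are simplified stand-ins for the corresponding `corev1` types (same field names, standard library only), and `restrictedViolations` is a hypothetical helper that covers just the two container-only fields named above.

```go
package main

import "fmt"

// Simplified stand-ins for the corev1 types; an assumption for illustration,
// not the real Kubernetes API structs.
type Capabilities struct {
	Drop []string
}

type SecurityContext struct {
	AllowPrivilegeEscalation *bool
	Capabilities             *Capabilities
}

type Container struct {
	Name            string
	SecurityContext *SecurityContext
}

// restrictedViolations (hypothetical helper) reports which of the two
// container-only checks a single container fails.
func restrictedViolations(c Container) []string {
	var out []string
	sc := c.SecurityContext

	// An unset field counts as a violation: the value must be explicitly false.
	if sc == nil || sc.AllowPrivilegeEscalation == nil || *sc.AllowPrivilegeEscalation {
		out = append(out, "allowPrivilegeEscalation != false")
	}

	dropsAll := false
	if sc != nil && sc.Capabilities != nil {
		for _, d := range sc.Capabilities.Drop {
			if d == "ALL" {
				dropsAll = true
			}
		}
	}
	if !dropsAll {
		out = append(out, `capabilities.drop must include "ALL"`)
	}
	return out
}

func main() {
	no := false
	hardened := Container{
		Name: "documentdb-gateway",
		SecurityContext: &SecurityContext{
			AllowPrivilegeEscalation: &no,
			Capabilities:             &Capabilities{Drop: []string{"ALL"}},
		},
	}
	bare := Container{Name: "otel-collector"} // no SecurityContext at all

	for _, c := range []Container{hardened, bare} {
		v := restrictedViolations(c)
		fmt.Printf("%s: %d violation(s) %v\n", c.Name, len(v), v)
	}
}
```

A container with no SecurityContext fails both checks regardless of what the pod-level context sets, which matches the behaviour seen for the injected sidecars.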

Root cause

operator/cnpg-plugins/sidecar-injector/internal/lifecycle/lifecycle.go:

  • Lines 176–179: the gateway sidecar's SecurityContext sets only RunAsUser and RunAsGroup.
  • Line 266 onward: the OTel collector sidecar is constructed with no SecurityContext field at all.

Suggested fix (rough sketch)

Apply a hardened SecurityContext to both injected containers:

SecurityContext: &corev1.SecurityContext{
    RunAsUser:                pointer.Int64(1000),
    RunAsGroup:               pointer.Int64(1000),
    RunAsNonRoot:             pointer.Bool(true),
    AllowPrivilegeEscalation: pointer.Bool(false),
    Capabilities:             &corev1.Capabilities{Drop: []corev1.Capability{"ALL"}},
    ReadOnlyRootFilesystem:   pointer.Bool(true),
    SeccompProfile:           &corev1.SeccompProfile{Type: corev1.SeccompProfileTypeRuntimeDefault},
},

Notes for the implementer:

  • The OTel collector upstream image may need a writable scratch dir; if readOnlyRootFilesystem: true breaks it, mount an emptyDir at the writable path rather than dropping the flag.
  • Consider exposing user-overridable values in the DocumentDB CR spec (spec.gatewaySecurityContext, spec.otelCollectorSecurityContext) so customers can adjust UID/GID for image variants without forking.
  • Add a unit test in lifecycle_test.go asserting both injected containers carry the required fields, so this can't regress silently.
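
The override and regression-test notes could be combined along the following lines. This is one possible design, sketched under assumptions: the `SecurityContext` struct is a flattened stand-in for `corev1.SecurityContext` (not the real type), `hardenedDefaults` and `withOverrides` are hypothetical names, and UID/GID 1000 is taken from the sketch above. Only UID and GID are overridable here, so a user-supplied context cannot weaken the hardening fields.

```go
package main

import "fmt"

// Flattened stand-in for corev1.SecurityContext; an assumption for
// illustration. The real type nests Capabilities and SeccompProfile.
type SecurityContext struct {
	RunAsUser                *int64
	RunAsGroup               *int64
	RunAsNonRoot             *bool
	AllowPrivilegeEscalation *bool
	DropCapabilities         []string
	ReadOnlyRootFilesystem   *bool
	SeccompProfileType       string
}

// ptr returns a pointer to v (requires Go 1.18+ for generics).
func ptr[T any](v T) *T { return &v }

// hardenedDefaults (hypothetical) returns the context applied to both
// injected sidecars when the user overrides nothing.
func hardenedDefaults() *SecurityContext {
	return &SecurityContext{
		RunAsUser:                ptr(int64(1000)),
		RunAsGroup:               ptr(int64(1000)),
		RunAsNonRoot:             ptr(true),
		AllowPrivilegeEscalation: ptr(false),
		DropCapabilities:         []string{"ALL"},
		ReadOnlyRootFilesystem:   ptr(true),
		SeccompProfileType:       "RuntimeDefault",
	}
}

// withOverrides (hypothetical) starts from the hardened defaults and copies
// only RunAsUser and RunAsGroup from the user-supplied context. Every other
// user-supplied field is ignored.
func withOverrides(user *SecurityContext) *SecurityContext {
	sc := hardenedDefaults()
	if user == nil {
		return sc
	}
	if user.RunAsUser != nil {
		sc.RunAsUser = user.RunAsUser
	}
	if user.RunAsGroup != nil {
		sc.RunAsGroup = user.RunAsGroup
	}
	return sc
}

func main() {
	// The user changes the UID and also tries to re-enable privilege escalation.
	user := &SecurityContext{
		RunAsUser:                ptr(int64(10001)),
		AllowPrivilegeEscalation: ptr(true),
	}
	sc := withOverrides(user)
	fmt.Println(*sc.RunAsUser, *sc.RunAsGroup, *sc.AllowPrivilegeEscalation) // prints: 10001 1000 false
}
```

A regression test in lifecycle_test.go could make the same kind of assertions against the containers the injector actually produces, using the real corev1 types.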

Impact

🔴 GA blocker for any customer running on a Kubernetes platform that defaults namespaces to PSA restricted, including:

  • AKS with Azure Policy "Kubernetes cluster pods should only use approved security profiles"
  • GKE Autopilot
  • OpenShift (which is even stricter, via SCC)
  • Any cluster following CIS Benchmark recommendations

These customers can install the operator successfully (PR #382 covers the controller pods) but cannot create a working DocumentDB cluster.

Out of scope

References
