Description
In a cgroup v2 environment, runc currently allows mounting cgroupfs inside a container even when the container is not running with a private cgroup namespace (cgroupns disabled).
This can lead to unintended side effects, as global mount options (e.g., nsdelegate) on the host’s cgroup filesystem may be modified or overridden by the container’s mount operation.
Problem
When cgroupns is not enabled, the container shares the host’s cgroup namespace. In this scenario:
• Mounting cgroupfs inside the container directly operates on the host’s cgroup hierarchy
• Mount options applied during the mount (e.g., nsdelegate) may:
  • Override existing global mount options
  • Introduce inconsistent behavior across the system
  • Break assumptions about cgroup isolation
This effectively allows a container to mutate global kernel state without proper isolation, which is unsafe and unexpected.
Expected Behavior
runc should ensure safe behavior when handling cgroupfs mounts under cgroup v2. Specifically, when:
• The system is using cgroup v2, and
• The container does not have a private cgroup namespace (cgroupns disabled)
then runc should enforce one of the following:
1. Reject the mount entirely, or
2. Ensure the mount options are consistent with the host’s existing cgroupfs mount (i.e., do not override global mount options)
Proposed Solution
Adopt one of the following strategies when mounting cgroupfs under cgroup v2:
Option 1: Strict validation (preferred for safety)
• Add a validation check in runc
• If cgroupns is disabled:
  • Disallow mounting cgroupfs inside the container
  • Return a clear error message indicating that cgroup namespace isolation is required
Option 2: Inherit host mount configuration
• Ensure that any cgroupfs mount inside the container:
  • Reuses the host’s existing mount options
  • Does not override global flags such as nsdelegate
Related Work
A similar issue has been identified and addressed in LXCFS:
• Linux Containers (LXC) project PR: lxc/lxcfs#703
Impact
• Prevents containers from modifying global cgroup mount behavior
• Improves isolation guarantees under cgroup v2