fix(runtimes): set MPI SSH auth secret volume default mode to 0640#3368
fix(runtimes): set MPI SSH auth secret volume default mode to 0640#3368harxhist wants to merge 3 commits into
Conversation
Signed-off-by: Harsh <harxhist@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
Sridhar1030
left a comment
There was a problem hiding this comment.
Thanks for the PR. The change to set explicit defaultMode 0640 on the MPI SSH auth Secret volume matches the issue and is a sensible hardening step. Test updates look consistent with the behavior change.
just a small non-blocking nit
the global IgnoreFields(..., "DefaultMode") in the framework component-builder test could mask defaultMode on other Secret volumes someday; spelling out 0640 on the expected Secret volumes and dropping the ignore would make the test stricter.
|
@Sridhar1030: changing LGTM is restricted to collaborators DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: Harsh <harxhist@gmail.com>
|
@Sridhar1030: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
jiayuzhao05
left a comment
There was a problem hiding this comment.
I think there is one important behavioral risk to verify before merge.
DefaultMode is applied to the entire MPI SSH auth secret volume, which means the same 0640 permission will be used for the private key, public key, and authorized_keys. That may be problematic for the private key specifically. In many SSH/OpenSSH environments, private keys are expected to be more restrictive, and group-readable permissions can trigger “permissions are too open” style errors.
So while the patch correctly updates the generated volume spec and the related tests, it does not yet prove that SSH runtime behavior remains valid with 0640 for the private key. I would recommend setting per-item modes instead of a single volume-wide DefaultMode.
|
@jiayuzhao05: changing LGTM is restricted to collaborators DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thanks for catching this, that makes sense. I’ll switch from using a single defaultMode to setting specific modes per key. The private key will use a stricter permission (0600), while the public key and authorized_keys will use 0644. I’ll update the PR accordingly 👍 |
Signed-off-by: Harsh <harxhist@gmail.com>
jiayuzhao05
left a comment
There was a problem hiding this comment.
This looks good to me now. This revision fixes previous concern.
|
@jiayuzhao05: changing LGTM is restricted to collaborators DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@jinchihe @kuizhiqing |
|
looks good to me too |
andreyvelich
left a comment
There was a problem hiding this comment.
Thanks for the contribution, @harxhist! This review was generated using AI tooling. Inline comments below — one is a likely runtime regression that I'd like to confirm before merge.
One PR-level note: the title and the "What this PR does" section still describe DefaultMode: 0640, but the implementation now uses per-key modes (0600 private, 0644 shared). Please update so the eventual commit message matches the code.
| corev1ac.KeyToPath(). | ||
| WithKey(corev1.SSHAuthPrivateKey). | ||
| WithPath(constants.MPISSHPrivateKeyFile). | ||
| WithMode(constants.MPISSHSecretPrivateKeyFileMode), |
There was a problem hiding this comment.
Likely runtime regression — please verify before merge.
Mode 0600 makes the private key readable only by its owner. Secret-volume files are mounted root:root by default, but the bundled deepspeed-distributed and mlx-distributed runtimes set runAsUser: 1000 (mpiuser) — see manifests/base/runtimes/deepspeed_distributed.yaml:36,46 and the matching mlx_distributed.yaml. Neither sets fsGroup, so mpiuser cannot read /home/mpiuser/.ssh/id_rsa and mpirun→SSH will fail at the kernel DAC layer before OpenSSH's sshkey_perm_ok check is reached.
Issue #3262 explicitly recommended 0640 for this exact reason. Options:
- (a) revert this constant to
0640(matches the title + issue analysis), or - (b) keep
0600and add a pod-levelfsGroupto the bundled runtimes so the mpiuser group can read the key.
Current unit tests only assert the rendered spec — please confirm via an E2E run against deepspeed-distributed before merge.
| MPISSHSecretPrivateKeyFileMode int32 = 0600 | ||
|
|
||
| // MPISSHSecretSharedSSHFileMode is the mode for the mounted MPI SSH public key and authorized_keys files (0644). | ||
| MPISSHSecretSharedSSHFileMode int32 = 0644 |
There was a problem hiding this comment.
Naming suggestion for consistency with the surrounding MPISSHPublicKey* family:
MPISSHSecretPrivateKeyFileMode→MPISSHPrivateKeyFileMode(theSecretinfix is noise; every MPI SSH file already lives in the Secret)MPISSHSecretSharedSSHFileMode→MPISSHPublicKeyFileMode(SharedSSHis opaque; bothid_rsa.pubandauthorized_keysare public-key material)
Minor: consider 0o600 / 0o644 notation (Go 1.13+) or include the decimal in the doc comment — makes the octal intent explicit.
| WithPath(constants.MPISSHAuthorizedKeys), | ||
| ), | ||
| WithSecret( | ||
| corev1ac.SecretVolumeSource(). |
There was a problem hiding this comment.
Defense-in-depth suggestion: now that the test-side IgnoreFields(SecretVolumeSource{}, "DefaultMode") was removed, consider also setting WithDefaultMode(0o400) on the SecretVolumeSource. Per-item modes override everything today, but a future contributor adding a 4th item without WithMode(...) would silently inherit Kubernetes' 0644 default.
|
We are looking for someone to pick up this PR to finalize it. |
|
Fixed by: #3649 |
|
@andreyvelich: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What this PR does?:
Why we need it?
Explicit default mode improves security for MPI SSH keys (owner read/write, group read, others none) and avoids relying on cluster default behavior.
Which issue this PR fixes
Fixes 3262
Checklist: