chore(slinky-engine): Use PodSelector to replace scontrol show partition #294
ravisoundar wants to merge 1 commit into main
Greptile Summary
This PR replaces the
Confidence Score: 5/5
Safe to merge; all remaining findings are P2 style/quality suggestions that do not affect correctness.
All previously flagged P1 issues are addressed: the MatchExpressions-only guard, the topology-key vs partition-name lookup (fixed with the new for-loop), and the test coverage gap (tests now call GenerateOutput). Remaining comments are non-blocking.
pkg/translate/yaml.go: the blockSizes → block_sizes rename is a breaking schema change for existing ConfigMap consumers; coordinate or document the migration before deploying to existing clusters.
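One way a ConfigMap consumer could tolerate the blockSizes → block_sizes rename flagged in the summary is to normalize the key after decoding. The sketch below is only an illustration under assumptions: the helper name `migrateBlockSizesKey` and the generic `map[string]any` shape are hypothetical and are not taken from pkg/translate/yaml.go.

```go
package main

import "fmt"

// migrateBlockSizesKey is a hypothetical consumer-side shim. It moves a legacy
// "blockSizes" entry to "block_sizes" and reports whether the map was changed.
// If both keys are present, the new key wins and the legacy key is dropped.
func migrateBlockSizesKey(cfg map[string]any) bool {
	const oldKey, newKey = "blockSizes", "block_sizes"

	val, hasOld := cfg[oldKey]
	if !hasOld {
		return false
	}
	if _, hasNew := cfg[newKey]; !hasNew {
		cfg[newKey] = val
	}
	delete(cfg, oldKey)
	return true
}

func main() {
	// cfg stands in for an already-decoded ConfigMap document.
	cfg := map[string]any{"blockSizes": []int{2, 4}}
	changed := migrateBlockSizesKey(cfg)
	fmt.Println(changed, cfg)
}
```

A shim like this only helps consumers that can be updated ahead of the rollout; documenting the migration, as the review suggests, is still needed for the rest.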
Sequence Diagram
sequenceDiagram
participant GO as GenerateOutput
participant GTC as GetTranslateConfig
participant GPN as getPartitionNodes (slinky)
participant GMN as getMatchingNodes
participant K8S as Kubernetes API
participant SC as scontrol (fallback)
GO->>GTC: GetTranslateConfig(ctx, params, topologyNodeFinder)
loop for each topology/partition
GTC->>GPN: getPartitionNodes(ctx, sect.Partition, params)
GPN->>GPN: search Topologies for matching Partition field → topoName
alt podSelector found in topo.Other
GPN->>GMN: getMatchingNodes(ctx, client, ns, nodeListOpt, podListOpt)
GMN->>K8S: GetNodes (nodeListOpt)
GMN->>K8S: List Pods (podListOpt / podSelector)
K8S-->>GMN: pods with slurm.node.name labels
GMN-->>GPN: nodeMap (k8s→SLURM names)
GPN-->>GTC: ' Nodes=slurm1,slurm2,...'
else no podSelector
GPN->>SC: scontrol show partition name
SC-->>GPN: raw partition info
GPN-->>GTC: raw scontrol output
end
GTC->>GTC: parsePartitionNodes → cluset.ExpandList → cluset.Compact
end
GTC-->>GO: translate.Config
GO->>GO: generateDynamicNodesOutput / reconciliation
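The diagram suggests that both branches hand back text containing a `Nodes=` field, so one parsing step can serve either source. A minimal sketch of that idea follows; `extractNodes`, `partitionNodes`, and `nodeSource` are hypothetical names, the sample strings are illustrative, and the real parsePartitionNodes and cluset helpers may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeSource returns text that is expected to contain a "Nodes=" field.
type nodeSource func(partition string) (string, error)

// extractNodes returns the value of the first whitespace-separated token that
// starts with "Nodes=". Tokens such as "TotalNodes=2" are ignored because the
// prefix check is applied to whole tokens.
func extractNodes(output string) (string, bool) {
	for _, field := range strings.Fields(output) {
		if strings.HasPrefix(field, "Nodes=") {
			return strings.TrimPrefix(field, "Nodes="), true
		}
	}
	return "", false
}

// partitionNodes picks the pod-based source when a podSelector is configured
// and falls back to the scontrol-based source otherwise.
func partitionNodes(partition string, hasPodSelector bool, fromPods, fromScontrol nodeSource) (string, error) {
	src := fromScontrol
	if hasPodSelector {
		src = fromPods
	}
	out, err := src(partition)
	if err != nil {
		return "", err
	}
	nodes, ok := extractNodes(out)
	if !ok {
		return "", fmt.Errorf("no Nodes= field for partition %q", partition)
	}
	return nodes, nil
}

func main() {
	// Both sources are fakes that stand in for the Kubernetes and scontrol calls.
	fromPods := func(string) (string, error) { return " Nodes=slurm1,slurm2", nil }
	fromScontrol := func(string) (string, error) {
		return "PartitionName=p1 MaxNodes=UNLIMITED TotalNodes=2 Nodes=node[01-02] State=UP", nil
	}
	for _, hasSelector := range []bool{true, false} {
		nodes, err := partitionNodes("p1", hasSelector, fromPods, fromScontrol)
		fmt.Println(hasSelector, nodes, err)
	}
}
```

Keeping the two sources behind the same function type means the later expand and compact steps do not need to know which path produced the node list.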
Signed-off-by: Ravi Shankar <ravish@nvidia.com>
Description
Currently, scontrol show partition is used to get the nodes from Slurm to generate the topology. This PR instead uses the PodSelector configured per partition to identify the pods and their corresponding Slurm nodes.
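The pod-based lookup described above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the `pod` type stands in for a Kubernetes Pod, only matchLabels-style selectors are modeled, and the label key `slurm.node.name` is assumed from the wording of the review's sequence diagram, so the actual key may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pod is a pared-down, hypothetical stand-in for a Kubernetes Pod.
type pod struct {
	NodeName string            // Kubernetes node the pod is scheduled on
	Labels   map[string]string // pod labels
}

// slurmNodeLabel is an assumed label key carrying the Slurm node name.
const slurmNodeLabel = "slurm.node.name"

// matches reports whether labels satisfy every key/value pair in selector.
// An empty selector matches every pod.
func matches(labels, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// slurmNodesByHost maps each Kubernetes node name to the Slurm node name
// taken from the labels of a selected pod running on that node.
func slurmNodesByHost(pods []pod, selector map[string]string) map[string]string {
	nodeMap := make(map[string]string)
	for _, p := range pods {
		name, ok := p.Labels[slurmNodeLabel]
		if !ok || !matches(p.Labels, selector) {
			continue
		}
		nodeMap[p.NodeName] = name
	}
	return nodeMap
}

// nodesField renders the Slurm names as a sorted " Nodes=a,b" field.
func nodesField(nodeMap map[string]string) string {
	names := make([]string, 0, len(nodeMap))
	for _, n := range nodeMap {
		names = append(names, n)
	}
	sort.Strings(names)
	return " Nodes=" + strings.Join(names, ",")
}

func main() {
	pods := []pod{
		{NodeName: "k8s-a", Labels: map[string]string{"app": "slurmd", slurmNodeLabel: "slurm2"}},
		{NodeName: "k8s-b", Labels: map[string]string{"app": "slurmd", slurmNodeLabel: "slurm1"}},
		{NodeName: "k8s-c", Labels: map[string]string{"app": "other", slurmNodeLabel: "slurm9"}},
	}
	nodeMap := slurmNodesByHost(pods, map[string]string{"app": "slurmd"})
	fmt.Println(nodesField(nodeMap))
}
```

Sorting the names before joining keeps the rendered field stable across runs, since Go map iteration order is not deterministic.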
When the podSelector is not specified, the implementation falls back to the scontrol show partition approach.
Checklist
git commit -s).