chore(recipe): bump dynamo-platform from 0.9.x to 1.0.2 and add Grove chart#459
Conversation
|
/ok to test 852739e |
|
@dims Merged changes from main, can we get another CI run? |
Open Questions
|
mchmarny
left a comment
There was a problem hiding this comment.
Blocked on the 2 issues @yuanchen8911 posted
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces the standalone Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/components/dynamo-platform/values.yaml`:
- Around line 31-35: The values set grove.install: false and grove.enabled: true
configure Dynamo to expect an external Grove but you haven't registered Grove as
an AICR component or referenced it from Dynamo overlays; fix by either (A)
adding a Grove component entry in recipes/registry.yaml and creating a
corresponding Grove values file under recipes/overlays/<environment>/ that the
Dynamo overlay references (so the external Grove is provisioned as a component),
or (B) revert to using the bundled subchart by setting grove.install: true (and
keeping grove.enabled: true) so the bundled Grove is installed with Dynamo;
update whichever file you change to match the chosen approach (look for symbols
"grove.install", "grove.enabled", and ensure recipes/registry.yaml and
recipes/overlays/* reference the Grove component if you choose option A).
- Around line 38-39: The values.yaml sets upgradeCRD: true but six overlays
still depend on dynamo-crds 0.9.0 via dependencyRefs, causing dual CRD
management; remove the dynamo-crds:0.9.0 dependencyRefs from the listed overlays
(h100-kind-inference-dynamo.yaml, h100-gke-cos-inference-dynamo.yaml,
h100-eks-ubuntu-inference-dynamo.yaml, h100-aks-ubuntu-inference-dynamo.yaml,
gb200-eks-ubuntu-inference-dynamo.yaml, gb200-oke-ubuntu-inference-dynamo.yaml)
so only dynamo-platform manages CRDs, or alternatively add documentation (e.g.,
README or overlay-level notes) explaining the concurrent-CRD ownership
prevention strategy and the tests that verify CRD ownership; search for any
dependencyRefs entries named "dynamo-crds" and reconcile them with upgradeCRD in
recipes/components/dynamo-platform/values.yaml to ensure a single source of CRD
installation.
In `@recipes/registry.yaml`:
- Line 389: The overlay contains an explicit chart pin "version: \"0.9.0\""
which conflicts with the registry defaultVersion "1.0.1"; update the overlay's
version key from "0.9.0" to "1.0.1" (or remove the version key entirely so the
registry defaultVersion "1.0.1" is used) to ensure the chart renders with
compatible 1.0.x values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 3dac1249-65b6-4622-b7bf-3ce6731ac16c
📒 Files selected for processing (7)
recipes/components/dynamo-platform/values.yamlrecipes/overlays/gb200-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-aks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-gke-cos-inference-dynamo.yamlrecipes/overlays/h100-kind-inference-dynamo.yamlrecipes/registry.yaml
|
@Jont828 can you take a look at the reviews and address as needed? Thanks. |
9c9991a to
3f5d972
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
recipes/components/dynamo-platform/values.yaml (1)
31-35:⚠️ Potential issue | 🟠 MajorMissing declarative Grove provisioning while bundled install is disabled.
Line 31–35 sets
global.grove.install: falseand expects an external Grove (enabled: true), but no Grove component wiring is present in the provided registry/overlays. This can leave Dynamo without Grove at deploy time.Suggested fix (if Grove is not actually externalized elsewhere)
grove: - install: false + install: true enabled: trueAs per coding guidelines,
recipes/**/*.yaml: “For Helm components, add configuration to component registry (recipes/registry.yaml), create values file (recipes/components/*/values.yaml), and reference in recipe overlays (recipes/overlays/*.yaml).”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/components/dynamo-platform/values.yaml` around lines 31 - 35, The recipe disables bundled Grove install (global.grove.install: false) while leaving grove.enabled: true but does not wire a declarative external Grove into the component registry/overlays; add a Grove entry to the component registry (recipes/registry.yaml) with appropriate component id and chart reference, add a Grove values file under recipes/components/dynamo-platform/ (or appropriate component folder) that declares the external endpoint/credentials, and update the recipe overlay(s) in recipes/overlays/*.yaml to reference the Grove component (so that when bundled install is disabled the external Grove is resolved via registry/overlays); ensure you modify the symbols global.grove.install, grove.enabled, and the registry/component names to match existing naming conventions.recipes/registry.yaml (1)
391-391:⚠️ Potential issue | 🟠 MajorExplicit 0.9.0 overlay pin still bypasses this new default.
Line 391 updates the registry default, but
recipes/overlays/gb200-oke-ubuntu-inference-dynamo.yamlis still explicitly pinned toversion: "0.9.0"(per provided PR context). That leaves one Dynamo overlay on 0.9 while this PR’s values/schema is 1.0-oriented.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/registry.yaml` at line 391, The overlay recipes/overlays/gb200-oke-ubuntu-inference-dynamo.yaml is still explicitly pinned to version: "0.9.0" which bypasses the new defaultVersion: "1.0.1" in recipes/registry.yaml; update that overlay’s version field to "1.0.1" (or remove the explicit version key so it inherits the registry default) to bring it in line with the new 1.0 schema and values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@recipes/components/dynamo-platform/values.yaml`:
- Around line 31-35: The recipe disables bundled Grove install
(global.grove.install: false) while leaving grove.enabled: true but does not
wire a declarative external Grove into the component registry/overlays; add a
Grove entry to the component registry (recipes/registry.yaml) with appropriate
component id and chart reference, add a Grove values file under
recipes/components/dynamo-platform/ (or appropriate component folder) that
declares the external endpoint/credentials, and update the recipe overlay(s) in
recipes/overlays/*.yaml to reference the Grove component (so that when bundled
install is disabled the external Grove is resolved via registry/overlays);
ensure you modify the symbols global.grove.install, grove.enabled, and the
registry/component names to match existing naming conventions.
In `@recipes/registry.yaml`:
- Line 391: The overlay recipes/overlays/gb200-oke-ubuntu-inference-dynamo.yaml
is still explicitly pinned to version: "0.9.0" which bypasses the new
defaultVersion: "1.0.1" in recipes/registry.yaml; update that overlay’s version
field to "1.0.1" (or remove the explicit version key so it inherits the registry
default) to bring it in line with the new 1.0 schema and values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: b395bcd0-9942-4762-bea8-13f299c6d977
📒 Files selected for processing (7)
recipes/components/dynamo-platform/values.yamlrecipes/overlays/gb200-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-aks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-gke-cos-inference-dynamo.yamlrecipes/overlays/h100-kind-inference-dynamo.yamlrecipes/registry.yaml
3f5d972 to
abb2bc6
Compare
abb2bc6 to
5d1ace4
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/recipe/conformance_test.go (1)
33-341: 🧹 Nitpick | 🔵 TrivialAdd one Dynamo OKE conformance case to harden regression coverage.
The table covers kind/EKS/GKE Dynamo but not OKE. Adding an OKE Dynamo case would catch overlay-version drift earlier in CI.
Suggested table entry skeleton
+ { + name: "gb200-oke-ubuntu-inference-dynamo", + criteria: func() *Criteria { + c := NewCriteria() + c.Service = CriteriaServiceOKE + c.Accelerator = CriteriaAcceleratorGB200 + c.OS = CriteriaOSUbuntu + c.Intent = CriteriaIntentInference + c.Platform = CriteriaPlatformDynamo + return c + }, + requiredComponents: []string{ + "cert-manager", "gpu-operator", "kube-prometheus-stack", + "prometheus-adapter", "nvidia-dra-driver-gpu", "kai-scheduler", + "kgateway-crds", "kgateway", "grove", "dynamo-platform", + }, + wantDRAConstraint: true, + },As per coding guidelines:
**/*_test.go: “Write unit tests alongside code changes; all code must have test coverage with race detector enabled.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/recipe/conformance_test.go` around lines 33 - 341, Add a new Dynamo/OKE test case to the tests slice in TestConformanceRecipeInvariants: create an entry (e.g. name "h100-oke-ubuntu-inference-dynamo") where criteria() sets Service = CriteriaServiceOKE, Accelerator = CriteriaAcceleratorH100, OS = CriteriaOSUbuntu, Intent = CriteriaIntentInference, Platform = CriteriaPlatformDynamo; mirror requiredComponents and requiredChecks from the existing "h100-eks-ubuntu-inference-dynamo" case (including "grove" and "dynamo-platform" in components and checks like "robust-controller" and "secure-accelerator-access"), and set wantDRAConstraint to true so the OKE+Dynamo path is covered by the conformance invariants.pkg/recipe/deployment_order_guard_test.go (1)
65-159:⚠️ Potential issue | 🟠 MajorAdd an OKE Dynamo guard case to close a regression gap.
This table now validates EKS/Kind Dynamo dependency guards withgrove, but it still does not cover the OKE Dynamo overlay path. A stale OKE overlay dependency/version can bypass this guard set.Please add an OKE Dynamo scenario (same
dynamo-platformdependency expectations and ordering checks) to this matrix.As per coding guidelines, "Write unit tests alongside code changes; all code must have test coverage with race detector enabled" and "Use table-driven tests for multiple test cases."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/recipe/deployment_order_guard_test.go` around lines 65 - 159, Add a table-driven test case mirroring the EKS/Ubuntu Dynamo entries but for OKE: create a new test case named e.g. "h100-oke-ubuntu-inference-dynamo" that calls NewCriteria() and sets c.Service = CriteriaServiceOKE, c.Accelerator = CriteriaAcceleratorH100, c.Intent = CriteriaIntentInference, c.OS = CriteriaOSUbuntu, c.Platform = CriteriaPlatformDynamo; set requiredDeps for "dynamo-platform" to {"grove","cert-manager","kube-prometheus-stack","gpu-operator","kai-scheduler"} and requiredOrdering to the same pairs as the EKS Dynamo case ({"gpu-operator","dynamo-platform"}, {"kai-scheduler","dynamo-platform"}, {"gpu-operator","nvsentinel"}) so the deployment_order_guard_test.go matrix covers the OKE overlay path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/overlays/gb200-eks-ubuntu-inference-dynamo.yaml`:
- Around line 43-46: The overlay adds a Helm component named "grove" but never
references its component-level values file, so defaults in the grove component's
values.yaml are ignored; update the component registry (registry.yaml) to
include the grove component entry, create or ensure a values.yaml under the
grove component directory with the needed defaults, and then modify the
gb200-eks-ubuntu-inference-dynamo overlay to reference the grove component and
point to its values file so the component-level configuration is applied.
In `@recipes/overlays/gb200-oke-ubuntu-inference-dynamo.yaml`:
- Around line 43-46: The overlay's Helm component entry for "grove" is missing
the valuesFile reference; update the component block for name: grove (the Helm
component entry) to include valuesFile: components/grove/values.yaml so the
overlay consumes the standalone grove values file introduced in the component
registry and recipes/components/grove/values.yaml.
In `@recipes/overlays/h100-aks-ubuntu-inference-dynamo.yaml`:
- Around line 43-46: The overlay currently declares the Helm component "grove"
directly without referencing its component values file; update the overlay's
component entry for grove to attach the component's values.yaml (the grove
component's values file) so the Helm component wiring uses the standard
component registry values; specifically, modify the "grove" component block in
the overlay to include the values reference (pointing to the grove component's
values.yaml) rather than only specifying source/version, ensuring the overlay
uses the component's configured values.
In `@recipes/overlays/h100-eks-ubuntu-inference-dynamo.yaml`:
- Around line 43-46: The overlay currently installs the grove Helm chart using
the chart entry (name: grove, source: oci://ghcr.io/ai-dynamo/grove, version:
"v0.1.0-alpha.6") but omits the component valuesFile; update the overlay to add
valuesFile: components/grove/values.yaml so the chart picks up the
component-level values file defined in the same PR; ensure the valuesFile key is
added alongside the existing Helm chart fields so the overlay references the
component values rather than using raw chart defaults.
In `@recipes/overlays/h100-gke-cos-inference-dynamo.yaml`:
- Around line 43-46: The overlay's Helm component entry for name: grove is
missing a reference to the component values file you added; update the grove
Helm block in the overlay (the entry with name: grove and source:
oci://ghcr.io/ai-dynamo/grove) to include the component's values reference
(pointing to the values file added for the grove component) so the overlay
actually uses components/grove/values.yaml, and confirm the grove component is
registered in the recipes/registry.yaml component registry.
In `@recipes/overlays/h100-kind-inference-dynamo.yaml`:
- Around line 37-40: The overlay adds a Helm component named "grove" but doesn't
wire the new component-level values.yaml into the Helm component reference,
leaving recipes/components/grove/values.yaml unused; update the overlay's
"grove" Helm component entry so it points to the component values (add the
values reference under the grove component in the overlay), and ensure the
component is registered in registry.yaml with the same component name "grove" so
the overlay can resolve and inject recipes/components/grove/values.yaml into the
Helm deploy.
---
Outside diff comments:
In `@pkg/recipe/conformance_test.go`:
- Around line 33-341: Add a new Dynamo/OKE test case to the tests slice in
TestConformanceRecipeInvariants: create an entry (e.g. name
"h100-oke-ubuntu-inference-dynamo") where criteria() sets Service =
CriteriaServiceOKE, Accelerator = CriteriaAcceleratorH100, OS =
CriteriaOSUbuntu, Intent = CriteriaIntentInference, Platform =
CriteriaPlatformDynamo; mirror requiredComponents and requiredChecks from the
existing "h100-eks-ubuntu-inference-dynamo" case (including "grove" and
"dynamo-platform" in components and checks like "robust-controller" and
"secure-accelerator-access"), and set wantDRAConstraint to true so the
OKE+Dynamo path is covered by the conformance invariants.
In `@pkg/recipe/deployment_order_guard_test.go`:
- Around line 65-159: Add a table-driven test case mirroring the EKS/Ubuntu
Dynamo entries but for OKE: create a new test case named e.g.
"h100-oke-ubuntu-inference-dynamo" that calls NewCriteria() and sets c.Service =
CriteriaServiceOKE, c.Accelerator = CriteriaAcceleratorH100, c.Intent =
CriteriaIntentInference, c.OS = CriteriaOSUbuntu, c.Platform =
CriteriaPlatformDynamo; set requiredDeps for "dynamo-platform" to
{"grove","cert-manager","kube-prometheus-stack","gpu-operator","kai-scheduler"}
and requiredOrdering to the same pairs as the EKS Dynamo case
({"gpu-operator","dynamo-platform"}, {"kai-scheduler","dynamo-platform"},
{"gpu-operator","nvsentinel"}) so the deployment_order_guard_test.go matrix
covers the OKE overlay path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 6dcba3e7-72b0-4348-8fc9-0eaa571ccc99
📒 Files selected for processing (25)
.github/workflows/gpu-h100-inference-test.yamldocs/user/api-reference.mddocs/user/component-catalog.mdpkg/bundler/deployer/helm/helm_test.gopkg/evidence/scripts/manifests/dynamo-vllm-agg.yamlpkg/recipe/conformance_test.gopkg/recipe/deployment_order_guard_test.gopkg/recipe/metadata_test.gorecipes/components/dynamo-platform/values.yamlrecipes/components/grove/values.yamlrecipes/overlays/gb200-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/gb200-oke-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-aks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-gke-cos-inference-dynamo.yamlrecipes/overlays/h100-kind-inference-dynamo.yamlrecipes/registry.yamltests/chainsaw/ai-conformance/README.mdtests/chainsaw/ai-conformance/cluster/assert-crds.yamltests/chainsaw/ai-conformance/kind-inference-dynamo/assert-crds.yamltests/chainsaw/ai-conformance/offline/assert-recipe.yamltests/chainsaw/ai-conformance/offline/chainsaw-test.yamltests/uat/aws/tests/cuj2-inference/assert-recipe.yamltests/uat/azure/tests/cuj2-inference/assert-recipe.yamltests/uat/gcp/tests/cuj2-inference/assert-recipe.yaml
|
@mchmarny @yuanchen8911 I fixed the changes and updated to 1.0.2 for dynamo instead! Also added/fixed the Grove install too. |
8503a09 to
88fd80d
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai resolve |
✅ Actions performedComments resolved. Approval is disabled; enable |
yuanchen8911
left a comment
There was a problem hiding this comment.
Confirmed — CodeRabbit threads and my review comments are all resolved.
LGTM, approved. Needs rebase onto latest main and a re-review from @mchmarny to clear the stale CHANGES_REQUESTED.
/lgtm
Signed-off-by: Jont828 <jt572@cornell.edu>
Add valuesFile reference to the grove component block in all 6 dynamo overlays so the component-level values.yaml is consumed consistently with other components like dynamo-platform. Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Jont828 <jt572@cornell.edu>
Grove is now a standalone AICR component, not a subchart of dynamo-platform. Update the header comment to reflect this. Signed-off-by: Jont828 <jt572@cornell.edu>
88fd80d to
63b11eb
Compare
mchmarny
left a comment
There was a problem hiding this comment.
Both blockers from my prior review are resolved:
gb200-oke-ubuntu-inference-dynamo.yamlis ondynamo-platform 1.0.2and referencescomponents/grove/values.yamllike the other 5 overlays.groveis now a first-class component inrecipes/registry.yaml(chartoci://ghcr.io/ai-dynamo/grove, versionv0.1.0-alpha.6), all 6 Dynamo overlays declare it as adependencyReffordynamo-platform, andrecipes/components/grove/values.yamlis wired in.
Clearing the stale CHANGES_REQUESTED. Please rebase onto latest main before merge.
/lgtm
Summary
recipes/components/dynamo-platform/values.yamlfor the 1.0 Helm schema:global.*subchart controls,upgradeCRD: true, removed stale image pin and kube-rbac-proxy workaround (fixed upstream)dynamo-crdsversion intentionally unchanged (no 1.0 CRD chart exists; platform chart now bundles CRDs viaupgradeCRD)Test plan
go test -race ./pkg/recipe/... -count=1— passesgo test -race ./pkg/bundler/... -count=1— passesmake test— all tests pass, coverage 72%+make lint(golangci-lint + yamllint on changed files) — cleanmake kwok-e2e RECIPE=h100-eks-ubuntu-inference-dynamo)dynamo-platform1.0.1 chart renders correctly withglobal.*keys🤖 Generated with Claude Code