An operator can already point a host at any boot interface after the fact with set-primary-interface (#2314). What's missing is the declared, up-front intent: ExpectedHostNic.primary is meant to say "this NIC is the boot interface," but our ingestion automation ignores it and picks by DPU discovery order instead. This is the piece that lets a host declare "neither DPU is north/south primary -- this integrated NIC is" and have it stick -- exactly what a host with a DPU in NIC mode, or a DPU present-but-unused, needs (see #870).
We can't require primary to be set -- tens of thousands of existing machines have never needed it -- so the rule is: when it's declared, it wins; when it isn't, today's automation stands.
Current behavior
pick_boot_interface (crates/api-model/src/machine/mod.rs:280) already reads the primary_interface flag first, then falls back to the lowest-MAC non-underlay interface. The selection (read) side is fine.
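The read-side order described above can be sketched roughly as follows. This is a minimal illustration, not the code in crates/api-model: the `Interface` struct, its `mac` and `is_underlay` fields, and the function signature are simplified assumptions made for this example.

```rust
/// Hypothetical, simplified stand-in for a host interface row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Interface {
    pub mac: [u8; 6],
    pub primary_interface: bool,
    pub is_underlay: bool,
}

/// Prefer the first interface flagged primary; otherwise fall back to the
/// lowest-MAC interface that is not on the underlay.
pub fn pick_boot_interface(interfaces: &[Interface]) -> Option<&Interface> {
    interfaces
        .iter()
        .find(|i| i.primary_interface)
        .or_else(|| {
            interfaces
                .iter()
                .filter(|i| !i.is_underlay)
                .min_by_key(|i| i.mac)
        })
}
```

Because the flag is consulted first, whichever writer last touched `primary_interface` effectively decides the boot interface, which is why the write side is where the fix belongs.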
The flag is written by three independent paths, and only one honors the declared intent -- and only as a demotion:
Ingestion: configure_host_machine (crates/site-explorer/src/machine_creator.rs:751) marks the first DPU's host interface primary and demotes the rest, purely by discovery order. It already receives machine_data -- the ExpectedMachineData with host_nics[].primary -- but never reads it for this decision.
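DHCP: discover.rs:255 reads the declared primary into is_primary_nic and passes it to find_or_create_machine_interface. But machine_interface.rs:464 only ever sets the flag to false (demoting a non-declared NIC) -- there is no promote-to-true branch. A declared NIC stays primary only by inheriting the creation default of true.
Operator override: set_primary_interface (crates/api-db/src/machine_interface.rs:171) -- the post-hoc override from #2314 (feat: make any host interface the primary, not just a DPU).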
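To make the DHCP gap concrete, here is a small model of the demote-only update next to a symmetric one. Both functions and the `Option<bool>` shape of the hint are assumptions for illustration (the real is_primary_nic type is not shown in this issue); `None` stands for "the host declares no primary at all".

```rust
/// Hypothetical model of the current DHCP-side update: a `false` hint
/// clears the flag, anything else leaves the stored value untouched.
pub fn demote_only(stored: bool, declared: Option<bool>) -> bool {
    match declared {
        Some(false) => false,
        _ => stored,
    }
}

/// Hypothetical symmetric variant: a declared hint is written both ways,
/// and the stored value is kept only when nothing is declared.
pub fn promote_or_demote(stored: bool, declared: Option<bool>) -> bool {
    declared.unwrap_or(stored)
}
```

With the demote-only form, a declared NIC whose flag was already cleared (for example by ingestion demoting "the rest") stays non-primary: `demote_only(false, Some(true))` is `false`, whereas `promote_or_demote(false, Some(true))` is `true`.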
The change
Make the declared ExpectedHostNic.primary authoritative across the writers, ideally behind one reconcile decider -- "given this host's interfaces and its declared primary, exactly one is primary" -- that both ingestion and DHCP route through. Precedence: declared primary > DPU takeover > lowest-MAC non-underlay.
configure_host_machine must not promote a DPU interface when a different NIC is declared primary.
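One possible shape for such a decider is sketched below. Everything here is hypothetical: the `HostInterface` struct, the `is_dpu_host_interface` marker, the `reconcile_primary` name, and the choice to fall through to the next rule when the declared MAC has no interface row yet are all assumptions for this example, not existing code.

```rust
/// Hypothetical, simplified view of one host interface.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HostInterface {
    pub mac: [u8; 6],
    pub is_underlay: bool,
    /// Assumed marker: host-facing interface of a DPU ingested in DPU mode.
    pub is_dpu_host_interface: bool,
    pub primary_interface: bool,
}

/// Applies the precedence declared primary > DPU takeover > lowest-MAC
/// non-underlay, rewrites the flags so at most one interface is primary,
/// and returns the index of the chosen interface (if any).
pub fn reconcile_primary(
    interfaces: &mut [HostInterface],
    declared_primary: Option<[u8; 6]>,
) -> Option<usize> {
    let chosen = declared_primary
        // 1. A declared primary wins if its interface is known.
        .and_then(|mac| interfaces.iter().position(|i| i.mac == mac))
        // 2. Otherwise the first DPU host interface in discovery order.
        .or_else(|| interfaces.iter().position(|i| i.is_dpu_host_interface))
        // 3. Otherwise the lowest-MAC interface that is not on the underlay.
        .or_else(|| {
            interfaces
                .iter()
                .enumerate()
                .filter(|(_, i)| !i.is_underlay)
                .min_by_key(|(_, i)| i.mac)
                .map(|(idx, _)| idx)
        });

    for (idx, iface) in interfaces.iter_mut().enumerate() {
        iface.primary_interface = Some(idx) == chosen;
    }
    chosen
}
```

Routing both ingestion and DHCP through a single function like this would make the result independent of which writer runs last, since each call recomputes the full set of flags from the same inputs.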
Open questions to settle while implementing
How (and whether) configure_host_machine runs for a DPU in NIC mode. In NIC mode no DPU snapshot is attached, so this path may not run for it at all -- which would mean NIC-mode hosts already fall through to the DHCP/declared/lowest-MAC logic. Confirm before assuming the takeover overrides the declared NIC in NIC mode.
The pre-ownership window: DHCP creates these rows with machine_id = NULL (the None branch of find_or_create_machine_interface), and the one_primary_per_machine partial index does not constrain NULL machine_ids. So with no declared primary, multiple newly-leased NICs can each default to primary_interface = true, and pick_boot_interface returns whichever it finds first. Decide whether the reconcile decider should also settle this.
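The pre-ownership window can be modeled in a few lines. This is only an in-memory analogy for the index behavior described above; the `Row` struct and the check function are hypothetical, and the actual index definition is not reproduced here.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified machine_interface row.
pub struct Row {
    pub machine_id: Option<u32>,
    pub primary_interface: bool,
}

/// Mimics a uniqueness rule that applies only to primary rows with a
/// non-NULL machine_id: rows with `machine_id == None` never collide.
pub fn violates_one_primary_per_machine(rows: &[Row]) -> bool {
    let mut seen = HashSet::new();
    rows.iter()
        .filter(|r| r.primary_interface)
        .filter_map(|r| r.machine_id)
        .any(|id| !seen.insert(id))
}
```

Two primary rows with `machine_id: None` pass this check, while two primary rows sharing `Some(1)` fail it, which is the asymmetry the open question is about.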
Done when
A host that declares an integrated NIC as primary boots from it even when a DPU is ingested in DPU mode.
No declared primary -> behavior is unchanged from today.
Tests cover: declared primary beats DPU takeover; declared primary survives regardless of DHCP arrival order; absent declared primary keeps today's automation. test_dhcp_marks_non_primary_mac_as_non_primary is the seed.
Part of #870.