
integration: handle silent MAC address assignment failures on Fedora#5310

Open
lifubang wants to merge 3 commits into
opencontainers:mainfrom
lifubang:fix-net-mac-addr-random

Conversation

@lifubang

@lifubang lifubang commented Jun 3, 2026

Member

Fix #5013
On Fedora, certain conditions can cause MAC address assignment to fail
without clear indication. To ensure reliability across distributions,
the code now explicitly reads back the MAC address after attempting to
set it and verifies the result.

This PR also refactors the teardown script to reduce unnecessary log output when testing net devices:
Cannot find device "dummy0".

Comment thread tests/integration/checkpoint.bats Outdated
Comment on lines +31 to +35
-ip link del dev dummy0
+ip link show dummy0 >/dev/null 2>&1
+exists=$?
+if [ $exists -eq 0 ]; then
+  ip link del dev dummy0
+fi

Member

Sorry, I don't think I have enough context. Why would keeping the MAC address and making this change avoid the # Cannot find device "dummy0" message from the issue?

Member Author

Yes, the device has been moved to the container’s network namespace once the container started successfully.

Contributor

I don't understand the second commit either. This is teardown, we just need to remove the device. If it does not exist, it's fine -- we can just hide the stderr:

IOW I'd rather see

ip link del dev dummy0 &>/dev/null || true

or something like that.
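
A minimal sketch of that teardown pattern, for illustration only. The `ip` function below is a hypothetical stub that mimics the failure seen on the host once dummy0 has moved into the container's network namespace, so the snippet runs without root. `teardown_netdev` is an illustrative name, not a helper from runc's test suite.

```shell
# Hypothetical stub standing in for ip(8): it always fails the way the real
# command does when the device is no longer in the host namespace.
ip() {
  echo 'Cannot find device "dummy0"' >&2
  return 1
}

# Try to delete the device; stay silent and report success either way.
# Portable spelling of `ip link del dev dummy0 &>/dev/null || true`.
teardown_netdev() {
  ip link del dev "$1" >/dev/null 2>&1 || true
}

teardown_netdev dummy0
echo "teardown status: $?"
```

When the device is already gone, this form and the explicit existence check are equally quiet. The trade-off is that this form also hides any other reason the deletion might fail.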

Member Author

I don't understand the second commit either.

Actually, it’s not related to the random MAC address issue -- it’s a refactor aimed at reducing unnecessary log output in CI. I'll add a comment in the commit.

Member

But why does changing the MAC address cause the device not to be found? I can't make sense of that. What other context am I missing?

@lifubang lifubang Jun 4, 2026

Member Author

But why does changing the MAC address cause the device not to be found?

In fact, ip link del dev dummy0 always prints Cannot find device "dummy0" when the test passes — because dummy0 has been moved into the container's network namespace. Bats hides this error when the test succeeds, but shows it in the teardown output when the test fails, which is noisy and misleading.

This commit is independent of the MAC address fix. The "Cannot find device" noise is caused by the container starting successfully (dummy0 gets moved into the container netns), not by anything related to MAC address.

This commit isn't required to fix the Fedora flake -- it's a separate cleanup to reduce noise in CI logs. I kept it in this PR because both issues were found while debugging #5013. But I'm happy to split them if you prefer. For context on why suppressing this noise matters:
#5307 (comment)

@lifubang lifubang force-pushed the fix-net-mac-addr-random branch from 92e7353 to 6c29d72 Compare June 3, 2026 14:38
@lifubang lifubang marked this pull request as draft June 3, 2026 14:48
@lifubang lifubang force-pushed the fix-net-mac-addr-random branch 6 times, most recently from 699659b to 368bb86 Compare June 4, 2026 08:59
@lifubang lifubang changed the title from "libct: reset the original MAC addr in the new ns" to "integration: handle silent MAC address assignment failures on Fedora" Jun 4, 2026
@lifubang lifubang added area/ci backport/1.4-todo A PR in main branch which needs to backported to release-1.4 backport/1.5-todo A PR in main branch which needs to be backported to release-1.5 labels Jun 4, 2026
@lifubang lifubang added this to the 1.5.0 milestone Jun 4, 2026
@lifubang lifubang marked this pull request as ready for review June 4, 2026 11:14
@cyphar

cyphar commented Jun 4, 2026

Member

I'm not quite sure how I feel about this, in particular what the intended semantics of the auto-setting of NET_ADDR_RANDOM are in the kernel -- in userspace, MAC randomisation is a kinda important privacy feature but I guess that has a different threat model?

@lifubang

lifubang commented Jun 4, 2026

Member Author

I'm not quite sure how I feel about this, in particular what the intended semantics of the auto-setting of NET_ADDR_RANDOM are in the kernel -- in userspace, MAC randomisation is a kinda important privacy feature but I guess that has a different threat model?

This PR only touches test code in tests/integration/ -- there are zero changes to runc's runtime. No impact on container network behavior.

The situation is: on Fedora, the kernel may silently reject explicit MAC assignments on dummy devices (ip link set address returns 0 but the MAC stays random). This is kernel-level behavior (or a random bug in Fedora), not something runc controls.

The test's purpose is to verify that the container preserves the network device state. The fix reads back whatever MAC the kernel actually assigned and uses that for the assertion -- so we compare against what the device actually has, not what we tried and failed to set. The fix accommodates the kernel's MAC policy rather than working against it.

If there's concern about the wording in the comment, I can clarify it to explicitly note this is about kernel behavior, not runc's.
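
The read-back step described above can be sketched as follows. This is a hedged illustration rather than the test code itself: `mac_from_ip_link` is a made-up helper name, the requested address is an assumed value, and the sample text is the dummy0 entry from the CI log quoted further down in this thread, so the snippet runs without root or a real device.

```shell
# Print the MAC from `ip link show <dev>`-style text read on stdin.
# Same awk filter as in the PR: the line whose first field is "link/ether".
mac_from_ip_link() {
  awk '$1 == "link/ether" { print $2 }'
}

# Sample text taken from the CI log quoted in this thread.
sample='27: dummy0: <BROADCAST,NOARP> mtu 1789 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2a:2b:cd:25:53:43 brd ff:ff:ff:ff:ff:ff'

requested="00:11:22:33:44:55"   # assumed value, for illustration only
actual=$(printf '%s\n' "$sample" | mac_from_ip_link)

# The later assertion compares against what the device really has.
if [ "$actual" != "$requested" ]; then
  echo "requested $requested but device has $actual; asserting on $actual"
fi
```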

@kolyshkin kolyshkin left a comment

Contributor

The title of the first commit needs updating.

@kolyshkin kolyshkin left a comment

Contributor

I looked into kernel self-tests (linux/tools/testing/selftests) and they never set mac address on dummy0.

So maybe we should not do it either, and just check the mac is the same after as it was before?

PS I've also noticed they always bring the interface up:

net/cmsg_ip.sh:ip -netns $NS link add type dummy
net/cmsg_ip.sh-ip -netns $NS link set dev dummy0 up
--
net/cmsg_so_mark.sh:ip -netns $NS link add type dummy
net/cmsg_so_mark.sh-ip -netns $NS link set dev dummy0 up
--
net/cmsg_time.sh:ip -netns $NS link add type dummy
net/cmsg_time.sh-ip -netns $NS link set dev dummy0 up
--
net/fq_band_pktlimit.sh:ip link add type dummy
net/fq_band_pktlimit.sh-ip link set dev dummy0 up

@cyphar

cyphar commented Jun 5, 2026

Member

This PR only touches test code in tests/integration/ -- there are zero changes to runc's runtime. No impact on container network behavior.

In my defense, that wasn't the case when I last reviewed it (it used to have the change in libcontainer/network_linux.go).

ip link set address "$mac_address" dev dummy0
ip address add "$global_ip" dev dummy0
# May fail silently on Fedora, so read back the MAC address to verify.
mac_address=$(ip link show dummy0 | awk '$1=="link/ether"{print $2}')

Member

Does bats use pipefail? Or is the idea that this will fail later because this variable is empty in the failure case?

Member

Ah, I misunderstood what you meant by "fail" -- you mean that the address will be random rather than the one we specified?

If so, the comment could be a little clearer.
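
One way to make the read-back robust regardless of how the test runner treats pipeline failures is to validate the value itself. The sketch below does not answer whether bats enables pipefail. It only shows a check that catches an empty or malformed result from the `ip link show | awk` pipeline. `is_mac` is a hypothetical helper name.

```shell
# Succeed only if $1 looks like a colon-separated 48-bit MAC address.
is_mac() {
  printf '%s\n' "$1" | grep -Eq '^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$'
}

# An empty string is what the command substitution yields when the
# device lookup fails and awk prints nothing.
for candidate in "2a:2b:cd:25:53:43" ""; do
  if is_mac "$candidate"; then
    echo "ok: '$candidate'"
  else
    echo "not a MAC: '$candidate'"
  fi
done
```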

Member Author

you mean that the address will be random rather than the one we specified?

Yes. If we run ip link list after ip link set address, we get this:

not ok 34 checkpoint and restore with netdevice
# (in test file tests/integration/checkpoint.bats, line 164)
#   `[[ "$output" == *"ether $mac_address "* ]]' failed
# runc spec (status=0)
#
# 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
#     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
#     link/ether 52:55:55:bd:9b:52 brd ff:ff:ff:ff:ff:ff
#     altname enx525555bd9b52
# 27: dummy0: <BROADCAST,NOARP> mtu 1789 qdisc noop state DOWN mode DEFAULT group default qlen 1000
#     link/ether 2a:2b:cd:25:53:43 brd ff:ff:ff:ff:ff:ff
# runc run -d --console-socket /tmp/bats-run-kRNrN4/runc.GgeEBy/tty/sock test_busybox_netdevice (status=0)

Please see: https://github.com/opencontainers/runc/actions/runs/26936316745/job/79466846626

I'll update the comment.

@lifubang

lifubang commented Jun 5, 2026

Member Author

PS I've also noticed they always bring the interface up:

I'll add it.

and they never set mac address on dummy0.

I think there is no harm in setting the MAC address on dummy0, because it works fine on most OSes.

@lifubang lifubang force-pushed the fix-net-mac-addr-random branch from 368bb86 to 2f5ba21 Compare June 5, 2026 01:51

@rata rata left a comment

Member

@lifubang thanks for working on this! Left a few questions :)

ip link set dev dummy0 up
# Even when a specific MAC address is explicitly set, Fedora may still randomize it.
# Read back the actual address to confirm.
mac_address=$(ip link show dummy0 | awk '$1=="link/ether"{print $2}')

Member

Maybe just cat /sys/class/net/dummy0/address also works here?

Member Author

Yes, both are ok.
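
A small sketch of the sysfs variant, for comparison with the awk filter. The optional second argument (an alternative sysfs root) is a hypothetical addition so the example can run against a throwaway directory instead of a real device; `mac_from_sysfs` is an illustrative name.

```shell
# Read a device's MAC from sysfs. The second argument is a hypothetical
# override of the sysfs root, used here only to run against a fake tree.
mac_from_sysfs() {
  cat "${2:-/sys}/class/net/$1/address"
}

# Build a throwaway tree that mimics /sys/class/net/dummy0/address.
fake_sys=$(mktemp -d)
mkdir -p "$fake_sys/class/net/dummy0"
echo "2a:2b:cd:25:53:43" > "$fake_sys/class/net/dummy0/address"

mac_address=$(mac_from_sysfs dummy0 "$fake_sys")
echo "$mac_address"
rm -rf "$fake_sys"
```

The sysfs file holds only the address, so no text parsing is needed. The `ip link show` form has the advantage of using the same tool the test already relies on.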

Comment thread tests/integration/checkpoint.bats
Comment thread tests/integration/netdev.bats
lifubang and others added 3 commits June 11, 2026 14:09
To reduce unnecessary log output in CI:
Cannot find device "dummy0"

Signed-off-by: lifubang <lifubang@acmcoder.com>
On Fedora, certain conditions can cause MAC address assignment to fail
without clear indication. To ensure reliability across distributions,
the code now explicitly reads back the MAC address after attempting to
set it and verifies the result.

Signed-off-by: lifubang <lifubang@acmcoder.com>
Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: lifubang <lifubang@acmcoder.com>
@lifubang lifubang force-pushed the fix-net-mac-addr-random branch from 2f5ba21 to 66cfdb2 Compare June 11, 2026 06:09
@lifubang

Member Author

Seeing flakes again in #5316. @rata @kolyshkin, could you take a look? Should we go ahead with the merge, or is there a better way forward?

@rata rata left a comment

Member

@lifubang thanks, sorry for the slow review, I'm at an event

Comment thread tests/integration/checkpoint.bats
Comment thread tests/integration/netdev.bats
Comment on lines +151 to +153
# Even when a specific MAC address is explicitly set, Fedora may still randomize it.
# Read back the actual address to confirm.
mac_address=$(ip link show dummy0 | awk '$1=="link/ether"{print $2}')

Member

Do we understand why fedora is randomizing the mac? Maybe we can create the device differently so it is not randomized? Or, if there is nothing to do, maybe skip only on fedora?

It feels weird to allow setting the MAC but then have the tests not check it.

Member

@lifubang This is basically disabling the mac address check, right? IMHO I think it makes sense to understand why this happens, and after that if we want to disable mac tests, only do it in distros that need this.

# set a custom mac address to the interface
ip link set address "$mac_address" dev dummy0
ip link set dev dummy0 up
# Even when a specific MAC address is explicitly set, Fedora may still randomize it.

Member

s/Fedora/<SPECIFIC COMPONENT (systemd-networkd?)> since <SPECIFIC VERSION>/

Member

Is this a bug or a feature? Where is this documented?

Member Author

Is this a bug or a feature? Where is this documented?

I believe this is a bug, not a feature.

In a previous test, I added logging to the BATS script, and the output clearly indicated unexpected behavior—consistent with a bug (see: #5310 (comment)).

That said, I haven’t found any official documentation describing this behavior, nor have I seen it reported elsewhere online. If it were intentional, I’d expect it to be documented -- but so far, I’ve found no such reference.

Member Author

s/Fedora/<SPECIFIC COMPONENT (systemd-networkd?)> since <SPECIFIC VERSION>/

In fact, I’m not certain about the exact component (e.g., systemd-networkd?) or the specific version in which this behavior was introduced -- so I used “Fedora” as a placeholder. If anyone knows the precise subsystem and version where this started, I’d appreciate the clarification!

Contributor

It might be systemd-udevd or NetworkManager.

In fact, maybe if we bring the device up, wait for udev to settle, and only then assign the MAC, that might be all that's needed to fix this flake.

Let me give it a try.

Contributor

So, taking your script from #5310 (comment) and adding this patch:

--- ./testMAC.sh-orig	2026-06-16 17:33:50.309502068 +0000
+++ ./testMAC.sh	2026-06-16 17:34:36.028654316 +0000
@@ -5,6 +5,7 @@
 while true; do
   ((round++))
   ip link add dummy0 type dummy
+  udevadm settle
   # If we bring the dev up, there's still a one-in-a-thousand chance
   # to get a random MAC addr.
   # ip link set dummy0 up

it never fails for me.
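
The ordering used in the patch above can be sketched as a standalone function. This is an illustration under stated assumptions: `run`, `DRY_RUN`, and `create_dummy_after_settle` are hypothetical names, and the snippet defaults to printing the commands so it can be read and run without root. Only `ip link add`, `udevadm settle`, and `ip link set address` come from the thread.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
# DRY_RUN and run() are illustrative, not part of runc's test helpers.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a dummy device, wait for udev to finish processing it,
# and only then assign the MAC address.
create_dummy_after_settle() {
  dev=$1
  mac=$2
  run ip link add "$dev" type dummy
  run udevadm settle
  run ip link set address "$mac" dev "$dev"
}

create_dummy_after_settle dummy0 00:11:22:33:44:55
```

The point of the ordering is that nothing else should still be reacting to the new device when the address is assigned. Whether udev is in fact the component involved is still an open question in this thread.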

Contributor

So, something like this may fix it entirely: kolyshkin@63e0cbc

(Frankly I can't decide which way is better, as now we have 3 different ones -- the above patch, this PR, and #5324).

Cc @rata @lifubang

Contributor

[kir@lima-fedora-rh ~]$ timeout 3m sudo ./testMAC.sh-orig 
got a random MAC(ee:f7:41:ac:d4:bd) in round(328),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(904),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(1566),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(3746),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(3757),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(4548),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(4919),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(5139),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(5445),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(5478),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(6012),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(6082),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(7657),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(7773),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(7776),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(7785),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(7904),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(7916),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(8116),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(8452),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(8544),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(9901),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(10675),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(10862),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(11112),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(11669),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(11717),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(11983),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(12162),\n
 retring for 1 times......
got a random MAC(ee:f7:41:ac:d4:bd) in round(13018),\n
 retring for 1 times......
[kir@lima-fedora-rh ~]$ timeout 10m sudo ./testMAC.sh
[kir@lima-fedora-rh ~]$ # ^^^ Look ma, no failures!

Member

@kolyshkin I tried that too in the PR I opened :) Then I changed it to just remove the MAC, as I think it doesn't make much sense to test something we don't change at all.

I think we could do the 3 things:

  • udevadm settle, just after creating the interface
  • Set the data in the same command that creates the interface, since udev usually doesn't act on values that were already set
  • Remove the MAC addr changes, as we literally don't do anything about it in runc code

What do you think?

Contributor

Continuing the discussion in #5324

@lifubang

Member Author

In fact, this flaky test is not limited to Fedora -- it also occurs on Ubuntu.
Below is the test script (testMAC.sh):

#!/bin/bash
round=0
mac_address="00:11:22:33:44:55"
new_addr=""
while true; do
  ((round++))
  ip link add dummy0 type dummy
  # If we bring the dev up, there's still a one-in-a-thousand chance
  # to get a random MAC addr.
  # ip link set dummy0 up
  ip link set address "$mac_address" dev dummy0
  st=$?
  if [ $st -ne 0 ]; then
    echo "exit with $st"
  fi
  new_addr=$(cat /sys/class/net/dummy0/address)
  try=0
  while [[ "$mac_address" != "$new_addr" ]]; do
    ((try++))
    echo "got a random MAC($new_addr) in round($round),\n"
    echo " retring for $try times......"
    if [ $try -eq 10 ]; then
      echo "round$round: $new_addr"
      exit
    fi
    ip link set address "$mac_address" dev dummy0
    new_addr=$(cat /sys/class/net/dummy0/address)
  done
  ip link del dev dummy0
done
ip link del dev dummy0

When running this script, we observe output like the following:

lifubang@acmcoder:~$ sudo ./testMAC.sh 
got a random MAC(c6:83:a5:87:46:cd) in round(24),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(104),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(120),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(172),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(250),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(332),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(348),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(354),\n
 retring for 1 times......
got a random MAC(c6:83:a5:87:46:cd) in round(378),\n
 retring for 1 times......
^C

Interestingly, reapplying the same command almost always succeeds on the first retry.
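
Since a second attempt usually sticks, a bounded retry is one conceivable workaround. The sketch below is not what this PR does. It only illustrates the retry idea from the script above in reusable form. `retry_until_value`, `set_value`, and `read_value` are hypothetical names, and the two stubs simulate a first assignment that is overwritten and a second one that holds.

```shell
# Run "$@" up to $1 times until the value printed by read_value equals $2.
# Returns 0 on the first match, 1 if no attempt produces the wanted value.
retry_until_value() {
  max=$1
  want=$2
  shift 2
  try=0
  while [ "$try" -lt "$max" ]; do
    try=$((try + 1))
    "$@"
    [ "$(read_value)" = "$want" ] && return 0
  done
  return 1
}

# Hypothetical stand-ins for `ip link set address` and the sysfs read:
# the first "set" leaves a random-looking address, the second one sticks.
state=$(mktemp)
echo 0 > "$state"
set_value() {
  n=$(($(cat "$state") + 1))
  echo "$n" > "$state"
}
read_value() {
  if [ "$(cat "$state")" -ge 2 ]; then
    echo "00:11:22:33:44:55"
  else
    echo "c6:83:a5:87:46:cd"
  fi
}

retry_until_value 10 "00:11:22:33:44:55" set_value && echo "matched after $try tries"
rm -f "$state"
```

A retry hides the race rather than removing it, which is why the thread goes on to look at setting the address at creation time and at waiting for udev.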

@lifubang

Member Author

Moreover, if we specify the MAC address directly in the ip link add command, it always succeeds.

ip link add dummy0 address "$mac_address" type dummy

@rata

rata commented Jun 15, 2026

Member

@lifubang Awesome, thanks! Maybe I'm missing something, but it seems the failure is real. Let's see how to fix the real underlying issue :)

EDIT: Oh, it seems the spec doesn't talk about mac address? https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#network-devices. I wonder if this only happens with fake devices and probably not with real network interfaces?

@rata

rata commented Jun 16, 2026

Member

Moreover, if we specify the MAC address directly in the ip link add command, it always succeeds.

Oh, I missed this comment! I was also testing that :-D. I've also opened another PR, it also seems to fix it. But now that I look at the code, I think we should just remove the check of the mac address.

Changed my PR to remove the MAC checks. Let me know what you think. I'm also okay with doing it at netdev creation time.

@aojea

aojea commented Jun 18, 2026

Contributor

catching up

@cyphar

cyphar commented Jun 19, 2026

Member

I'm going to remove this from the 1.5 milestone so I can release 1.5.0 today -- this is an existing issue that only affects CI and we can always backport a fix later once we figure out the best approach.

@cyphar cyphar removed this from the 1.5.0 milestone Jun 19, 2026

Development

Successfully merging this pull request may close these issues.

flaky test: not ok 40 checkpoint and restore with netdevice (with --debug) on Fedora 43

6 participants