Description
NVIDIA Open GPU Kernel Modules Version
590.48.01
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Fedora 42 (Adams)
Kernel Release
6.18.9-100.fc42.x86_64
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
NVIDIA RTX A1000 (GA107GL) [10de:25b0] (rev a1)
Describe the bug
System hangs on reboot — NVIDIA open kernel module PCI shutdown handler hangs after PCIe bus error (RTX A1000)
System Information
| Component | Details |
|---|---|
| OS | Fedora 42 (Adams) |
| Kernel | 6.18.9-100.fc42.x86_64 |
| Hardware | Lenovo ThinkStation P350 |
| GPU | NVIDIA RTX A1000 (GA107GL) [10de:25b0] (rev a1) |
| Driver | NVIDIA 590.48.01, open kernel modules (Dual MIT/GPL) |
| Driver source | negativo17 repo, akmod-nvidia-590.48.01-3.fc42.x86_64 |
| Display server | Xorg with GDM (graphical.target, autologin enabled) |
Problem Description
When issuing /sbin/reboot, the system hangs indefinitely during the shutdown sequence. The machine remains pingable (kernel and network stack are alive) but SSH is refused (sshd has already been stopped) and the reboot never completes. The only recovery is a physical power cycle.
Initially this appeared to occur only after kernel updates, but it has since become reproducible on every reboot.
Journal Evidence
The following was captured from journalctl -b -1 after a power-cycle recovery. During shutdown, after X/GDM stops, the NVIDIA GPU throws a PCIe bus error:
Feb 15 12:03:39 hidal /usr/libexec/gdm-x-session[2319]: (II) NVIDIA(GPU-0): Deleting GPU-0
Feb 15 12:03:39 hidal /usr/libexec/gdm-x-session[2319]: (WW) xf86CloseConsole: KDSETMODE failed: Input/output error
Feb 15 12:03:39 hidal /usr/libexec/gdm-x-session[2319]: (WW) xf86CloseConsole: VT_GETMODE failed: Input/output error
Feb 15 12:03:39 hidal kernel: nvidia 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
Feb 15 12:03:39 hidal kernel: nvidia 0000:01:00.0: device [10de:25b0] error status/mask=00000001/0000a000
Feb 15 12:03:39 hidal kernel: nvidia 0000:01:00.0: [ 0] RxErr (First)
The system then proceeds through the shutdown sequence until systemd-shutdown takes over, where it hangs permanently:
Feb 15 12:03:40 hidal systemd[1]: Reached target shutdown.target - System Shutdown.
Feb 15 12:03:40 hidal systemd[1]: Reached target final.target - Late Shutdown Services.
Feb 15 12:03:41 hidal systemd-shutdown[1]: Syncing filesystems and block devices.
Feb 15 12:03:41 hidal systemd-shutdown[1]: Sending SIGTERM to remaining processes...
The journal ends here. The system never completes the reboot.
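To check whether the same correctable error precedes the hang on other boots, the previous boot's kernel log can be filtered for AER messages from the GPU. This is a minimal sketch and not part of the original capture: the function name `aer_lines` is hypothetical, and the PCI address `0000:01:00.0` is the one shown in the log above.

```shell
#!/bin/sh
# Print kernel log lines that report a PCIe (AER) error for one PCI device.
# Reads log text on stdin; $1 is the PCI address (defaults to the GPU in this report).
aer_lines() {
    dev=${1:-0000:01:00.0}
    # First keep only lines for this device, then only the error summary lines.
    grep -F "$dev" | grep -E 'PCIe Bus Error|RxErr'
}

# Usage (previous boot, kernel messages only):
#   journalctl -k -b -1 --no-pager | aer_lines 0000:01:00.0
```

If this prints nothing for a boot that still hung, the `RxErr` event is probably not the trigger.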
Diagnosis
Through testing, the following was confirmed:
- `reboot -f` works: skipping systemd's shutdown sequence and calling `reboot(2)` directly always succeeds.
- Normal `reboot` with NVIDIA modules unloaded works: after running `systemctl isolate multi-user.target` followed by `rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia`, a normal `/sbin/reboot` completes cleanly.
- Normal `reboot` with NVIDIA modules loaded hangs: consistently, every time.
Conclusion: The NVIDIA open kernel module's PCI `.shutdown` callback hangs when called during the kernel's device shutdown path, likely because the GPU is in a bad state following the PCIe `RxErr` physical layer error.
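The second test (reboot with the modules unloaded) can be repeated with a small guard that confirms no NVIDIA module is still loaded before rebooting. This is a sketch under assumptions: the helper name `nvidia_modules_loaded` is hypothetical, and it relies only on the standard `/proc/modules` layout, where the module name is the first column.

```shell
#!/bin/sh
# List loaded NVIDIA kernel modules, one per line.
# $1 is a file in /proc/modules format (defaults to /proc/modules itself).
nvidia_modules_loaded() {
    awk '$1 == "nvidia" || $1 ~ /^nvidia_/ { print $1 }' "${1:-/proc/modules}"
}

# Usage, as root, mirroring the test with modules unloaded:
#   systemctl isolate multi-user.target
#   rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
#   [ -z "$(nvidia_modules_loaded)" ] && /sbin/reboot
```

If `rmmod` fails because a module is still in use, the guard prevents a reboot that would only reproduce the hang.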
Workaround
A systemd service that unloads all NVIDIA modules after services have stopped but before the final reboot resolves the issue:
# /etc/systemd/system/nvidia-unload.service
[Unit]
Description=Unload NVIDIA modules during shutdown
DefaultDependencies=no
After=shutdown.target
Before=systemd-reboot.service systemd-poweroff.service systemd-halt.service
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia 2>/dev/null; exit 0'
TimeoutStartSec=30
[Install]
WantedBy=reboot.target poweroff.target halt.target
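The report does not show how the unit was activated. Given its `[Install]` section, it presumably has to be copied into place and enabled once. The following sketch assumes that; the helper name `install_unit` is hypothetical, and the target path is the one given in the comment at the top of the unit file.

```shell
#!/bin/sh
# Copy the unit file into place with standard permissions.
# $1 = source unit file, $2 = target directory (defaults to /etc/systemd/system).
install_unit() {
    install -m 0644 "$1" "${2:-/etc/systemd/system}/nvidia-unload.service"
}

# Usage, as root:
#   install_unit ./nvidia-unload.service
#   systemctl daemon-reload
#   systemctl enable nvidia-unload.service
```

Enabling the unit creates the `WantedBy=` symlinks, so it is pulled in by `reboot.target`, `poweroff.target` and `halt.target`.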
To Reproduce
Install Fedora 42 on a system with an NVIDIA RTX A1000 GPU, use the open GPU kernel modules, and try to reboot.
(system hangs)
Bug Incidence
Once
nvidia-bug-report.log.gz
More Info
No response