Description
NVIDIA Open GPU Kernel Modules Version
590.48.01 and others
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Gentoo Linux
Kernel Release
6.18.12-gentoo, custom build
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
GeForce RTX 5070 and others
Describe the bug
In the downstream Gentoo Linux bug https://bugs.gentoo.org/969413, multiple users (including myself) have reported freezes and kernel panics during system shutdown with various driver versions, including 590.48.01, and various kernel versions, including 6.18.12.
We've discovered that when the kernel is built with CONFIG_RANDSTRUCT_FULL or CONFIG_RANDSTRUCT_PERFORMANCE, the GPU driver causes the system to do one of the following during shutdown or reboot, just before the physical power-down:
- nothing unusual
- freeze
- kernel panic
Which of the three behaviors occurs depends on the particular kernel build. Building the kernel without RANDSTRUCT fixes the issue: poweroff is then always clean. The message in the attached screenshot led me to identify struct NvKmsKapiCallbacks, declared in nvkms-kapi.h, as the source of the bug. A function pointer in it is called unsafely somewhere, and once RANDSTRUCT shuffles the struct's members, a different function is called instead. In the screenshot, the function being called is suspendResume, which should never run on my system because I don't use suspend. When a freeze happens instead of a panic, the function called is presumably probe.
The following patch disables randomization of struct NvKmsKapiCallbacks and prevents the freeze/panic with RANDSTRUCT enabled, confirming the cause of the bug. Its effect has been verified by me (590.48.01 open driver, kernel 6.18.12, RTX 5070) and so far by one other Gentoo user (https://bugs.gentoo.org/969413#c41: 580.126.09 proprietary driver, unknown GPU, unknown kernel). Of course, it cannot be considered a fix, just a workaround.
diff -Naur work/kernel/common/inc/nvkms-kapi.h work-new/kernel/common/inc/nvkms-kapi.h
--- work/kernel/common/inc/nvkms-kapi.h 2025-12-08 13:50:32.000000000 +0100
+++ work-new/kernel/common/inc/nvkms-kapi.h 2026-02-17 17:13:43.980685382 +0100
@@ -603,7 +603,7 @@
void (*suspendResume)(NvBool suspend);
void (*remove)(NvU32 gpuId);
void (*probe)(const struct NvKmsKapiGpuInfo *gpu_info);
-};
+} __attribute__((no_randomize_layout));
struct NvKmsKapiFunctionsTable {
diff -Naur work/kernel-module-source/kernel-open/common/inc/nvkms-kapi.h work-new/kernel-module-source/kernel-open/common/inc/nvkms-kapi.h
--- work/kernel-module-source/kernel-open/common/inc/nvkms-kapi.h 2025-12-08 13:51:05.000000000 +0100
+++ work-new/kernel-module-source/kernel-open/common/inc/nvkms-kapi.h 2026-02-17 17:13:08.037511054 +0100
@@ -603,7 +603,7 @@
void (*suspendResume)(NvBool suspend);
void (*remove)(NvU32 gpuId);
void (*probe)(const struct NvKmsKapiGpuInfo *gpu_info);
-};
+} __attribute__((no_randomize_layout));
struct NvKmsKapiFunctionsTable {
diff -Naur work/kernel-module-source/src/nvidia-modeset/kapi/interface/nvkms-kapi.h work-new/kernel-module-source/src/nvidia-modeset/kapi/interface/nvkms-kapi.h
--- work/kernel-module-source/src/nvidia-modeset/kapi/interface/nvkms-kapi.h 2025-12-08 13:49:16.000000000 +0100
+++ work-new/kernel-module-source/src/nvidia-modeset/kapi/interface/nvkms-kapi.h 2026-02-17 17:13:23.621208451 +0100
@@ -603,7 +603,7 @@
void (*suspendResume)(NvBool suspend);
void (*remove)(NvU32 gpuId);
void (*probe)(const struct NvKmsKapiGpuInfo *gpu_info);
-};
+} __attribute__((no_randomize_layout));
struct NvKmsKapiFunctionsTable {
diff -Naur work/kernel-open/common/inc/nvkms-kapi.h work-new/kernel-open/common/inc/nvkms-kapi.h
--- work/kernel-open/common/inc/nvkms-kapi.h 2025-12-08 13:51:05.000000000 +0100
+++ work-new/kernel-open/common/inc/nvkms-kapi.h 2026-02-17 17:13:33.981717353 +0100
@@ -603,7 +603,7 @@
void (*suspendResume)(NvBool suspend);
void (*remove)(NvU32 gpuId);
void (*probe)(const struct NvKmsKapiGpuInfo *gpu_info);
-};
+} __attribute__((no_randomize_layout));
struct NvKmsKapiFunctionsTable {
To Reproduce
- Compile kernel 6.18.12 with CONFIG_RANDSTRUCT_FULL=y.
- Compile the open GPU driver 590.48.01 for that kernel.
- Boot it.
- Shut the system down and observe one of: normal shutdown, freeze, kernel panic. The same behavior recurs on every shutdown with a given kernel build: which of the three occurs is determined at kernel build time by the RANDSTRUCT layout chosen for that build. If shutdown is clean, repeat the whole process until a freeze or panic happens.
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
No response
