uvm: Fix build failure for Linux 6.19+ due to HMM and PMM API changes #1015

Open
gg582 wants to merge 5 commits into NVIDIA:main from gg582:fix/linux-6.19

Conversation

gg582 commented Jan 27, 2026

Description

This PR addresses build failures in the NVIDIA UVM kernel module when compiling against Linux kernel version 6.19.0 and later. It handles two major API changes in the upstream kernel:

  1. zone_device_page_init signature change:
  • In Linux 6.19, zone_device_page_init now requires an additional argument.
  • Updated uvm_hmm.c to pass the required arguments based on the LINUX_VERSION_CODE.
  2. struct dev_pagemap_ops modification:
  • The page_free callback has been removed/changed in recent kernel updates.
  • Updated uvm_pmm_gpu.c to wrap the page_free assignment with version checks to prevent compilation errors.

The changes use conditional compilation to maintain backward compatibility with older kernel versions.
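
The version checks described above rely on the kernel packing a version into a single integer, so that releases compare with a plain `>=`. Below is a minimal userspace sketch, assuming the same packing as the kernel's KERNEL_VERSION macro. The helper demo_uses_new_signature is hypothetical and exists only for illustration; it is not part of the driver or of this PR.

```c
#include <assert.h>

/* Same packing as the kernel's <linux/version.h>: major, minor and patch
 * are folded into one integer, with the patch level clamped to 255. */
#define KERNEL_VERSION(a, b, c) \
    (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* Compile-time sanity check: every 6.18.x sorts below 6.19.0. */
static_assert(KERNEL_VERSION(6, 19, 0) > KERNEL_VERSION(6, 18, 255),
              "6.19.0 must compare greater than any 6.18.x");

/* Hypothetical helper (not in the driver): returns 1 when a packed
 * version code is 6.19.0 or newer, which is the condition the PR
 * tests with #if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0). */
static int demo_uses_new_signature(int version_code)
{
    return version_code >= KERNEL_VERSION(6, 19, 0);
}
```

In the real module the comparison happens in the preprocessor rather than at run time, so only the branch matching the target kernel is ever compiled.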

Testing

  • Target Kernel: Linux 6.19.0+
  • Result: Verified that the module builds successfully without regressions for older kernel versions.

@jeamieofqidan

Testing has shown that open-kernel-modules with this patch applied fail to compile the Nvidia module under Ubuntu.

ptr1337 commented Jan 29, 2026

> Testing has shown that open-kernel-modules with this patch applied fail to compile the Nvidia module under Ubuntu.

https://github.com/CachyOS/kernel-patches/blob/master/6.19/misc/nvidia/0003-Fix-compile-for-6.19.patch

Here is a fix that works across several kernel versions. Even so, I hope NVIDIA will push a 590 update before 6.19 goes stable.

Fjodor42 commented Jan 31, 2026

> Testing has shown that open-kernel-modules with this patch applied fail to compile the Nvidia module under Ubuntu.

This seems to stem from the fact that Ubuntu has begun setting CONFIG_OBJTOOL_WERROR=y in their kernels.

Hence, we need more steps there (and for other kernels setting that), at least to get the DKMS package to compile:

  1. Clone this repository and apply the patch from this PR (or the CachyOS one linked in ptr1337's comment above)
  2. Apply patch [1] below
  3. Compile with make modules -j$(nproc)
  4. Copy src/nvidia/_out/Linux_x86_64/nv-kernel.o to /usr/src/nvidia-590.48.01/nvidia/nv-kernel.o_binary
  5. Copy src/nvidia-modeset/_out/Linux_x86_64/nv-modeset-kernel.o to /usr/src/nvidia-590.48.01/nvidia-modeset/nv-modeset-kernel.o_binary
  6. Apply the patch from this PR (or the CachyOS one linked in ptr1337's comment above) in /usr/src/nvidia-590.48.01
  7. Install the Ubuntu Linux 6.19 kernel packages
  8. ?
  9. Profit

[1]

diff --git a/src/nvidia-modeset/Makefile b/src/nvidia-modeset/Makefile
index b54138cc..a38244fd 100644
--- a/src/nvidia-modeset/Makefile
+++ b/src/nvidia-modeset/Makefile
@@ -112,7 +112,6 @@ endif

 CFLAGS += -fno-pic
 CFLAGS += -fno-common
-CFLAGS += -fomit-frame-pointer
 CFLAGS += -fno-strict-aliasing
 CFLAGS += -ffunction-sections
 CFLAGS += -fdata-sections
@@ -153,8 +152,11 @@ ifeq ($(TARGET_ARCH),x86_64)
     CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -fcf-protection=branch)
   endif
   CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -fno-jump-tables)
+  CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -fno-asynchronous-unwind-tables)
   CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mindirect-branch=thunk-extern)
   CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mindirect-branch-register)
+  CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mharden-sls=all)
+  CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mfunction-return=thunk-extern)
 endif

 CFLAGS += $(CONDITIONAL_CFLAGS)
diff --git a/src/nvidia/Makefile b/src/nvidia/Makefile
index d1d6d866..b5a09324 100644
--- a/src/nvidia/Makefile
+++ b/src/nvidia/Makefile
@@ -184,7 +184,9 @@ ifeq ($(TARGET_ARCH),x86_64)
   endif
   CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -fno-jump-tables)
   CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mindirect-branch-register)
-    CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mindirect-branch=thunk-extern)
+  CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mindirect-branch=thunk-extern)
+  CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mharden-sls=all)
+  CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -mfunction-return=thunk-extern)
 endif

 CFLAGS += $(CONDITIONAL_CFLAGS)

Notes:

I am a bit uncertain about the added CONDITIONAL_CFLAGS += $(call TEST_CC_ARG, -fno-asynchronous-unwind-tables) line, but feel free to test.

gg582 commented Jan 31, 2026

Thanks for the feedback and the suggestions.

It's clear that the current PR needs to be more robust to handle different distribution configs like Ubuntu's CONFIG_OBJTOOL_WERROR. I'll update this PR to a more comprehensive patch that includes the CFLAGS adjustments and supports a wider range of kernel versions, as suggested by @ptr1337 and @Fjodor42.

I'll push the updated commits shortly.

gg582 commented Jan 31, 2026

PR Summary Update

This PR has been updated to a more comprehensive and universal fix for Linux 6.19 based on community feedback.

Key Changes:

  • API Compatibility: Migrated from page_free to folio_free callbacks and updated zone_device_page_init to match the Linux 6.19 signature.
  • Build Fix for Ubuntu: Added necessary CFLAGS to both nvidia and nvidia-modeset Makefiles to resolve objtool errors on distributions with CONFIG_OBJTOOL_WERROR=y.
  • Improved Robustness: Incorporated technical approaches from the CachyOS kernel patches to ensure better stability across various kernel configurations.

I have verified that these changes allow the module to build successfully on both standard kernels and those with strict security configurations (like Ubuntu's).
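
The page_free to folio_free migration listed above can be pictured with a userspace sketch. Everything here is a stand-in: demo_page, demo_folio and demo_pagemap_ops are hypothetical types that mimic the shape of the kernel structures, and LINUX_VERSION_CODE is given a default so the snippet compiles outside a kernel tree. It illustrates the conditional-member pattern only; it is not the code in this PR.

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_VERSION(a, b, c) \
    (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* ASSUMPTION: stand-in for the value the kernel build normally provides. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(6, 19, 0)
#endif

/* Hypothetical stand-ins for struct page and struct folio. */
struct demo_page  { int freed; };
struct demo_folio { struct demo_page page; };

/* Hypothetical stand-in for struct dev_pagemap_ops: the free callback
 * has a different name and argument type depending on the target. */
struct demo_pagemap_ops {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0)
    void (*folio_free)(struct demo_folio *folio);
#else
    void (*page_free)(struct demo_page *page);
#endif
};

/* Shared body so both callback flavours behave identically. */
static void demo_free_common(struct demo_page *page)
{
    page->freed = 1;
}

#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0)
static void demo_folio_free(struct demo_folio *folio)
{
    demo_free_common(&folio->page);
}
static const struct demo_pagemap_ops demo_ops = { .folio_free = demo_folio_free };
#else
static void demo_page_free(struct demo_page *page)
{
    demo_free_common(page);
}
static const struct demo_pagemap_ops demo_ops = { .page_free = demo_page_free };
#endif

/* Invokes whichever callback was compiled in; returns 1 if the page was freed. */
static int demo_run_free(void)
{
    struct demo_folio folio = { { 0 } };
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0)
    demo_ops.folio_free(&folio);
#else
    demo_ops.page_free(&folio.page);
#endif
    return folio.page.freed;
}
```

Keeping the real work in one shared function means the version check only decides which thin adapter is registered, which keeps the two code paths from drifting apart.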

gg582 commented Jan 31, 2026

Hello, I read the CachyOS patch.
I noticed that LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0) is the NVIDIA style, so I removed the LINUX_VERSION_CODE < KERNEL_VERSION(6, 19, 0) checks. Since I am a monolingual speaker, I used Gemini to translate the Summary section into English (I hope you understand; I can read English, but writing it from memory is painful for me).
Thanks for your kind review.
If you have any questions about PR #1015, feel free to ask, especially about anything that could cause system inconsistency. I will do my best to trace bugs.

gg582 commented Jan 31, 2026

I am testing -fno-asynchronous-unwind-tables and will report a build result shortly.

gg582 commented Jan 31, 2026

Added CFLAGS += -fno-strict-aliasing to src/nvidia/Makefile so that its compile flags match those in src/nvidia-modeset/Makefile.
This does not always build cleanly, so it is better to choose the flags per distribution.

Splitting distributions into Ubuntu/Ubuntu-based and non-Ubuntu-based is not the traditional way to classify them. Ubuntu is a Debian-based distro, but I think it carries many 'non-standard' patches.
As a result, the same kind of bug never shows up on Debian stable/testing.
If I understand the problem correctly, this should be marked properly.
Applying -fno-asynchronous-unwind-tables as a global option is okay, but when we have to trace errors and debug, much of the metadata that would otherwise have been generated is missing.

I'd like to request further review in case this change causes unexpected behavior.
In my opinion, GCC and Clang can be expected to produce similar optimization results.

P.S. I did not use a translator here, so that I could describe the problem accurately. The grammar may be wrong, but I am sure the meaning is not far off. Translators often obscure technical correctness.

gg582 commented Feb 1, 2026

// torvalds/linux commit 12b2285
void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap,
			   unsigned int order);

zone_device_page_init was changed recently.
The NVIDIA module should call zone_device_page_init(page, 0, 0).
The build succeeds on kernel revision 12b2285.
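
One way to keep call sites free of version checks is a small wrapper that always takes the three-argument form and drops the extra arguments on older kernels. The sketch below is a userspace illustration under stated assumptions: demo_page, demo_pagemap, demo_zone_device_page_init and demo_compat_page_init are hypothetical stand-ins, and the pre-6.19 stand-in is assumed to take only the page. The wrapper forwards whatever pgmap and order the caller supplies; the comment above passes 0 for both.

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_VERSION(a, b, c) \
    (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* ASSUMPTION: stand-in for the value the kernel build normally provides. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(6, 19, 0)
#endif

/* Hypothetical stand-ins for struct dev_pagemap and struct page. */
struct demo_pagemap { int id; };
struct demo_page {
    int initialized;
    unsigned int order;
    struct demo_pagemap *pgmap;
};

/* Stand-in for the kernel function; only one flavour exists per build. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0)
static void demo_zone_device_page_init(struct demo_page *page,
                                       struct demo_pagemap *pgmap,
                                       unsigned int order)
{
    page->pgmap = pgmap;
    page->order = order;
    page->initialized = 1;
}
#else
static void demo_zone_device_page_init(struct demo_page *page)
{
    page->initialized = 1;
}
#endif

/* Hypothetical compat wrapper: callers always pass three arguments and
 * the version check lives in exactly one place. */
static void demo_compat_page_init(struct demo_page *page,
                                  struct demo_pagemap *pgmap,
                                  unsigned int order)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0)
    demo_zone_device_page_init(page, pgmap, order);
#else
    (void)pgmap; /* not accepted by the older signature */
    (void)order;
    demo_zone_device_page_init(page);
#endif
}
```

A call such as demo_compat_page_init(&page, &pgmap, 0) then compiles unchanged against either stand-in.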

gg582 commented Feb 8, 2026

Sadly, NVIDIA seems to want to keep this repository read-only. The last pull request merge was four years ago, and the repository appears to get only the minimal effort needed to keep up with the Linux mainline release cycle. I don't expect them to merge this, but I hope they will at least test and patch internally :(

4 participants