Open
Labels
bug: Something isn't working
Description
NVIDIA Open GPU Kernel Modules Version
590.48.01
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Amazon Linux 2023
Kernel Release
I've tested multiple custom-built kernels (> 6.16)
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
NVIDIA H100 80GB HBM3
Describe the bug
After the following kernel commit:
commit b7e2823787735ca009e63f35f164b46df0ef096c
Author: Alistair Popple <apopple@nvidia.com>
Date: Fri Feb 28 14:31:05 2025 +1100
    mm/mm_init: move p2pdma page refcount initialisation to p2pdma
p2pdma pages are not being refcounted correctly. This causes CUDA to incorrectly conclude that p2pdma is not supported. Applying a change like:
diff --git a/kernel-open/nvidia-uvm/uvm_pmm_gpu.c b/kernel-open/nvidia-uvm/uvm_pmm_gpu.c
index 97ff13dc..9585ad0d 100644
--- a/kernel-open/nvidia-uvm/uvm_pmm_gpu.c
+++ b/kernel-open/nvidia-uvm/uvm_pmm_gpu.c
@@ -3352,8 +3352,10 @@ void uvm_pmm_gpu_device_p2p_init(uvm_parent_gpu_t *parent_gpu)
     // allocate PCI P2PDMA pages directly
     p2p_page = pfn_to_page(pci_start_pfn);
     page_pgmap(p2p_page)->ops = &uvm_device_p2p_pgmap_ops;
-    for (; page_to_pfn(p2p_page) < pci_end_pfn; p2p_page++)
+    for (; page_to_pfn(p2p_page) < pci_end_pfn; p2p_page++) {
         p2p_page->zone_device_data = NULL;
+        set_page_count(p2p_page, 1);
+    }
     parent_gpu->device_p2p_initialised = true;
 }
appears to fix the issue.
To Reproduce
Run:
CUFILE_USE_PCIP2PDMA=1 /usr/local/cuda/gds/tools/gdscheck -p
The output does not report p2pdma as supported.
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
No response