Description
NVIDIA Open GPU Kernel Modules Version
590.48.01
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
openSUSE Tumbleweed
Kernel Release
6.18.9-1-default
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
GPU 0: NVIDIA GeForce RTX 5090
Describe the bug
There is an issue with the Linux kernel driver: I can't run ollama or llama.cpp without getting an error. The same machine works fine under Windows, so this does not appear to be a hardware or firmware issue.
llama.cpp
./llama-bench -m ~/testcuda/qwen2.5-coder-3b-instruct-q4_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
RMS_NORM: src0_d=0x7f6f32000000 (attr_err=0, type=2), dst_d=0x7f6f32501000 (attr_err=0, type=2)
ne00=2048, ne01=512, ne02=1, ne03=1, s01=2048, s02=1048576, s03=1048576
MUL_OP: src0=0x7f6f32501000 (err=0, type=2), src1=0x7f6e7136f800 (err=0, type=2), dst=0x7f6f32501000 (err=0, type=2)
src0: type=f32, shape=[2048,512,1,1]
src1: type=f32, shape=[2048,1,1,1]
dst: type=f32, shape=[2048,512,1,1]
!!! CUDA KERNEL FAILED: op=MUL (6)
src0: f32, shape=[2048,512,1,1]
dst: f32, shape=[2048,512,1,1]
=== CUDA ERROR DETAILS ===
CUDA error: an illegal memory access was encountered
current device: 0
function: ggml_cuda_compute_forward
location: /home/aginies/open-gpu-kernel-modules/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2332
statement: err
With ollama:
ollama serve
level=INFO source=types.go:42 msg="inference compute" id=GPU-446f513c-0699-5337-2cd0-9fa3d507cc94 filter_id="" library=CUDA compute=12.0 name=CUDA0 description="NVIDIA GeForce RTX 5090" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:2b:00.0 type=discrete total="31.8 GiB" available="31.3 GiB"
ollama run llama3.2
time=2026-02-18T18:39:02.879+01:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
⠦ time=2026-02-18T18:39:03.117+01:00 level=INFO source=sched.go:493 msg="Load failed" model=/home/aginies/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff error="llama runner process has terminated: CUDA error: an illegal memory access was encountered\n current device: 0, in function ggml_backend_cuda_buffer_clear at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:790\n cudaStreamSynchronize(((cudaStream_t)0x2))\n//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: CUDA error"
[GIN] 2026/02/18 - 18:39:03 | 500 | 4.656141722s | 127.0.0.1 | POST "/api/generate"
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: an illegal memory access was encountered
current device: 0, in function ggml_backend_cuda_buffer_clear at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:790
cudaStreamSynchronize(((cudaStream_t)0x2))
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: CUDA error
dmesg
[ 130.456250] [ T1946] NVRM: Xid (PCI:0000:2b:00): 31, pid=1942, name=llama-bench, channel 0x00000002, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC6 GPCCLIENT_T1_13 faulted @ 0x6f6f_320c2000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[ 130.467839] [ T1964] llama-bench[1964]: segfault at 206e03fe0 ip 00007f71b508324c sp 00007ffe8b911a10 error 4 in libcuda.so.590.48.01[48324c,7f71b4d67000+108c000] likely on CPU 11 (core 11, socket 0)
[ 130.467844] [ T1964] Code: ce ff 83 3d 81 03 9b 05 01 49 8b 1c 24 76 0e 8b 05 89 03 9b 05 85 c0 0f 84 91 00 00 00 49 8b 44 24 10 41 8b 4c 24 24 48 8b 13 <8b> 00 41 39 c5 0f 84 89 00 00 00 8b b3 40 40 00 00 48 89 f0 89 8c
Some testing
It looks like there is an issue with the chunk size on my card:
test_chunk_sizes.cu: https://paste.opensuse.org/pastes/5d419b50384b
I built it with nvcc and ran it:
./test_chunk_sizes
Testing chunk sizes to find failure boundary...
Testing chunk size 512KB (524288 bytes)... PASS
Testing chunk size 576KB (589824 bytes)... PASS
Testing chunk size 600KB (614400 bytes)... PASS
Testing chunk size 620KB (634880 bytes)... PASS
Testing chunk size 630KB (645120 bytes)... PASS
Testing chunk size 635KB (650240 bytes)... PASS
Testing chunk size 639KB (654336 bytes)... PASS
Testing chunk size 640KB (655360 bytes)... cudaDeviceSynchronize failed at offset 0: an illegal memory access was encountered
FAIL
Testing chunk size 641KB (656384 bytes)... cudaMalloc failed: CUDA-capable device(s) is/are busy or unavailable
dmesg
NVRM: Xid (PCI:0000:2b:00): 31, pid=2250, name=test_chunk_size, channel 0x00000002, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC6 GPCCLIENT_T1_13 faulted @ 0x6fb6_c401b000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRITE
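For context, the probe in the paste above amounts to roughly the following sketch (my reconstruction from the output shown, assuming a simple allocate/touch/synchronize loop; the actual test_chunk_sizes.cu may differ in detail):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Write to every byte of the buffer so the GPU MMU actually walks the pages;
// a bad PDE surfaces as an illegal-memory-access on the next synchronize.
__global__ void touch(unsigned char *buf, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = (unsigned char)(i & 0xff);
}

static bool test_chunk(size_t bytes) {
    unsigned char *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return false;
    }
    touch<<<(unsigned)((bytes + 255) / 256), 256>>>(d, bytes);
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d);
    if (err != cudaSuccess) {
        printf("cudaDeviceSynchronize failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // Sizes bracketing the observed PASS/FAIL boundary around 640KB.
    const size_t sizes_kb[] = {512, 576, 600, 620, 630, 635, 639, 640, 641};
    for (size_t kb : sizes_kb) {
        printf("Testing chunk size %zuKB (%zu bytes)... ", kb, kb * 1024);
        printf("%s\n", test_chunk(kb * 1024) ? "PASS" : "FAIL");
    }
    return 0;
}
```

With a sketch like this, every allocation up to 639KB passes while 640KB and above fault, matching the boundary in the output above.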
To Reproduce
Running ollama or llama.cpp with any LLM triggers the issue.
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
Reproducible on Leap 15.6, Arch Linux, Ubuntu, etc.
I tried multiple 5XX.XX NVIDIA drivers with different kernel versions; always the same error.