Xid (PCI:0000:2b:00): 31 MMU Fault: ENGINE GRAPHICS GPC6 GPCCLIENT_T1_13 faulted @ 0x6f17_04027000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRITE #1030

@aginies

Description

NVIDIA Open GPU Kernel Modules Version

590.48.01

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

openSUSE Tumbleweed

Kernel Release

6.18.9-1-default

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 5090

Describe the bug

There is an issue with the Linux kernel driver. I can't run ollama or llama.cpp without getting an error, and both work fine under Windows, so this does not seem to be a hardware or firmware issue.

llama.cpp

./llama-bench -m ~/testcuda/qwen2.5-coder-3b-instruct-q4_0.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes

model size params backend ngl test t/s
RMS_NORM: src0_d=0x7f6f32000000 (attr_err=0, type=2), dst_d=0x7f6f32501000 (attr_err=0, type=2)
ne00=2048, ne01=512, ne02=1, ne03=1, s01=2048, s02=1048576, s03=1048576
MUL_OP: src0=0x7f6f32501000 (err=0, type=2), src1=0x7f6e7136f800 (err=0, type=2), dst=0x7f6f32501000 (err=0, type=2)
src0: type=f32, shape=[2048,512,1,1]
src1: type=f32, shape=[2048,1,1,1]
dst: type=f32, shape=[2048,512,1,1]

!!! CUDA KERNEL FAILED: op=MUL (6)
src0: f32, shape=[2048,512,1,1]
dst: f32, shape=[2048,512,1,1]

=== CUDA ERROR DETAILS ===
CUDA error: an illegal memory access was encountered
current device: 0
function: ggml_cuda_compute_forward
location: /home/aginies/open-gpu-kernel-modules/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2332
statement: err

With ollama

ollama serve
level=INFO source=types.go:42 msg="inference compute" id=GPU-446f513c-0699-5337-2cd0-9fa3d507cc94 filter_id="" library=CUDA compute=12.0 name=CUDA0 description="NVIDIA GeForce RTX 5090" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:2b:00.0 type=discrete total="31.8 GiB" available="31.3 GiB"

ollama run llama3.2
time=2026-02-18T18:39:02.879+01:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
⠦ time=2026-02-18T18:39:03.117+01:00 level=INFO source=sched.go:493 msg="Load failed" model=/home/aginies/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff error="llama runner process has terminated: CUDA error: an illegal memory access was encountered\n current device: 0, in function ggml_backend_cuda_buffer_clear at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:790\n cudaStreamSynchronize(((cudaStream_t)0x2))\n//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: CUDA error"
[GIN] 2026/02/18 - 18:39:03 | 500 | 4.656141722s | 127.0.0.1 | POST "/api/generate"
Error: 500 Internal Server Error: llama runner process has terminated: CUDA error: an illegal memory access was encountered
current device: 0, in function ggml_backend_cuda_buffer_clear at //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:790
cudaStreamSynchronize(((cudaStream_t)0x2))
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: CUDA error

dmesg

[ 130.456250] [ T1946] NVRM: Xid (PCI:0000:2b:00): 31, pid=1942, name=llama-bench, channel 0x00000002, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC6 GPCCLIENT_T1_13 faulted @ 0x6f6f_320c2000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
[ 130.467839] [ T1964] llama-bench[1964]: segfault at 206e03fe0 ip 00007f71b508324c sp 00007ffe8b911a10 error 4 in libcuda.so.590.48.01[48324c,7f71b4d67000+108c000] likely on CPU 11 (core 11, socket 0)
[ 130.467844] [ T1964] Code: ce ff 83 3d 81 03 9b 05 01 49 8b 1c 24 76 0e 8b 05 89 03 9b 05 85 c0 0f 84 91 00 00 00 49 8b 44 24 10 41 8b 4c 24 24 48 8b 13 <8b> 00 41 39 c5 0f 84 89 00 00 00 8b b3 40 40 00 00 48 89 f0 89 8c

Some testing

It looks like there is an issue with the chunk size on my card:
test_chunk_sizes.cu: https://paste.opensuse.org/pastes/5d419b50384b
I built it with nvcc and ran it:

./test_chunk_sizes
Testing chunk sizes to find failure boundary...
Testing chunk size 512KB (524288 bytes)... PASS
Testing chunk size 576KB (589824 bytes)... PASS
Testing chunk size 600KB (614400 bytes)... PASS
Testing chunk size 620KB (634880 bytes)... PASS
Testing chunk size 630KB (645120 bytes)... PASS
Testing chunk size 635KB (650240 bytes)... PASS
Testing chunk size 639KB (654336 bytes)... PASS
Testing chunk size 640KB (655360 bytes)... cudaDeviceSynchronize failed at offset 0: an illegal memory access was encountered
FAIL
Testing chunk size 641KB (656384 bytes)... cudaMalloc failed: CUDA-capable device(s) is/are busy or unavailable
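For context, here is a minimal sketch of the kind of probe this output suggests (the actual source is in the linked paste; kernel and function names here are hypothetical). It allocates a buffer of each candidate size, launches a kernel that writes every byte, and synchronizes so a fault can be attributed to the offending size:

```cuda
// Hypothetical reconstruction of test_chunk_sizes.cu (real source is in the
// linked paste). Build with: nvcc -o test_chunk_sizes test_chunk_sizes.cu
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: write one byte per thread across the whole chunk.
__global__ void touch(unsigned char *buf, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 0xAB;
}

static void test_chunk(size_t chunk) {
    printf("Testing chunk size %zuKB (%zu bytes)... ", chunk / 1024, chunk);
    unsigned char *d = nullptr;
    cudaError_t err = cudaMalloc(&d, chunk);
    if (err != cudaSuccess) {
        // After an earlier illegal access the context is poisoned, which
        // matches the "device(s) is/are busy or unavailable" line above.
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return;
    }
    unsigned int threads = 256;
    unsigned int blocks = (unsigned int)((chunk + threads - 1) / threads);
    touch<<<blocks, threads>>>(d, chunk);
    // Synchronize immediately so the failure is pinned to this chunk size.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("cudaDeviceSynchronize failed at offset 0: %s\nFAIL\n",
               cudaGetErrorString(err));
        cudaFree(d);
        return;
    }
    cudaFree(d);
    printf("PASS\n");
}

int main() {
    printf("Testing chunk sizes to find failure boundary...\n");
    const size_t sizes_kb[] = {512, 576, 600, 620, 630, 635, 639, 640, 641};
    for (size_t kb : sizes_kb)
        test_chunk(kb * 1024);
    return 0;
}
```

On healthy hardware every size should print PASS; on the affected RTX 5090 the output above shows the boundary between 639KB and 640KB.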

dmesg

NVRM: Xid (PCI:0000:2b:00): 31, pid=2250, name=test_chunk_size, channel 0x00000002, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC6 GPCCLIENT_T1_13 faulted @ 0x6fb6_c401b000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_WRITE

To Reproduce

Running ollama or llama.cpp with any LLM triggers the issue.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

Reproducible on Leap 15.6, Arch Linux, Ubuntu, etc.
I tried multiple 5XX.XX NVIDIA drivers with different kernel versions; always the same error.
