URL: https://docs.radxa.com/orion/o6/app-development/artificial-intelligence/llama-cpp
Time: 4/17/2026, 1:04:50 PM
Bug report for cix-llama-cpp (affects 1.2.4 and 1.2.6):
Title: llama-server-vulkan produces empty content in chat completions after the first request (KV cache corruption on Gemma 3 / Vulkan backend)
Reproduction:
/usr/share/cix/bin/llama-server-vulkan \
  --model \
  --device Vulkan0 --n-gpu-layers 99
Send three sequential POST /v1/chat/completions requests. The first returns only '\n'; the second and third return an empty string. The reported token count is non-zero, but the generated text is always empty.
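The three-request reproduction can be scripted. A minimal sketch, assuming the server listens on 127.0.0.1:8080 and exposes the standard OpenAI-compatible chat completions endpoint; the model name and prompt are placeholders, not values from this report:

```python
import json
import urllib.request


def is_empty_completion(resp: dict) -> bool:
    """Return True if the chat completion carries no visible text,
    i.e. the content field is empty or whitespace-only."""
    content = resp["choices"][0]["message"]["content"]
    return content.strip() == ""


def send_chat(base_url: str, prompt: str) -> dict:
    # Plain OpenAI-compatible chat completion request.
    body = json.dumps({
        "model": "gemma-3",  # placeholder name, assumed for illustration
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)


if __name__ == "__main__":
    # Three sequential requests; on the affected Vulkan build the
    # first answer is just "\n" and the rest are empty strings.
    for i in range(3):
        resp = send_chat("http://127.0.0.1:8080", "Say hello.")
        print(i, "empty" if is_empty_completion(resp) else "ok")
```

On the affected build all three iterations print "empty"; on the CPU backend they should print "ok".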
Root cause observed: the binary forces n_parallel=4 and kv_unified=true even when --parallel 1 is passed. After the first inference, the KV cache state corrupts the output of every subsequent decode on the Vulkan backend; with the CPU backend (--device none) the problem does not occur.
Secondary bug: passing the --jinja flag causes the server to hang indefinitely on the first request (the connection is accepted but never answered).
Workaround: None currently available that preserves Vulkan acceleration.