Description
Git commit
Operating System & Version
Arch Linux, kernel 6.17.7
GGML backends
Vulkan
Command-line arguments used
/home/linuxuser/StableDiffusion/stable-diffusion.cpp/build_vulkan/bin/sd-cli --diffusion-model /home/linuxuser/StableDiffusion/models/flux/flux1-dev-q8_0.gguf --vae /home/linuxuser/StableDiffusion/models/flux/vae/diffusion_pytorch_model.safetensors --clip_l /home/linuxuser/StableDiffusion/models/flux/clip_l.safetensors --t5xxl /home/linuxuser/StableDiffusion/models/flux/t5xxl_fp16.safetensors --lora-model-dir /home/linuxuser/StableDiffusion/models/flux/lora --sampling-method euler --cfg-scale 1.0 --height 1024 --width 1024 --steps 50 -s 9078351 -p "prompt goes here lora:lora:1" -o /home/linuxuser/StableDiffusion/sdcpp_py/Output/20260222_1439.jpg
Steps to reproduce
Running inference with the Flux model plus a LoRA; the command is above. The LoRA is not the cause — the failure is identical without it.
What you expected to happen
This is a Strix Halo setup with 96 GB available for Vulkan (out of 128 GB total), so there is more than enough memory. Inference runs fine through all 50 steps and only fails at the end.
nvtop shows low memory utilization and plenty of usable memory (see attached screenshot).
Log output is also attached.
The HIP build works with the same parameters. However, HIP/ROCm on the Strix Halo is currently very unstable, so Vulkan is preferable. I use llama.cpp with Vulkan without any issues, with performance very close to the HIP version (and Vulkan is more stable than HIP).
What actually happened
Full log attached below; this is the failing part. It looks like the failure happens in the final VAE decode.
[INFO ] stable-diffusion.cpp:3445 - generating 1 latent images completed, taking 489.92s
[INFO ] stable-diffusion.cpp:3448 - decoding 1 latents
ggml_vulkan: Device memory allocation of size 4831838208 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:84 - ggml_gallocr_reserve_n_impl: failed to allocate Vulkan0 buffer of size 8545370120
[ERROR] ggml_extend.hpp:1755 - vae: failed to allocate the compute buffer
[ERROR] ggml_extend.hpp:2027 - vae alloc compute buffer failed
[ERROR] stable-diffusion.cpp:2685 - Failed to decode latetnts
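For context on why the free system memory doesn't help here: the log shows a single-buffer allocation failing, not total-memory exhaustion. A minimal sketch of the sizes taken from the log above (the idea that Vulkan enforces a per-allocation/per-buffer cap well below total VRAM is an assumption about the driver; the exact limit varies):

```python
# Sizes copied verbatim from the error log.
GIB = 1 << 30

failed_alloc = 4831838208      # "Device memory allocation of size 4831838208 failed."
requested_buffer = 8545370120  # "failed to allocate Vulkan0 buffer of size 8545370120"

print(f"failed allocation:    {failed_alloc / GIB:.2f} GiB")      # 4.50 GiB
print(f"requested VAE buffer: {requested_buffer / GIB:.2f} GiB")  # 7.96 GiB
# The ~8 GiB VAE compute buffer exceeds the backend's per-buffer limit
# even though total free memory is much larger.
```

If that reading is right, reducing the size of the single VAE decode buffer should work around it; stable-diffusion.cpp appears to have `--vae-tiling` (and `--vae-on-cpu`) options for this, though I haven't verified them on this setup.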
Logs / error messages / stack trace
Additional context / environment details
System:
Host: strixhalo Kernel: 6.17.7-arch1-2 arch: x86_64 bits: 64
Console: pty pts/3 Distro: Arch Linux
CPU:
Info: 16-core model: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S bits: 64
type: MT MCP cache: L2: 16 MiB
Speed (MHz): avg: 2000 min/max: 625/5188 cores: 1: 2000 2: 2000 3: 2000
4: 2000 5: 2000 6: 2000 7: 2000 8: 2000 9: 2000 10: 2000 11: 2000 12: 2000
13: 2000 14: 2000 15: 2000 16: 2000 17: 2000 18: 2000 19: 2000 20: 2000
21: 2000 22: 2000 23: 2000 24: 2000 25: 2000 26: 2000 27: 2000 28: 2000
29: 2000 30: 2000 31: 2000 32: 2000
Graphics:
Device-1: Advanced Micro Devices [AMD/ATI] Strix Halo [Radeon Graphics /
Radeon 8050S 8060S Graphics] driver: amdgpu v: kernel
Display: unspecified server: X.org v: 1.21.1.21 with: Xwayland v: 24.1.9
driver: X: loaded: amdgpu unloaded: modesetting dri: radeonsi gpu: amdgpu
tty: 80x24 resolution: 1680x1050
API: EGL v: 1.5 drivers: kms_swrast,radeonsi,swrast
platforms: gbm,surfaceless,device
API: OpenGL v: 4.6 compat-v: 4.5 vendor: mesa v: 25.3.5-arch1.1
note: console (EGL sourced) renderer: llvmpipe (LLVM 21.1.6 256 bits),
Radeon 8060S Graphics (radeonsi gfx1151 LLVM 21.1.6 DRM 3.64
6.17.7-arch1-2)
API: Vulkan v: 1.4.341 drivers: radv surfaces: N/A
Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
de: kscreen-console,kscreen-doctor gpu: amd-smi wl: wayland-info
x11: xdpyinfo,xprop