Description
Git commit
Operating System & Version
Arch Linux, kernel 6.17.7
GGML backends
Vulkan
Command-line arguments used
/home/linuxuser/StableDiffusion/stable-diffusion.cpp/build_vulkan/bin/sd-cli --diffusion-model /home/linuxuser/StableDiffusion/models/flux/flux1-dev-q8_0.gguf --vae /home/linuxuser/StableDiffusion/models/flux/vae/diffusion_pytorch_model.safetensors --clip_l /home/linuxuser/StableDiffusion/models/flux/clip_l.safetensors --t5xxl /home/linuxuser/StableDiffusion/models/flux/t5xxl_fp16.safetensors --lora-model-dir /home/linuxuser/StableDiffusion/models/flux/lora --sampling-method euler --cfg-scale 1.0 --height 1024 --width 1024 --steps 50 -s 9078351 -p "prompt goes here lora:lora:1" -o /home/linuxuser/StableDiffusion/sdcpp_py/Output/20260222_1439.jpg
Steps to reproduce
Running inference with the Flux model plus a LoRA; the command is above. The LoRA is not the cause — the failure is identical without it.
What you expected to happen
This is a Strix Halo setup with 96 GB available for Vulkan (out of 128 GB total), so there is more than enough memory. Inference runs fine through all 50 steps and only fails at the end.
nvtop shows low memory utilization and plenty of usable memory (see attached screenshot).
Log output is also attached.
The HIP build works with the same parameters. However, HIP/ROCm on the Strix Halo is currently very unstable, so Vulkan is preferable. I use llama.cpp with Vulkan without any issues, with performance very close to the HIP version (and Vulkan is more stable than HIP).
What actually happened
Full log attached below; this is the failing part. It looks like the failure happens in the final VAE decode.
[INFO ] stable-diffusion.cpp:3445 - generating 1 latent images completed, taking 489.92s
[INFO ] stable-diffusion.cpp:3448 - decoding 1 latents
ggml_vulkan: Device memory allocation of size 4831838208 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:84 - ggml_gallocr_reserve_n_impl: failed to allocate Vulkan0 buffer of size 8545370120
[ERROR] ggml_extend.hpp:1755 - vae: failed to allocate the compute buffer
[ERROR] ggml_extend.hpp:2027 - vae alloc compute buffer failed
[ERROR] stable-diffusion.cpp:2685 - Failed to decode latetnts
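For context on why the free system memory doesn't help here: the log shows a single-buffer allocation failing, not total-memory exhaustion. A minimal sketch of the sizes taken from the log above (the idea that Vulkan enforces a per-allocation/per-buffer cap well below total VRAM is an assumption about the driver; the exact limit varies):

```python
# Sizes copied verbatim from the error log.
GIB = 1 << 30

failed_alloc = 4831838208      # "Device memory allocation of size 4831838208 failed."
requested_buffer = 8545370120  # "failed to allocate Vulkan0 buffer of size 8545370120"

print(f"failed allocation:    {failed_alloc / GIB:.2f} GiB")      # 4.50 GiB
print(f"requested VAE buffer: {requested_buffer / GIB:.2f} GiB")  # 7.96 GiB
# The ~8 GiB VAE compute buffer exceeds the backend's per-buffer limit
# even though total free memory is much larger.
```

If that reading is right, reducing the size of the single VAE decode buffer should work around it; stable-diffusion.cpp appears to have `--vae-tiling` (and `--vae-on-cpu`) options for this, though I haven't verified them on this setup.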
Logs / error messages / stack trace
Additional context / environment details
System:
Host: strixhalo Kernel: 6.17.7-arch1-2 arch: x86_64 bits: 64
Console: pty pts/3 Distro: Arch Linux
CPU:
Info: 16-core model: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S bits: 64
type: MT MCP cache: L2: 16 MiB
Speed (MHz): avg: 2000 min/max: 625/5188 cores: 1: 2000 2: 2000 3: 2000
4: 2000 5: 2000 6: 2000 7: 2000 8: 2000 9: 2000 10: 2000 11: 2000 12: 2000
13: 2000 14: 2000 15: 2000 16: 2000 17: 2000 18: 2000 19: 2000 20: 2000
21: 2000 22: 2000 23: 2000 24: 2000 25: 2000 26: 2000 27: 2000 28: 2000
29: 2000 30: 2000 31: 2000 32: 2000
Graphics:
Device-1: Advanced Micro Devices [AMD/ATI] Strix Halo [Radeon Graphics /
Radeon 8050S 8060S Graphics] driver: amdgpu v: kernel
Display: unspecified server: X.org v: 1.21.1.21 with: Xwayland v: 24.1.9
driver: X: loaded: amdgpu unloaded: modesetting dri: radeonsi gpu: amdgpu
tty: 80x24 resolution: 1680x1050
API: EGL v: 1.5 drivers: kms_swrast,radeonsi,swrast
platforms: gbm,surfaceless,device
API: OpenGL v: 4.6 compat-v: 4.5 vendor: mesa v: 25.3.5-arch1.1
note: console (EGL sourced) renderer: llvmpipe (LLVM 21.1.6 256 bits),
Radeon 8060S Graphics (radeonsi gfx1151 LLVM 21.1.6 DRM 3.64
6.17.7-arch1-2)
API: Vulkan v: 1.4.341 drivers: radv surfaces: N/A
Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
de: kscreen-console,kscreen-doctor gpu: amd-smi wl: wayland-info
x11: xdpyinfo,xprop