UPSTREAM PR #1316: fix: sd-server memory leak by loci-dev · Pull Request #75 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-03-04T04:57:41Z

Note

Source pull request: leejet/stable-diffusion.cpp#1316

free results (sd_images_t array) from generate_images() after write_image_to_vector()

loci-review · 2026-03-04T05:55:04Z

Overview

Analysis of stable-diffusion.cpp compared 49,758 functions across two versions, identifying 56 modified functions, 1 new function, and 49,701 unchanged. The target version introduces a memory leak fix in sd-server (commit 85676b7) with minimal performance impact.

Binaries analyzed:

build.bin.sd-server: -0.028% power consumption (527,129.70 nJ → 526,980.49 nJ)
build.bin.sd-cli: 0.0% power consumption change (491,105.58 nJ → 491,105.69 nJ)

Overall impact is positive, with hot-path optimizations offsetting minor regressions in non-critical functions.

Function Analysis

Performance-critical improvements:

ggml_compute_forward_map_custom3: Response time decreased 32.85% (-76.86ns: 233.99ns → 157.13ns); throughput time decreased 35.05% (-76.85ns: 219.25ns → 142.40ns). This GGML tensor operation is called thousands of times per inference, compounding to meaningful savings.
GGMLRunner::copy_data_to_backend_tensor: Response time decreased 11.45% (-197.67ns: 1,725.58ns → 1,527.91ns); throughput time decreased 56.89% (-197.69ns: 347.49ns → 149.80ns). Critical for GPU inference, reducing data transfer overhead between host and backend memory.
std::vector<sd_lora_t>::end: Response time decreased 69.44% (-183.29ns: 263.94ns → 80.65ns); throughput time decreased 75.41% (-183.29ns: 243.07ns → 59.78ns). Benefits LoRA configuration processing during request handling.

Non-critical regressions:

std::vector::begin and std::vector::begin: Both show ~289% throughput increases (+180.81ns: 62.49ns → 243.30ns), but absolute impact is negligible for these STL accessors used during initialization.
nlohmann::json::lexer::scan_string: Response time increased 6.01% (+2,530ns: 42,105.09ns → 44,635.09ns) with stable throughput time (-0.42%). Affects request parsing at boundary, not inference hot path.

Other analyzed functions showed minor variations in standard library operations with negligible real-world impact.

Additional Findings

The memory leak fix (free_results() function added at three cleanup sites) prevents unbounded memory growth in long-running servers without measurable energy overhead. Indirect benefits include reduced heap fragmentation improving allocator performance, evident in the 50% throughput improvement for std::_Construct string operations. The combined hot-path optimizations provide approximately 0.79ms savings per image generation, while the memory stability improvements are essential for production deployments handling sustained workloads.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

Korsar13 added 2 commits March 4, 2026 09:27

fix: sd-server memory leak

85676b7

mistype

9fc155b

loci-dev temporarily deployed to stable-diffusion-cpp-prod March 4, 2026 04:57 — with GitHub Actions Inactive

loci-dev force-pushed the main branch from 44ec1be to 682032b Compare March 6, 2026 04:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UPSTREAM PR #1316: fix: sd-server memory leak#75

UPSTREAM PR #1316: fix: sd-server memory leak#75
loci-dev wants to merge 2 commits intomainfrom
loci/pr-1316-master

loci-dev commented Mar 4, 2026

Uh oh!

loci-review bot commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

loci-dev commented Mar 4, 2026

Uh oh!

loci-review bot commented Mar 4, 2026

Overview

Function Analysis

Additional Findings

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants