UPSTREAM PR #1307: reset weight adapter for models if no loras in request (fix 'sticky loras')#72
Overview

Analysis of commit 659c150 ("reset weight adapter for models if no loras in request") across two binaries shows minimal performance impact. Of 49,737 total functions, 45 were modified (0.09%), with no new or removed functions. Binaries analyzed:
Both power consumption changes are negligible, indicating no meaningful energy impact.

Function Analysis

apply_loras_at_runtime (both binaries) received the intentional code change, adding four null pointer assignments to reset weight adapters and fix the "sticky LoRAs" bug:
The 160-166ns throughput increase directly corresponds to the four added pointer assignments. This overhead is negligible within the function's 30.8ms execution time and represents less than 0.001% of typical multi-second inference workloads. The correctness improvement (preventing incorrect model outputs from persistent LoRA state) fully justifies this minimal cost. Standard library functions show compiler-driven variations without source changes:
GGML functions show minor regressions likely from indirect effects:
Other analyzed functions saw negligible changes, primarily reflecting compiler optimization variance rather than algorithmic modifications.

Additional Findings

The modified function (apply_loras_at_runtime) manages LoRA (Low-Rank Adaptation) application for ML model customization. The fix ensures clean model state between inference requests, preventing artifacts where previous adaptations incorrectly influenced subsequent generations. No GPU functions were directly modified. The changes do not affect the inference critical path (tensor operations, diffusion loop), which operates at millisecond-to-second timescales, making the nanosecond-level changes immaterial to overall performance.

🔎 Full breakdown: Loci Inspector
44ec1be to 682032b
Note
Source pull request: leejet/stable-diffusion.cpp#1307
Currently, weight_adapter remains unchanged if there are no loras in the request.
Therefore, after a generation with given loras, every subsequent request that specifies no loras will keep using the last specified loras.