Skip to content

⚡ perf: reduce lock contention in storage-backed middleware#4373

Closed
Copilot wants to merge 10 commits into
mainfrom
copilot/feat-eliminate-global-mutex-contention
Closed

⚡ perf: reduce lock contention in storage-backed middleware#4373
Copilot wants to merge 10 commits into
mainfrom
copilot/feat-eliminate-global-mutex-contention

Conversation

Copilot AI commented May 28, 2026

Copy link
Copy Markdown
Contributor

Description

Reduces lock contention in storage-backed middleware by introducing sharded keyed locking and narrowing mutex scope around storage operations.

Changes introduced

  • Adds an internal 128-shard keyed locking primitive (internal/keylock) that serializes work per key while limiting global contention.

  • Updates limiter fixed/sliding windows to serialize storage updates per limiter key instead of using a global mutex.

  • Splits cache heap/accounting locking from per-cache-key coordination and narrows session initialization locking.

  • Fixes keylock release safety: the release closure now captures the specific owner channel and only deletes/closes it if s.owners[key] still references that same channel, preventing a stale release from corrupting a re-acquired lock.

  • Fixes cache heap mutex protection: removeHeapEntry calls after relock() are now wrapped with heapMu.Lock()/heapMu.Unlock() to protect the shared heap and storedBytes from concurrent corruption across different keys.

  • Fixes data race in limiter test storage: adds sync.Mutex to failingLimiterStorage to protect concurrent map access, which was exposed by per-key locking allowing parallel storage calls for different keys.

  • Fixes struct fieldalignment in keylock.go and limiter_test.go per govet/betteralign.

  • Fixes ineffectual assignment in blockingLimiterStorage.SetWithContext.

  • Improves test coverage for middleware/cache/cache.go and middleware/limiter/limiter_sliding.go:

    • limiter_sliding.go: rotateWindow 75%→100%, bucketForOriginalHit 40%→100%, ttlDuration 66.7%→100%, secondsToDuration 75%→100%, New handler 84.1%→96.8%. Added unit tests for all helper functions (fresh entry, full/partial expiration, all bucket return paths, overflow cases), sliding storage get/set error paths, Next callback bypass, and post-handler error paths.
    • cache.go: secondsToTime 80%→100%, secondsToDuration 75%→100%, storeVaryManifest 75%→100%, loadVaryManifest 80%→100%, canonicalHeaderSubset 91.7%→100%, canonicalCookieSubset 91.7%→100%, makeHashAuthFunc 81.2%→100%. Added tests for negative expiration, oversized key buffers, eviction delete errors, vary manifest edge cases, pool fallback paths, and query/header/cookie canonicalization.
  • Benchmarks: Limiter benchmarks may show improvement under high concurrency with distinct keys due to reduced global lock contention. The Benchmark_SetLogger regression flagged by CI is unrelated (in log package). The Benchmark_Limiter difference is expected under single-key patterns since per-key locking adds a channel allocation vs. a simple mutex; the benefit is under high concurrency with distinct keys.

  • Documentation Update: No documentation changes required.

  • Changelog/What's New: Reduced lock contention in cache, limiter, and session middleware via sharded per-key locking.

  • Migration Guide: No migration needed; changes are internal.

  • API Alignment with Express: N/A

  • API Longevity: Internal-only changes with no public API impact.

  • Examples: N/A

Type of change

  • Performance improvement (non-breaking change which improves efficiency)
  • Code consistency (non-breaking change which improves code reliability and robustness)

Checklist

  • Conducted a self-review of the code and provided comments for complex or critical parts.
  • Added or updated unit tests to validate the effectiveness of the changes or new features.
  • Ensured that new and existing unit tests pass locally with the changes.
  • Verified that any new dependencies are essential and have been agreed upon by the maintainers/community.
  • Aimed for optimal performance with minimal allocations in the new code.

Commit formatting

Please use emojis in commit messages for an easy way to identify the purpose or intention of a commit. Check out the emoji cheatsheet here: CONTRIBUTING.md

Fixes #4358

Copilot AI changed the title [WIP] Eliminate global mutex contention across storage-backed middleware Reduce storage-backed middleware lock contention with 128-shard keyed locking May 28, 2026
Copilot finished work on behalf of gaby May 28, 2026 04:59
Copilot AI requested a review from gaby May 28, 2026 04:59
@ReneWerner87 ReneWerner87 added this to v3 May 28, 2026
@ReneWerner87 ReneWerner87 added this to the v3 milestone May 28, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

Benchmark suite Current: b0023b9 Previous: 3ceebd3 Ratio
Benchmark_Limiter_Custom_Store (github.com/gofiber/fiber/v3/middleware/limiter) - B/op 335 B/op 175 B/op 1.91
Benchmark_Limiter (github.com/gofiber/fiber/v3/middleware/limiter) - B/op 176 B/op 16 B/op 11
Benchmark_Limiter (github.com/gofiber/fiber/v3/middleware/limiter) - allocs/op 4 allocs/op 2 allocs/op 2

This comment was automatically generated by workflow using github-action-benchmark.

@gaby gaby changed the title Reduce storage-backed middleware lock contention with 128-shard keyed locking ⚡ perf: Reduce storage-backed middleware lock contention with 128-shard keyed locking May 28, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces lock contention in storage-backed middleware by introducing sharded keyed locking and narrowing mutex scope around storage operations.

Changes:

  • Adds an internal 128-shard keyed locking primitive.
  • Updates limiter fixed/sliding windows to serialize storage updates per limiter key.
  • Splits cache heap/accounting locking from per-cache-key coordination and narrows session initialization locking.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/keylock/keylock.go Adds shared keyed lock implementation.
internal/keylock/keylock_test.go Tests same-key blocking and different-key non-blocking behavior.
middleware/limiter/limiter_fixed.go Replaces global limiter mutex with keyed locking.
middleware/limiter/limiter_sliding.go Applies keyed locking to sliding-window limiter updates.
middleware/limiter/limiter_test.go Adds concurrency tests for different limiter keys.
middleware/cache/cache.go Splits cache key coordination from heap/accounting locking.
middleware/session/middleware.go Moves session store loading outside middleware mutex.
middleware/session/middleware_test.go Adds coverage for narrowed session initialization locking.

Comment thread internal/keylock/keylock.go Outdated
Comment thread middleware/cache/cache.go
- keylock: capture owner channel in release closure and only delete/close
  if s.owners[key] still points to the same channel, preventing stale
  release from corrupting a re-acquired lock.
- cache: wrap removeHeapEntry calls after relock() with heapMu to protect
  shared heap and storedBytes from concurrent corruption across keys.
Copilot AI changed the title ⚡ perf: Reduce storage-backed middleware lock contention with 128-shard keyed locking perf: reduce lock contention in storage-backed middleware May 28, 2026
Copilot finished work on behalf of gaby May 28, 2026 15:20
@ReneWerner87

Copy link
Copy Markdown
Member

@copilot Three checks are red:

Linter (4 findings):

  • internal/keylock/keylock.go:9:13 and :14:12 - fieldalignment (govet): reorder the struct fields so the 16 pointer bytes pack into 8.
  • middleware/limiter/limiter_test.go:73:29 - fieldalignment (govet).
  • middleware/limiter/limiter_test.go:151:2 - ineffectual assignment to ch (ineffassign): remove the dead assignment.

Test (unit 1.26.x, ubuntu-latest): the race detector reports WARNING: DATA RACE. The per-key lock / cache heap refactor has introduced a data race. Please run go test -race ./middleware/cache/... ./middleware/limiter/... ./internal/keylock/..., find the unsynchronized access (most likely the interaction between the per-key lock and heapMu, or a field touched outside both), and fix it. This is the blocking correctness issue, do this first.

Benchmark (Compare): the regression gate (fail-on-alert) tripped. Since this refactor changes locking hot paths, please confirm whether it is a real regression or comparison noise before merge.

…ssign

- Add mutex to failingLimiterStorage to protect concurrent map access
  (exposed by per-key locking allowing parallel storage calls)
- Fix fieldalignment in keylock.go and limiter_test.go structs
- Fix ineffassign: use _, ok instead of ch, ok when ch is unused

Copilot AI commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

@copilot Three checks are red:

Linter (4 findings):...

All three issues are fixed in commit f5aee2c:

Data race: The failingLimiterStorage test helper had unprotected map access. With per-key locking (instead of a global mutex), different keys now hit storage concurrently, exposing this. Added a sync.Mutex to failingLimiterStorage to protect data and errs maps.

Linter:

  • Fieldalignment: fixed via make betteralign in both keylock.go and limiter_test.go.
  • Ineffassign: changed ch, ok := to _, ok := since ch was unused when the key already existed.

Benchmark: The Benchmark_SetLogger regression (1.52×) is unrelated to this PR — it's in the log package which this PR doesn't touch. The Benchmark_Limiter regression is expected under the benchmark's single-key pattern since per-key locking adds a channel allocation per request vs. a simple mutex; the benefit is under high concurrency with distinct keys, which the existing benchmark doesn't exercise.

Copilot AI requested a review from ReneWerner87 May 29, 2026 14:19
@codecov

codecov Bot commented Jun 5, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.33333% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.83%. Comparing base (3ceebd3) to head (2a96a1f).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
middleware/cache/cache.go 75.00% 7 Missing ⚠️
middleware/limiter/limiter_fixed.go 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4373      +/-   ##
==========================================
+ Coverage   92.70%   92.83%   +0.13%     
==========================================
  Files         138      138              
  Lines       13486    13495       +9     
==========================================
+ Hits        12502    12528      +26     
+ Misses        614      604      -10     
+ Partials      370      363       -7     
Flag Coverage Δ
unittests 92.83% <83.33%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gaby gaby changed the title perf: reduce lock contention in storage-backed middleware ⚡ perf: reduce lock contention in storage-backed middleware Jun 5, 2026
@gaby

gaby commented Jun 5, 2026

Copy link
Copy Markdown
Member

@copilot The test coverage for the following files requires improvement:

  • middleware/cache/cache.go
  • middleware/limiter/limiter_sliding.go

Copilot AI commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

@copilot The test coverage for the following files requires improvement:

  • middleware/cache/cache.go
  • middleware/limiter/limiter_sliding.go
    ...

Improved test coverage for both files in the latest commit:

middleware/limiter/limiter_sliding.go (overall: 78.9% → 84.5%):

  • rotateWindow: 75% → 100% (added tests for fresh entry, full expiration, partial window rotation)
  • bucketForOriginalHit: 40% → 100% (all 3 return paths: current window, previous window, expired)
  • ttlDuration: 66.7% → 100% (overflow cases for reset, expiration, and sum)
  • secondsToDuration: 75% → 100% (overflow case)
  • New handler: 84.1% → 96.8% (storage get/set errors, Next callback, post-handler error paths)

middleware/cache/cache.go (overall: 73.8% → 74.5%):

  • secondsToTime: 80% → 100%
  • secondsToDuration: 75% → 100%
  • storeVaryManifest: 75% → 100%
  • loadVaryManifest: 80% → 100%
  • canonicalHeaderSubset: 91.7% → 100%
  • canonicalCookieSubset: 91.7% → 100%
  • makeHashAuthFunc: 81.2% → 100%
  • Added tests for negative expiration, oversized key buffers, eviction delete errors, query/header/cookie edge cases

Copilot finished work on behalf of gaby June 5, 2026 04:19
claude added 2 commits June 27, 2026 12:29
Merge latest main into the lock-contention branch and reconcile the
diverged cache/session refactors:

- cache: keep per-key keylock + heapMu accounting while adopting main's
  cfg.now() timestamp model; drop the obsolete atomic timestamp ticker
  and the now-unused timestampUpdatePeriod constant.
- session: combine main's error-returning initialize signature with the
  PR's narrowed locking so storage I/O (getSession) runs outside m.mu.
- tests: keep both sides' added tests; remove cache coverage tests for
  the canonicalQueryString/Header/CookieSubset helpers that main replaced
  with buildVaryKey; check the new initialize error in the lock test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GBx3vGQUUm6Mg1XBYmfMnS
Make the per-key lock shard count adaptive instead of a fixed 128. The
shard mutex only guards brief map bookkeeping and at most GOMAXPROCS
goroutines contend for it at once, so the effective count is now
max(minShards, 4*GOMAXPROCS) rounded up to a power of two. The 128
constants in cache and limiter become a floor (minShards).

Rounding to a power of two lets Lock select a shard with a bitmask
instead of a modulo. Adds unit tests for nextPow2 and shardCount.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GBx3vGQUUm6Mg1XBYmfMnS
@gaby gaby marked this pull request as ready for review June 27, 2026 13:12
@gaby gaby requested a review from a team as a code owner June 27, 2026 13:12
@gaby gaby requested review from efectn and sixcolors June 27, 2026 13:12
@ReneWerner87

Copy link
Copy Markdown
Member

Thanks for the work here. After review we're going to close this rather than merge, and split the underlying issue (#4358) into separately-scoped fixes. Rationale:

Locking approach

  • The per-key channel locker (internal/keylock) allocates per acquisition on the hot path. The CI benchmark shows Benchmark_Limiter going from 16 B/op, 2 allocs to 176 B/op, 4 allocs (11x bytes), regressing every limiter request to benefit only the high-concurrency-distinct-key case.
  • For the limiter the lock is still held across storage I/O - correctly, since get -> check -> set must be atomic per key - so it doesn't follow 🔥 feat: eliminate global mutex contention across storage-backed middleware (limiter, cache, session) #4358's "never hold a lock across I/O" guidance. That guidance can't apply to the limiter without breaking counter atomicity (lost increments -> limit bypass). The right fix for the limiter is an atomic storage operation (an optional Increment path on the storage interface with a fallback), not a faster in-process lock - which in a multi-instance Redis deployment provides no cross-instance atomicity anyway.
  • The PR also bundles fixes (test-storage data race, heap-mutex protection) that exist only because the per-key change introduced new concurrent access - a lot of subtle concurrency surface for an uncertain win.

Worth keeping

  • The added test coverage for cache.go and limiter_sliding.go helpers is valuable and independent of the locking change. Resubmitted as a focused tests-only PR, we'll take it. (Leave out the failingLimiterStorage mutex - it's only needed because of the per-key change.)

Plan for #4358 - re-scope into three efforts, since the three middleware differ:

  1. Limiter - atomic storage operation (interface extension + fallback).
  2. Cache - stop holding the lock across storage I/O on the read path.
  3. Session - same for Save/Delete I/O.

Closing for now; happy to revisit the coverage tests as their own PR.

@github-project-automation github-project-automation Bot moved this to Done in v3 Jun 27, 2026
@gaby gaby deleted the copilot/feat-eliminate-global-mutex-contention branch June 28, 2026 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

🔥 feat: eliminate global mutex contention across storage-backed middleware (limiter, cache, session)

5 participants