Move thin lock acquire/release in CoreCLR to managed code #129502

Open
VSadov wants to merge 27 commits into dotnet:main from VSadov:manThin

VSadov commented Jun 17, 2026

Member
  • Moves the thin lock acquire/release to managed code.
  • Cross-ports various tweaks between the CoreCLR and NativeAOT implementations where applicable. Most of them adjust CoreCLR code to match NativeAOT, but a couple of tweaks went in the other direction.
  • Fixes an issue where TryAcquire spins extensively if a thin lock is owned by another thread.

The overall effect is a 5%-40% improvement in throughput, depending on the platform and on the thin/fat lock scenario.

Copilot AI review requested due to automatic review settings June 17, 2026 05:40
@dotnet-policy-service

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @VSadov
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Contributor

Pull request overview

This PR moves the thin-lock (object header) acquire/release fast paths from CoreCLR native code into managed implementations in System.Private.CoreLib, removing the associated FCALL/ecall surface and native inline helpers.

Changes:

  • Removed native thin-lock helpers (syncblk.inl, ObjHeader::*HeaderThinLock, FCALL entries) and associated includes/build references.
  • Implemented thin-lock acquire/release in managed System.Threading.ObjectHeader (CoreCLR) and updated Monitor to call the new managed entrypoints.
  • Kept NativeAOT parity by renaming/updating its thin-lock entrypoints and adjusting call sites accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

File Description
src/coreclr/vm/syncblk.inl Removes native inline thin-lock acquire/release implementation.
src/coreclr/vm/syncblk.h Removes native HeaderLockResult and thin-lock method declarations from ObjHeader.
src/coreclr/vm/ecalllist.h Drops the ObjectHeader FCALL mapping entries.
src/coreclr/vm/comsynchronizable.h Removes FCDECLs for thin-lock FCALL entrypoints.
src/coreclr/vm/comsynchronizable.cpp Removes FCIMPL implementations for thin-lock FCALL entrypoints.
src/coreclr/vm/common.h Removes syncblk.inl from global inline includes.
src/coreclr/vm/CMakeLists.txt Removes syncblk.inl from VM header lists.
src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Adds managed thin-lock acquire/release logic and exposes AcquireThinLock(...) and managed Release(...).
src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs Routes Monitor.Enter/TryEnter/... to the new managed thin-lock entrypoints.
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/ObjectHeader.cs Renames/reshapes NativeAOT thin-lock entrypoint to AcquireThinLock(...) and adjusts uncommon-path handling.
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Monitor.NativeAot.cs Updates NativeAOT Monitor to call AcquireThinLock(...).

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated
Copilot AI review requested due to automatic review settings June 17, 2026 15:22

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

return HeaderLockResult.UseSlowPath;
}

if (Interlocked.CompareExchange(pHeader, oldBits | currentThreadID, oldBits) == oldBits)

VSadov Jun 17, 2026

Member Author

Doing CAS in managed code is one of the motivations. The native CAS needs to check for the presence of LSE on ARM64; JITed code does not need that.

This affects Linux-arm64 perhaps even more than Windows-arm64.
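
For readers following along, here is a minimal sketch of what a thin-lock CAS in managed code might look like. It is not the PR's code: the `ThinLockSketch` class, the `ThreadIdMask` value, and the plain `int` field standing in for the real object header are all assumptions made for illustration. Only `Interlocked.CompareExchange` and `Volatile.Read` are real APIs.

```csharp
using System;
using System.Threading;

static class ThinLockSketch
{
    // Hypothetical layout: the low 16 bits hold the owner's thread ID.
    // The real header constants are defined by the runtime and may differ.
    const int ThreadIdMask = 0xFFFF;

    // One-shot acquire: a single CAS attempt, no spinning.
    public static bool TryAcquire(ref int header, int threadId)
    {
        int oldBits = Volatile.Read(ref header);
        if ((oldBits & ThreadIdMask) != 0)
            return false; // already owned by some thread

        // Publish our thread ID only if the header is still what we read.
        return Interlocked.CompareExchange(ref header, oldBits | threadId, oldBits) == oldBits;
    }

    // One-shot release: succeeds only if threadId currently owns the lock.
    public static bool TryRelease(ref int header, int threadId)
    {
        int oldBits = Volatile.Read(ref header);
        if ((oldBits & ThreadIdMask) != threadId)
            return false; // not the owner

        return Interlocked.CompareExchange(ref header, oldBits & ~ThreadIdMask, oldBits) == oldBits;
    }
}
```

In this sketch the CAS goes through `Interlocked.CompareExchange`, so the instruction sequence is picked by the JIT for the machine the code runs on, which is the point made in the comment above.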

Copilot AI review requested due to automatic review settings June 18, 2026 02:01

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated
Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated
Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated
@VSadov

VSadov commented Jun 18, 2026

Member Author

@MihuBot benchmark System.Collections.Concurrent -arm -intel

@VSadov

VSadov commented Jun 18, 2026

Member Author

@MihuBot benchmark System.Threading -arm

@MihuBot

MihuBot commented Jun 18, 2026

System.Collections.Concurrent.IsEmpty_String_
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 0 128.152 ns 0.1316 ns 1.00 - NA
Dictionary PR 0 130.493 ns 0.0555 ns 1.02 - NA
Queue Main 0 3.229 ns 0.0026 ns 1.00 - NA
Queue PR 0 3.282 ns 0.0184 ns 1.02 - NA
Stack Main 0 1.505 ns 0.0004 ns 1.00 - NA
Stack PR 0 1.505 ns 0.0007 ns 1.00 - NA
Bag Main 0 8.534 ns 0.0014 ns 1.00 - NA
Bag PR 0 8.587 ns 0.0023 ns 1.01 - NA
Dictionary Main 512 3.116 ns 0.0011 ns 1.00 - NA
Dictionary PR 512 3.154 ns 0.0062 ns 1.01 - NA
Queue Main 512 2.558 ns 0.0006 ns 1.00 - NA
Queue PR 512 2.562 ns 0.0007 ns 1.00 - NA
Stack Main 512 1.504 ns 0.0008 ns 1.00 - NA
Stack PR 512 1.505 ns 0.0004 ns 1.00 - NA
Bag Main 512 8.039 ns 0.0013 ns 1.00 - NA
Bag PR 512 8.064 ns 0.0089 ns 1.00 - NA
System.Collections.Concurrent.IsEmpty_Int32_
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 0 128.425 ns 0.0369 ns 1.00 - NA
Dictionary PR 0 131.646 ns 0.0505 ns 1.03 - NA
Queue Main 0 3.250 ns 0.0022 ns 1.00 - NA
Queue PR 0 3.264 ns 0.0014 ns 1.00 - NA
Stack Main 0 1.500 ns 0.0008 ns 1.00 - NA
Stack PR 0 1.501 ns 0.0003 ns 1.00 - NA
Bag Main 0 4.861 ns 0.0015 ns 1.00 - NA
Bag PR 0 4.872 ns 0.0136 ns 1.00 - NA
Dictionary Main 512 3.159 ns 0.0008 ns 1.00 - NA
Dictionary PR 512 3.154 ns 0.0087 ns 1.00 - NA
Queue Main 512 2.592 ns 0.0091 ns 1.00 - NA
Queue PR 512 2.593 ns 0.0004 ns 1.00 - NA
Stack Main 512 1.502 ns 0.0010 ns 1.00 - NA
Stack PR 512 1.505 ns 0.0012 ns 1.00 - NA
Bag Main 512 4.312 ns 0.0016 ns 1.00 - NA
Bag PR 512 4.334 ns 0.0016 ns 1.00 - NA
System.Collections.Concurrent.Count_String_
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 512 129.939 ns 0.0379 ns 1.00 - NA
Dictionary PR 512 131.634 ns 0.0412 ns 1.01 - NA
Queue Main 512 4.702 ns 0.0046 ns 1.00 - NA
Queue PR 512 4.524 ns 0.0097 ns 0.96 - NA
Queue_EnqueueCountDequeue Main 512 21.786 ns 0.0044 ns 1.00 - NA
Queue_EnqueueCountDequeue PR 512 22.770 ns 0.0188 ns 1.05 - NA
Stack Main 512 610.766 ns 0.0793 ns 1.00 - NA
Stack PR 512 610.898 ns 0.0927 ns 1.00 - NA
Bag Main 512 43.029 ns 0.4188 ns 1.00 - NA
Bag PR 512 40.945 ns 0.2831 ns 0.95 - NA
System.Collections.Concurrent.Count_Int32_
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Size Mean Error Ratio Allocated Alloc Ratio
Dictionary Main 512 139.712 ns 0.1150 ns 1.00 - NA
Dictionary PR 512 131.381 ns 0.0345 ns 0.94 - NA
Queue Main 512 4.366 ns 0.0104 ns 1.00 - NA
Queue PR 512 4.437 ns 0.0013 ns 1.02 - NA
Queue_EnqueueCountDequeue Main 512 23.008 ns 0.0437 ns 1.00 - NA
Queue_EnqueueCountDequeue PR 512 22.597 ns 0.0213 ns 0.98 - NA
Stack Main 512 610.776 ns 0.0428 ns 1.00 - NA
Stack PR 512 610.720 ns 0.0490 ns 1.00 - NA
Bag Main 512 42.047 ns 0.1616 ns 1.00 - NA
Bag PR 512 40.420 ns 0.0097 ns 0.96 - NA

Copilot AI review requested due to automatic review settings June 19, 2026 16:50

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs:90

  • Monitor.TryEnter(object, int) always falls back to GetLockObject(obj).TryEnter(millisecondsTimeout) after a failed one-shot thin-lock attempt. For millisecondsTimeout == 0, this can unnecessarily allocate/create the Lock (inflation) even though we already know the thin lock is currently owned by another thread. Since TryEnter(…, 0) is a one-shot operation, it can return false immediately on HeaderLockResult.Failure and avoid the slow path/inflation cost.
        public static bool TryEnter(object obj, int millisecondsTimeout)
        {
            ArgumentOutOfRangeException.ThrowIfLessThan(millisecondsTimeout, -1);

            ObjectHeader.HeaderLockResult result = ObjectHeader.AcquireThinLock(obj, isOneShot: true);
            if (result == ObjectHeader.HeaderLockResult.Success)
                return true;

            return GetLockObject(obj).TryEnter(millisecondsTimeout);
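
The suggestion in this suppressed comment can be illustrated with a small stand-alone model. This is only a sketch of the reviewer's idea, not what the PR does: `TryEnterSketch`, the `thinLockAttempt` delegate (standing in for `ObjectHeader.AcquireThinLock(obj, isOneShot: true)`), the `SlowPath` stub, and the `SlowPathCalls` counter are all hypothetical. The `HeaderLockResult` member names are the ones mentioned in this thread.

```csharp
using System;

enum HeaderLockResult { Success, Failure, UseSlowPath }

static class TryEnterSketch
{
    // Counts how often the stand-in slow path (where inflation would happen) runs.
    public static int SlowPathCalls;

    public static bool TryEnter(Func<HeaderLockResult> thinLockAttempt, int millisecondsTimeout)
    {
        ArgumentOutOfRangeException.ThrowIfLessThan(millisecondsTimeout, -1);

        HeaderLockResult result = thinLockAttempt();
        if (result == HeaderLockResult.Success)
            return true;

        // Suggested short-circuit: a zero timeout is a one-shot attempt, so a
        // definite Failure can return immediately without touching the slow path.
        if (result == HeaderLockResult.Failure && millisecondsTimeout == 0)
            return false;

        return SlowPath(millisecondsTimeout);
    }

    // Stand-in for GetLockObject(obj).TryEnter(millisecondsTimeout).
    static bool SlowPath(int millisecondsTimeout)
    {
        SlowPathCalls++;
        return false;
    }
}
```

With this shape, only `UseSlowPath` results and non-zero timeouts ever reach the slow path.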

@VSadov

VSadov commented Jun 19, 2026

Member Author

@MihuBot benchmark System.Threading -arm

@VSadov

VSadov commented Jun 19, 2026

Member Author

@MihuBot benchmark System.Collections.Concurrent -arm

@MihuBot

MihuBot commented Jun 19, 2026

System.Threading.Tests.Perf_Volatile
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Write_double Main 2.132 ns 0.0124 ns 1.00 - NA
Write_double PR 2.135 ns 0.0123 ns 1.00 - NA
Read_double Main 3.298 ns 0.0239 ns 1.00 - NA
Read_double PR 3.284 ns 0.0055 ns 1.00 - NA
System.Threading.Tests.Perf_Timer
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
ShortScheduleAndDispose Main 130.0 ns 1.52 ns 1.00 120 B 1.00
ShortScheduleAndDispose PR 130.3 ns 1.61 ns 1.00 120 B 1.00
LongScheduleAndDispose Main 129.9 ns 1.26 ns 1.00 120 B 1.00
LongScheduleAndDispose PR 131.0 ns 1.04 ns 1.01 120 B 1.00
ScheduleManyThenDisposeMany Main 244,645,956.9 ns 2,605,208.19 ns 1.00 144000000 B 1.00
ScheduleManyThenDisposeMany PR 249,305,648.6 ns 3,958,714.20 ns 1.02 144000000 B 1.00
ShortScheduleAndDisposeWithFiringTimers Main 152.3 ns 4.49 ns 1.00 144 B 1.00
ShortScheduleAndDisposeWithFiringTimers PR 149.8 ns 3.75 ns 0.98 144 B 1.00
SynchronousContention Main 5,404,397,834.9 ns 34,069,649.97 ns 1.00 1152000760 B 1.00
SynchronousContention PR 5,571,240,872.0 ns 33,764,941.19 ns 1.03 1152000760 B 1.00
AsynchronousContention Main 4,851,066,960.9 ns 92,741,774.32 ns 1.00 1344002232 B 1.00
AsynchronousContention PR 4,873,755,580.8 ns 40,387,695.59 ns 1.00 1344002232 B 1.00
System.Threading.Tests.Perf_ThreadStatic
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
GetThreadStatic Main 1.509 ns 0.0003 ns 1.00 - NA
GetThreadStatic PR 1.507 ns 0.0004 ns 1.00 - NA
SetThreadStatic Main 2.515 ns 0.0023 ns 1.00 - NA
SetThreadStatic PR 2.503 ns 0.0034 ns 1.00 - NA
System.Threading.Tests.Perf_ThreadPool
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1  Gen0=9000.0000
Method Toolchain WorkItemsPerCore Mean Error Ratio Allocated Alloc Ratio
QueueUserWorkItem_WaitCallback_Throughput Main 20000000 9.043 s 0.0811 s 1.00 610.35 MB 1.00
QueueUserWorkItem_WaitCallback_Throughput PR 20000000 9.258 s 0.1603 s 1.02 610.35 MB 1.00
System.Threading.Tests.Perf_Thread
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
CurrentThread Main 1.503 ns 0.0002 ns 1.00 - NA
CurrentThread PR 1.502 ns 0.0003 ns 1.00 - NA
GetCurrentProcessorId Main 2.889 ns 0.0135 ns 1.00 - NA
GetCurrentProcessorId PR 2.867 ns 0.0098 ns 0.99 - NA
System.Threading.Tests.Perf_SpinLock
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
EnterExit Main 14.179 ns 0.0047 ns 1.00 - NA
EnterExit PR 15.610 ns 0.0049 ns 1.10 - NA
TryEnterExit Main 15.625 ns 0.0044 ns 1.00 - NA
TryEnterExit PR 14.189 ns 0.0073 ns 0.91 - NA
TryEnter_Fail Main 1.769 ns 0.0004 ns 1.00 - NA
TryEnter_Fail PR 1.770 ns 0.0003 ns 1.00 - NA
System.Threading.Tests.Perf_SemaphoreSlim
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
ReleaseWait Main 42.44 ns 0.016 ns 1.00 - NA
ReleaseWait PR 41.75 ns 0.340 ns 0.98 - NA
ReleaseWaitAsync Main 36.57 ns 0.862 ns 1.00 - NA
ReleaseWaitAsync PR 35.66 ns 0.697 ns 0.98 - NA
ReleaseWaitAsync_WithCancellationToken Main 21,264.63 ns 1,525.520 ns 1.00 583 B 1.00
ReleaseWaitAsync_WithCancellationToken PR 22,065.87 ns 640.506 ns 1.05 583 B 1.00
ReleaseWaitAsync_WithTimeout Main 21,788.43 ns 821.668 ns 1.00 679 B 1.00
ReleaseWaitAsync_WithTimeout PR 22,014.71 ns 729.106 ns 1.01 679 B 1.00
ReleaseWaitAsync_WithCancellationTokenAndTimeout Main 19,469.35 ns 1,524.439 ns 1.00 672 B 1.00
ReleaseWaitAsync_WithCancellationTokenAndTimeout PR 21,466.58 ns 853.651 ns 1.11 678 B 1.01
System.Threading.Tests.Perf_Monitor
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
EnterExit Main 15.04 ns 0.038 ns 1.00 - NA
EnterExit PR 16.67 ns 0.007 ns 1.11 - NA
TryEnterExit Main 17.25 ns 0.056 ns 1.00 - NA
TryEnterExit PR 16.78 ns 0.025 ns 0.97 - NA
System.Threading.Tests.Perf_Lock
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1  StdDev=0.004 ns
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
ReaderWriterLockSlimPerf Main 19.50 ns 0.004 ns 1.00 - NA
ReaderWriterLockSlimPerf PR 19.28 ns 0.004 ns 0.99 - NA
System.Threading.Tests.Perf_Interlocked
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Increment_int Main 5.889 ns 0.0012 ns 1.00 - NA
Increment_int PR 5.884 ns 0.0012 ns 1.00 - NA
Decrement_int Main 5.885 ns 0.0011 ns 1.00 - NA
Decrement_int PR 5.903 ns 0.0016 ns 1.00 - NA
Increment_long Main 5.875 ns 0.0010 ns 1.00 - NA
Increment_long PR 5.876 ns 0.0022 ns 1.00 - NA
Decrement_long Main 5.875 ns 0.0011 ns 1.00 - NA
Decrement_long PR 5.874 ns 0.0015 ns 1.00 - NA
Add_int Main 5.884 ns 0.0013 ns 1.00 - NA
Add_int PR 5.885 ns 0.0017 ns 1.00 - NA
Add_long Main 5.876 ns 0.0011 ns 1.00 - NA
Add_long PR 5.875 ns 0.0009 ns 1.00 - NA
Exchange_int Main 5.829 ns 0.0013 ns 1.00 - NA
Exchange_int PR 5.868 ns 0.0014 ns 1.01 - NA
Exchange_long Main 5.836 ns 0.0019 ns 1.00 - NA
Exchange_long PR 5.835 ns 0.0010 ns 1.00 - NA
CompareExchange_int Main 5.922 ns 0.0010 ns 1.00 - NA
CompareExchange_int PR 5.926 ns 0.0013 ns 1.00 - NA
CompareExchange_long Main 5.927 ns 0.0009 ns 1.00 - NA
CompareExchange_long PR 5.921 ns 0.0008 ns 1.00 - NA
CompareExchange_object_Match Main 8.777 ns 0.0070 ns 1.00 - NA
CompareExchange_object_Match PR 8.757 ns 0.0024 ns 1.00 - NA
CompareExchange_object_NoMatch Main 7.916 ns 0.0058 ns 1.00 - NA
CompareExchange_object_NoMatch PR 8.846 ns 0.0018 ns 1.12 - NA
System.Threading.Tests.Perf_EventWaitHandle
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Set_Reset Main 60.64 ns 0.027 ns 1.00 - NA
Set_Reset PR 61.39 ns 0.036 ns 1.01 - NA
System.Threading.Tests.Perf_CancellationToken
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NRQIIJ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-NGSIDY : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
RegisterAndUnregister_Serial Main 23.597 ns 0.0082 ns 1.00 - NA
RegisterAndUnregister_Serial PR 23.387 ns 0.0083 ns 0.99 - NA
Cancel Main 82.476 ns 0.4866 ns 1.00 192 B 1.00
Cancel PR 83.192 ns 0.4460 ns 1.01 192 B 1.00
CreateLinkedTokenSource1 Main 31.213 ns 0.6452 ns 1.00 64 B 1.00
CreateLinkedTokenSource1 PR 29.633 ns 0.1386 ns 0.95 64 B 1.00
CreateLinkedTokenSource2 Main 49.607 ns 0.0680 ns 1.00 80 B 1.00
CreateLinkedTokenSource2 PR 50.340 ns 0.2822 ns 1.01 80 B 1.00
CreateLinkedTokenSource3 Main 89.048 ns 0.2250 ns 1.00 128 B 1.00
CreateLinkedTokenSource3 PR 90.959 ns 0.3102 ns 1.02 128 B 1.00
CreateTokenDispose Main 6.723 ns 0.0733 ns 1.00 48 B 1.00
CreateTokenDispose PR 7.064 ns 0.1054 ns 1.05 48 B 1.00
CreateRegisterDispose Main 53.404 ns 0.8601 ns 1.00 192 B 1.00
CreateRegisterDispose PR 53.420 ns 0.3030 ns 1.00 192 B 1.00
CreateManyRegisterDispose Main 23.430 ns 0.0164 ns 1.00 - NA
CreateManyRegisterDispose PR 23.416 ns 0.0249 ns 1.00 - NA
CreateManyRegisterMultipleDispose Main 127.455 ns 0.2441 ns 1.00 - NA
CreateManyRegisterMultipleDispose PR 125.995 ns 0.2425 ns 0.99 - NA
CancelAfter Main 93.419 ns 0.9790 ns 1.00 144 B 1.00
CancelAfter PR 93.466 ns 1.0823 ns 1.00 144 B 1.00
System.Threading.Tasks.Tests.Perf_AsyncMethods
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
EmptyAsyncMethodInvocation Main 5.542 ns 0.0074 ns 1.00 - NA
EmptyAsyncMethodInvocation PR 5.542 ns 0.0395 ns 1.00 - NA
SingleYieldMethodInvocation Main 181.643 ns 0.8766 ns 1.00 168 B 1.00
SingleYieldMethodInvocation PR 172.875 ns 0.2976 ns 0.95 168 B 1.00
Yield Main 79.109 ns 0.0667 ns 1.00 24 B 1.00
Yield PR 76.516 ns 0.0672 ns 0.97 24 B 1.00
System.Threading.Tasks.ValueTaskPerfTest
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-UXTJFQ : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-WCARJH : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-XVCCJK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HMQJNI : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  PowerPlanMode=  IterationTime=250ms
MaxIterationCount=20  MaxWarmupIterationCount=10  MinIterationCount=15
MinWarmupIterationCount=2  WarmupCount=-1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
Await_FromResult Main 7.583 ns 0.0985 ns 1.00 - NA
Await_FromResult PR 7.528 ns 0.0245 ns 0.99 - NA
Await_FromCompletedTask Main 13.598 ns 0.1548 ns 1.00 72 B 1.00
Await_FromCompletedTask PR 14.026 ns 0.1929 ns 1.03 72 B 1.00
Await_FromCompletedValueTaskSource Main 25.403 ns 0.4806 ns 1.00 72 B 1.00
Await_FromCompletedValueTaskSource PR 26.530 ns 0.8105 ns 1.04 72 B 1.00
CreateAndAwait_FromResult Main 7.554 ns 0.0129 ns 1.00 - NA
CreateAndAwait_FromResult PR 7.566 ns 0.0097 ns 1.00 - NA
CreateAndAwait_FromResult_ConfigureAwait Main 9.240 ns 0.1781 ns 1.00 - NA
CreateAndAwait_FromResult_ConfigureAwait PR 7.495 ns 0.0418 ns 0.81 - NA
CreateAndAwait_FromCompletedTask Main 9.429 ns 0.0373 ns 1.00 - NA
CreateAndAwait_FromCompletedTask PR 9.535 ns 0.0658 ns 1.01 - NA
CreateAndAwait_FromCompletedTask_ConfigureAwait Main 9.361 ns 0.0832 ns 1.00 - NA
CreateAndAwait_FromCompletedTask_ConfigureAwait PR 9.312 ns 0.0225 ns 0.99 - NA
CreateAndAwait_FromCompletedValueTaskSource Main 10.362 ns 0.0258 ns 1.00 - NA
CreateAndAwait_FromCompletedValueTaskSource PR 10.630 ns 0.0887 ns 1.03 - NA
CreateAndAwait_FromYieldingAsyncMethod Main 290.620 ns 0.8332 ns 1.00 392 B 1.00
CreateAndAwait_FromYieldingAsyncMethod PR 293.741 ns 1.1317 ns 1.01 392 B 1.00
CreateAndAwait_FromDelayedTCS Main 21,915.089 ns 464.6644 ns 1.00 519 B 1.00
CreateAndAwait_FromDelayedTCS PR 21,953.612 ns 480.2800 ns 1.00 518 B 1.00
Copy_PassAsArgumentAndReturn_FromResult Main 4.835 ns 0.0076 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromResult PR 4.843 ns 0.0007 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromTask Main 8.296 ns 0.0055 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromTask PR 8.311 ns 0.0122 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromValueTaskSource Main 13.139 ns 0.0048 ns 1.00 - NA
Copy_PassAsArgumentAndReturn_FromValueTaskSource PR 13.156 ns 0.1064 ns 1.00 - NA
CreateAndAwait_FromCompletedValueTaskSource_ConfigureAwait Main 10.558 ns 0.1272 ns 1.00 - NA
CreateAndAwait_FromCompletedValueTaskSource_ConfigureAwait PR 10.565 ns 0.1278 ns 1.00 - NA
System.Threading.Channels.Tests.UnboundedChannelPerfTests
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
TryWriteThenTryRead Main 37.79 ns 0.007 ns 1.00 - NA
TryWriteThenTryRead PR 40.39 ns 0.028 ns 1.07 - NA
WriteAsyncThenReadAsync Main 50.86 ns 0.067 ns 1.00 - NA
WriteAsyncThenReadAsync PR 45.99 ns 0.079 ns 0.90 - NA
ReadAsyncThenWriteAsync Main 77.99 ns 0.490 ns 1.00 - NA
ReadAsyncThenWriteAsync PR 75.48 ns 0.334 ns 0.97 - NA
PingPong Main 5,514,871.03 ns 106,290.832 ns 1.00 1087 B 1.00
PingPong PR 5,255,181.47 ns 73,386.517 ns 0.95 1087 B 1.00
System.Threading.Channels.Tests.SpscUnboundedChannelPerfTests
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
TryWriteThenTryRead Main 30.80 ns 0.032 ns 1.00 - NA
TryWriteThenTryRead PR 27.20 ns 0.012 ns 0.88 - NA
WriteAsyncThenReadAsync Main 43.33 ns 0.598 ns 1.00 - NA
WriteAsyncThenReadAsync PR 38.68 ns 0.122 ns 0.89 - NA
ReadAsyncThenWriteAsync Main 74.22 ns 0.290 ns 1.00 - NA
ReadAsyncThenWriteAsync PR 73.65 ns 1.433 ns 0.99 - NA
PingPong Main 5,692,114.84 ns 79,885.611 ns 1.00 1087 B 1.00
PingPong PR 5,464,477.26 ns 116,552.029 ns 0.96 1087 B 1.00
System.Threading.Channels.Tests.BoundedChannelPerfTests
BenchmarkDotNet v0.16.0-nightly.20260518.1249, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Neoverse-N2, 8 physical cores
Memory: 31.27 GB Total, 1.96 GB Available
  Job-TPEJOW : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
  Job-HKHXHK : .NET 11.0.0 (11.0.0-dev, 42.42.42.42424), Arm64 RyuJIT armv8.0-a
EvaluateOverhead=False  OutlierMode=Default  PowerPlanMode=
IterationTime=250ms  MaxIterationCount=20  MemoryRandomization=Default
MinIterationCount=15  WarmupCount=1
Method Toolchain Mean Error Ratio Allocated Alloc Ratio
TryWriteThenTryRead Main 52.18 ns 0.061 ns 1.00 - NA
TryWriteThenTryRead PR 48.51 ns 0.693 ns 0.93 - NA
WriteAsyncThenReadAsync Main 60.95 ns 0.328 ns 1.00 - NA
WriteAsyncThenReadAsync PR 60.13 ns 0.562 ns 0.99 - NA
ReadAsyncThenWriteAsync Main 73.42 ns 0.340 ns 1.00 - NA
ReadAsyncThenWriteAsync PR 70.95 ns 0.137 ns 0.97 - NA
PingPong Main 5,425,824.17 ns 102,591.823 ns 1.00 1091 B 1.00
PingPong PR 5,359,848.63 ns 105,279.337 ns 0.99 1087 B 1.00

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated
Copilot AI review requested due to automatic review settings June 24, 2026 19:56

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Outdated
Comment on lines +281 to +285
// This is a case when we have:
// * a fat lock - the most likely case by far, or
// * we don't own the lock and need to throw and it is ok if the lock gets inflated.
// Let the slow path handle this.
Monitor.GetLockObject(obj).Exit();

Member Author

This is intentional. A typical program would not see these exceptions unless it has bugs.
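
For illustration, the "we don't own the lock and need to throw" case corresponds to the documented behavior of `Monitor.Exit`: releasing a monitor the current thread never entered raises `SynchronizationLockException`. The sketch below only exercises the public API; the `ExitDemo` class and its method name are made up for this example and are not part of the PR.

```csharp
using System;
using System.Threading;

// Releasing a monitor that the current thread does not own is a program bug.
Console.WriteLine(ExitDemo.ExitWithoutEnterThrows());

static class ExitDemo
{
    // Returns true when Monitor.Exit rejects an exit without a matching enter.
    public static bool ExitWithoutEnterThrows()
    {
        var gate = new object();
        try
        {
            Monitor.Exit(gate); // this thread never entered the monitor
            return false;
        }
        catch (SynchronizationLockException)
        {
            return true;
        }
    }
}
```

Because correct code always pairs `Enter` with `Exit`, this path should only be reached by buggy callers, which is why routing it through the slow path costs nothing in practice.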

Comment on lines +303 to 316
// if unused for anything, try setting our thread id
// N.B. hashcode, thread ID and sync index are never 0, and hashcode is largest of all
if (oldBits == 0)
{
int* pHeader = GetHeaderPtr(ppMethodTable);
int oldBits = *pHeader;
// if unused for anything, try setting our thread id
// N.B. hashcode, thread ID and sync index are never 0, and hashcode is largest of all
if ((oldBits & MASK_HASHCODE_INDEX) == 0)
// Thread IDs are allocated sequentially starting from 1 and recycled, so it's
// unusual to have a thread ID that doesn't fit in the thin-lock field.
// Check here rather than at entry to keep the hot path as tight as possible.
// The uninitialized 0 id is also ruled out by this check.
// If the id doesn't fit, we fall through and call TryAcquireUncommon outside the
// fixed block to avoid keeping the object pinned while potentially spinning.
if ((uint)(currentThreadID - 1) < (uint)SBLK_MASK_LOCK_THREADID)
{
if (Interlocked.CompareExchange(pHeader, oldBits | currentThreadID, oldBits) == oldBits)
if (Interlocked.CompareExchange(pHeader, currentThreadID, oldBits) == oldBits)
{

@VSadov VSadov Jun 24, 2026

Member Author

GC_RESERVE is set during a GC pause, so we will never see it set when acquiring the lock.
It is possible to see FINALIZER_RUN, but the chances are nearly zero.

These are uncommon cases.
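
As a rough model of the fast path discussed in this thread, the sketch below treats the object header as a plain `int` field. It is a simplified illustration, not the code from the PR: `ThinLockModel`, `ThreadIdMask` and its value `0x3FF`, and the `Release` method are assumptions made for the example. It shows two ideas from the snippet: a header value of 0 means "unused for anything", and a single unsigned comparison rejects both the uninitialized id 0 and ids too large for the thread-id field.

```csharp
using System;
using System.Threading;

var model = new ThinLockModel();
Console.WriteLine(model.TryAcquire(1));                             // free header, id fits
Console.WriteLine(model.TryAcquire(2));                             // header already holds id 1
Console.WriteLine(model.Release(1));                                // owner releases
Console.WriteLine(model.TryAcquire(0));                             // uninitialized id
Console.WriteLine(model.TryAcquire(ThinLockModel.ThreadIdMask + 1)); // id too large

sealed class ThinLockModel
{
    // Hypothetical field width, chosen for illustration only.
    public const int ThreadIdMask = 0x3FF;

    private int _header; // 0 means "unused for anything"

    public bool TryAcquire(int threadId)
    {
        int oldBits = Volatile.Read(ref _header);
        if (oldBits != 0)
            return false; // a real implementation would take an uncommon/slow path here

        // One unsigned compare accepts 1..ThreadIdMask:
        // for threadId == 0 the subtraction wraps to uint.MaxValue and fails the test.
        if ((uint)(threadId - 1) >= (uint)ThreadIdMask)
            return false;

        // Publish our id only if the header is still unused.
        return Interlocked.CompareExchange(ref _header, threadId, oldBits) == oldBits;
    }

    // Clears the header only when it holds exactly the caller's id.
    public bool Release(int threadId)
        => Interlocked.CompareExchange(ref _header, 0, threadId) == threadId;
}
```

Requiring the whole header to be 0 keeps the hot path to one compare and one compare-exchange; any other bit pattern, including the rare flag bits mentioned above, simply fails the check and is left to the uncommon path.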

Comment thread src/coreclr/vm/syncblk.inl
Comment thread src/coreclr/vm/util.hpp Outdated
Copilot AI review requested due to automatic review settings June 24, 2026 23:28

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs Outdated
@VSadov

VSadov commented Jun 25, 2026

Member Author

For perf measurements I use the following benchmark:
(a refreshed subset of the scenarios in dotnet/coreclr#13670 (comment))

The benchmark measures lock throughput in ~0.5 s time samples in a variety of scenarios. The higher the score, the better.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

internal class Program
{
    private static readonly int ProcessorCount = Environment.ProcessorCount;

    private static void Main(string[] args)
    {
        System.Console.WriteLine("MonitorEnterExitThroughput_ThinLock");
        MonitorEnterExitThroughput(1, false, false);
        System.Console.WriteLine("MonitorEnterExitThroughput_FatLock");
        MonitorEnterExitThroughput(1, false, true);

        System.Console.WriteLine("MonitorReliableEnterExitThroughput_ThinLock");
        MonitorReliableEnterExitThroughput(1, false, false);
        System.Console.WriteLine("MonitorReliableEnterExitThroughput_FatLock");
        MonitorReliableEnterExitThroughput(1, false, true);

        System.Console.WriteLine("MonitorTryEnterExitWhenUnlockedThroughput_ThinLock");
        MonitorTryEnterExitWhenUnlockedThroughput_ThinLock(1);
        System.Console.WriteLine("MonitorTryEnterExitWhenUnlockedThroughput_FatLock");
        MonitorTryEnterExitWhenUnlockedThroughput_FatLock(1);

        System.Console.WriteLine("MonitorTryEnterWhenLockedThroughput_ThinLock");
        MonitorTryEnterWhenLockedThroughput_ThinLock(1);
        System.Console.WriteLine("MonitorTryEnterWhenLockedThroughput_FatLock");
        MonitorTryEnterWhenLockedThroughput_FatLock(1);

        System.Console.WriteLine("MonitorEnterExitThroughput_ThinLock 4 threads");
        MonitorEnterExitThroughput(4, false, false);
    }

    private static void MonitorReliableEnterExitThroughput(int threadCount, bool delay, bool convertToFatLock)
    {
        var threadReady = new AutoResetEvent(false);
        var startTest = new ManualResetEvent(false);
        var threadOperationCounts = new int[(threadCount + 1) * 16];
        var m = new object();

        if (convertToFatLock)
            Monitor.Enter(m);

        ParameterizedThreadStart threadStart = data =>
        {
            int threadIndex = (int)data;
            var localDelay = delay;
            var localThreadOperationCounts = threadOperationCounts;
            var localM = m;
            var rng = localDelay ? new Random(threadIndex) : null;
            threadReady.Set();
            if (convertToFatLock)
            {
                Monitor.Enter(localM);
                Monitor.Exit(localM);
            }
            startTest.WaitOne();
            if (localDelay)
            {
                while (true)
                {
                    var d0 = RandomShortDelay(rng);
                    var d1 = RandomShortDelay(rng);
                    lock (localM)
                        Delay(d0);
                    ++localThreadOperationCounts[threadIndex];
                    Delay(d1);
                }
            }
            else
            {
                while (true)
                {
                    lock (localM)
                    {
                    }
                    ++localThreadOperationCounts[threadIndex];
                }
            }
        };
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; ++i)
        {
            var t = new Thread(threadStart);
            t.IsBackground = true;
            t.Start((i + 1) * 16);
            threadReady.WaitOne();
            threads[i] = t;
        }

        if (convertToFatLock)
        {
            Thread.Sleep(50);
            Monitor.Exit(m);
        }

        Run(startTest, threadOperationCounts);
    }

    private static void MonitorEnterExitThroughput(int threadCount, bool delay, bool convertToFatLock)
    {
        var threadReady = new AutoResetEvent(false);
        var startTest = new ManualResetEvent(false);
        var threadOperationCounts = new int[(threadCount + 1) * 16];
        var m = new object();

        if (convertToFatLock)
            Monitor.Enter(m);

        ParameterizedThreadStart threadStart = data =>
        {
            int threadIndex = (int)data;
            var localDelay = delay;
            var localThreadOperationCounts = threadOperationCounts;
            var localM = m;
            var rng = localDelay ? new Random(threadIndex) : null;
            threadReady.Set();
            if (convertToFatLock)
            {
                Monitor.Enter(localM);
                Monitor.Exit(localM);
            }
            startTest.WaitOne();
            if (localDelay)
            {
                while (true)
                {
                    var d0 = RandomShortDelay(rng);
                    var d1 = RandomShortDelay(rng);
                    Monitor.Enter(localM);
                    Delay(d0);
                    Monitor.Exit(localM);
                    ++localThreadOperationCounts[threadIndex];
                    Delay(d1);
                }
            }
            else
            {
                while (true)
                {
                    Monitor.Enter(localM);
                    Monitor.Exit(localM);
                    ++localThreadOperationCounts[threadIndex];
                }
            }
        };
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; ++i)
        {
            var t = new Thread(threadStart);
            t.IsBackground = true;
            t.Start((i + 1) * 16);
            threadReady.WaitOne();
            threads[i] = t;
        }

        if (convertToFatLock)
        {
            Thread.Sleep(50);
            Monitor.Exit(m);
        }

        Run(startTest, threadOperationCounts);
    }

    private static void MonitorTryEnterExitThroughput(int threadCount, bool delay, bool convertToFatLock)
    {
        var threadReady = new AutoResetEvent(false);
        var startTest = new ManualResetEvent(false);
        var threadOperationCounts = new int[(threadCount + 1) * 16];
        var m = new object();

        if (convertToFatLock)
            Monitor.Enter(m);

        ParameterizedThreadStart threadStart = data =>
        {
            int threadIndex = (int)data;
            var localDelay = delay;
            var localThreadOperationCounts = threadOperationCounts;
            var localM = m;
            var rng = localDelay ? new Random(threadIndex) : null;
            threadReady.Set();
            if (convertToFatLock)
            {
                Monitor.Enter(localM);
                Monitor.Exit(localM);
            }
            startTest.WaitOne();
            if (localDelay)
            {
                while (true)
                {
                    var d0 = RandomShortDelay(rng);
                    var d1 = RandomShortDelay(rng);
                    if (!Monitor.TryEnter(localM, -1))
                        return;
                    Delay(d0);
                    Monitor.Exit(localM);
                    ++localThreadOperationCounts[threadIndex];
                    Delay(d1);
                }
            }
            else
            {
                while (true)
                {
                    if (!Monitor.TryEnter(localM, -1))
                        return;
                    Monitor.Exit(localM);
                    ++localThreadOperationCounts[threadIndex];
                }
            }
        };
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; ++i)
        {
            var t = new Thread(threadStart);
            t.IsBackground = true;
            t.Start((i + 1) * 16);
            threadReady.WaitOne();
            threads[i] = t;
        }

        if (convertToFatLock)
        {
            Thread.Sleep(50);
            Monitor.Exit(m);
        }

        Run(startTest, threadOperationCounts);
    }

    private static void MonitorTryEnterExitWhenUnlockedThroughput_ThinLock(int threadCount)
    {
        threadCount = 1;
        var threadReady = new AutoResetEvent(false);
        var startTest = new ManualResetEvent(false);
        var threadOperationCounts = new int[(threadCount + 1) * 16];
        var m = new object();

        ParameterizedThreadStart threadStart = data =>
        {
            int threadIndex = (int)data;
            var localThreadOperationCounts = threadOperationCounts;
            var localM = m;
            threadReady.Set();
            startTest.WaitOne();
            while (true)
            {
                if (!Monitor.TryEnter(localM))
                    return;
                Monitor.Exit(localM);
                ++localThreadOperationCounts[threadIndex];
            }
        };
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; ++i)
        {
            var t = new Thread(threadStart);
            t.IsBackground = true;
            t.Start((i + 1) * 16);
            threadReady.WaitOne();
            threads[i] = t;
        }

        Run(startTest, threadOperationCounts);
    }

    private static void MonitorTryEnterExitWhenUnlockedThroughput_FatLock(int threadCount)
    {
        threadCount = 1;
        var threadReady = new AutoResetEvent(false);
        var startTest = new ManualResetEvent(false);
        var threadOperationCounts = new int[(threadCount + 1) * 16];
        var m = new object();

        Monitor.Enter(m);

        ParameterizedThreadStart threadStart = data =>
        {
            int threadIndex = (int)data;
            var localThreadOperationCounts = threadOperationCounts;
            var localM = m;
            threadReady.Set();
            Monitor.Enter(localM);
            Monitor.Exit(localM);
            startTest.WaitOne();
            while (true)
            {
                if (!Monitor.TryEnter(localM))
                    return;
                Monitor.Exit(localM);
                ++localThreadOperationCounts[threadIndex];
            }
        };
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; ++i)
        {
            var t = new Thread(threadStart);
            t.IsBackground = true;
            t.Start((i + 1) * 16);
            threadReady.WaitOne();
            threads[i] = t;
        }

        Thread.Sleep(50);
        Monitor.Exit(m);

        Run(startTest, threadOperationCounts);
    }

    private static void MonitorTryEnterWhenLockedThroughput_ThinLock(int threadCount)
    {
        threadCount = 1;
        var threadReady = new AutoResetEvent(false);
        var startTest = new ManualResetEvent(false);
        var threadOperationCounts = new int[(threadCount + 1) * 16];
        var m = new object();

        Monitor.Enter(m);

        ParameterizedThreadStart threadStart = data =>
        {
            int threadIndex = (int)data;
            var localThreadOperationCounts = threadOperationCounts;
            var localM = m;
            threadReady.Set();
            startTest.WaitOne();
            while (true)
            {
                if (Monitor.TryEnter(localM))
                    return;
                ++localThreadOperationCounts[threadIndex];
            }
        };
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; ++i)
        {
            var t = new Thread(threadStart);
            t.IsBackground = true;
            t.Start((i + 1) * 16);
            threadReady.WaitOne();
            threads[i] = t;
        }

        Run(startTest, threadOperationCounts);
        Monitor.Exit(m);
    }

    private static void MonitorTryEnterWhenLockedThroughput_FatLock(int threadCount)
    {
        threadCount = 1;
        var threadReady = new AutoResetEvent(false);
        var startTest = new ManualResetEvent(false);
        var threadOperationCounts = new int[(threadCount + 1) * 16];
        var m = new object();

        Monitor.Enter(m);

        ParameterizedThreadStart threadStart = data =>
        {
            int threadIndex = (int)data;
            var localThreadOperationCounts = threadOperationCounts;
            var localM = m;
            threadReady.Set();
            if (Monitor.TryEnter(localM, 50))
                return;
            startTest.WaitOne();
            while (true)
            {
                if (Monitor.TryEnter(localM))
                    return;
                ++localThreadOperationCounts[threadIndex];
            }
        };
        var threads = new Thread[threadCount];
        for (int i = 0; i < threads.Length; ++i)
        {
            var t = new Thread(threadStart);
            t.IsBackground = true;
            t.Start((i + 1) * 16);
            threadReady.WaitOne();
            threads[i] = t;
        }

        Thread.Sleep(50);

        Run(startTest, threadOperationCounts);
        Monitor.Exit(m);
    }

    private static void Run(
        ManualResetEvent startTest,
        int[] threadOperationCounts,
        bool hasOneResult = false,
        int iterations = 4)
    {
        var sw = new Stopwatch();
        int threadCount = threadOperationCounts.Length / 16 - 1;
        var afterWarmupOperationCounts = new long[threadCount];
        var operationCounts = new long[threadCount];
        startTest.Set();

        // Warmup

        Thread.Sleep(100);

        //while (true)
        for (int j = 0; j < iterations; ++j)
        {
            for (int i = 0; i < threadCount; ++i)
                afterWarmupOperationCounts[i] = threadOperationCounts[(i + 1) * 16];

            // Measure

            sw.Restart();
            Thread.Sleep(500);
            sw.Stop();

            for (int i = 0; i < threadCount; ++i)
                operationCounts[i] = threadOperationCounts[(i + 1) * 16];
            for (int i = 0; i < threadCount; ++i)
                operationCounts[i] -= afterWarmupOperationCounts[i];

            double score = operationCounts.Sum() / sw.Elapsed.TotalMilliseconds;
            Console.WriteLine("Score: {0:0.000000}", score);
        }
    }

    

    internal static class Clock
    {
        private static readonly long s_swFrequency = Stopwatch.Frequency;
        private static readonly double s_swFrequencyDouble = s_swFrequency;

        public static long Ticks => Stopwatch.GetTimestamp();
        public static double TicksToS(long ticks) => ticks / s_swFrequencyDouble;
        public static double TicksToMs(long ticks) => ticks * 1000 / s_swFrequencyDouble;
        public static double TicksToUs(long ticks) => ticks * (1000 * 1000) / s_swFrequencyDouble;
    }

    private static uint RandomShortDelay(Random rng) => (uint)rng.Next(4, 10);
    private static uint RandomMediumDelay(Random rng) => (uint)rng.Next(10, 15);
    private static uint RandomLongDelay(Random rng) => (uint)rng.Next(15, 20);

    private static int[] s_delayValues = new int[32];

    private static void Delay(uint n)
    {
        Interlocked.MemoryBarrier();
        s_delayValues[16] += (int)Fib(n);
    }

    private static uint Fib(uint n)
    {
        if (n <= 1)
            return n;
        return Fib(n - 2) + Fib(n - 1);
    }
}

@VSadov

VSadov commented Jun 25, 2026

Member Author

Benchmark results on x64 (AMD EPYC 7763, 32-core VM)

Higher throughput score is better.

=== Baseline:

MonitorEnterExitThroughput_ThinLock
Score: 112428.900747
Score: 111783.752488
Score: 113419.808611
Score: 113442.662281
MonitorEnterExitThroughput_FatLock
Score: 46157.123683
Score: 48000.371799
Score: 48099.023496
Score: 47988.831344
MonitorReliableEnterExitThroughput_ThinLock
Score: 106102.947661
Score: 106286.004254
Score: 106221.572736
Score: 106272.136345
MonitorReliableEnterExitThroughput_FatLock
Score: 47304.596231
Score: 47163.620709
Score: 47399.212754
Score: 47329.591684
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 113513.919261
Score: 113225.298418
Score: 113564.640819
Score: 112520.718262
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 42566.084846
Score: 42940.802887
Score: 43032.479556
Score: 42996.606459
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 173.389527
Score: 173.548475
Score: 172.357821
Score: 171.551225
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 66307.880609
Score: 67656.424536
Score: 67674.791044
Score: 67664.123052
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 8149.181411
Score: 8233.202662
Score: 8035.122987
Score: 7812.506877

=== The PR:

MonitorEnterExitThroughput_ThinLock
Score: 123717.645995
Score: 150695.493623
Score: 150582.117361
Score: 150607.656947
MonitorEnterExitThroughput_FatLock
Score: 55948.260182
Score: 56902.551369
Score: 56929.967978
Score: 57493.021462
MonitorReliableEnterExitThroughput_ThinLock
Score: 152183.331853
Score: 152233.433818
Score: 152205.512011
Score: 152311.782295
MonitorReliableEnterExitThroughput_FatLock
Score: 51260.060598
Score: 51196.551482
Score: 51253.166479
Score: 51320.225118
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 154645.971799
Score: 154676.104510
Score: 154674.927959
Score: 154534.755270
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 56142.602136
Score: 56896.959708
Score: 56892.219811
Score: 56912.731038
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 118727.308710
Score: 127358.623441
Score: 127314.213791
Score: 127367.907338
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 78735.923471
Score: 79527.985176
Score: 79399.555781
Score: 79407.502195
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 8996.907248
Score: 9067.823620
Score: 9005.611525
Score: 8881.371878
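
To turn the raw scores into a single figure, one option is the ratio of mean scores per scenario. The sketch below does this for the x64 `MonitorEnterExitThroughput_ThinLock` runs, with the numbers copied from the listings above; the `Speedup` helper is hypothetical and only illustrates the arithmetic.

```csharp
using System;
using System.Linq;

// Scores copied from the x64 MonitorEnterExitThroughput_ThinLock runs.
double[] baseline = { 112428.900747, 111783.752488, 113419.808611, 113442.662281 };
double[] pr = { 123717.645995, 150695.493623, 150582.117361, 150607.656947 };

Console.WriteLine($"Throughput ratio (PR / baseline): {Speedup.Ratio(baseline, pr):0.00}");

static class Speedup
{
    // Ratio of mean scores; values above 1.0 mean the PR has higher throughput.
    public static double Ratio(double[] baselineScores, double[] prScores)
        => prScores.Average() / baselineScores.Average();
}
```

For this scenario the ratio comes out at roughly 1.28. Note that the first PR sample is visibly lower than the other three, so a median or a trimmed mean may be a fairer summary than a plain mean.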

@VSadov

VSadov commented Jun 25, 2026

Member Author

Benchmark results on ARM64 (Ampere Altra, 32-core VM)

Higher throughput score is better.

=== Baseline:

MonitorEnterExitThroughput_ThinLock
Score: 56660.840612
Score: 56774.778486
Score: 56776.290530
Score: 56773.270009
MonitorEnterExitThroughput_FatLock
Score: 33092.046995
Score: 33860.236575
Score: 33629.040063
Score: 33671.006944
MonitorReliableEnterExitThroughput_ThinLock
Score: 50486.523951
Score: 50489.640400
Score: 50498.821481
Score: 50495.320848
MonitorReliableEnterExitThroughput_FatLock
Score: 33005.903001
Score: 33006.283203
Score: 33024.192097
Score: 33009.303281
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 56415.386677
Score: 56388.496396
Score: 56422.446890
Score: 56382.934901
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 32905.674919
Score: 33031.659251
Score: 33084.943202
Score: 33067.449134
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 259.025333
Score: 259.219450
Score: 259.235537
Score: 259.250594
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 71438.197628
Score: 73018.990510
Score: 73017.067410
Score: 73017.266668
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 12254.268761
Score: 15645.607837
Score: 14481.160326
Score: 12710.443639

=== The PR:

MonitorEnterExitThroughput_ThinLock
Score: 56841.928352
Score: 59747.613341
Score: 59712.092971
Score: 59694.204615
MonitorEnterExitThroughput_FatLock
Score: 45634.475103
Score: 46592.575766
Score: 46603.756078
Score: 46604.988869
MonitorReliableEnterExitThroughput_ThinLock
Score: 52273.566581
Score: 52248.714359
Score: 52248.507635
Score: 52266.090756
MonitorReliableEnterExitThroughput_FatLock
Score: 43159.098549
Score: 43155.886952
Score: 43154.965290
Score: 43154.502357
MonitorTryEnterExitWhenUnlockedThroughput_ThinLock
Score: 60253.312182
Score: 60239.423887
Score: 60010.210232
Score: 59319.153080
MonitorTryEnterExitWhenUnlockedThroughput_FatLock
Score: 44742.336339
Score: 45499.768636
Score: 45494.957562
Score: 45501.833201
MonitorTryEnterWhenLockedThroughput_ThinLock
Score: 114713.191316
Score: 135761.317621
Score: 135779.174657
Score: 135770.136089
MonitorTryEnterWhenLockedThroughput_FatLock
Score: 89207.767292
Score: 90605.549413
Score: 90590.193857
Score: 90732.494476
MonitorEnterExitThroughput_ThinLock 4 threads
Score: 14256.491761
Score: 16478.849131
Score: 17066.182584
Score: 16827.141237

@VSadov VSadov marked this pull request as ready for review June 25, 2026 06:12
@VSadov VSadov requested a review from MichalStrehovsky as a code owner June 25, 2026 06:12
Copilot AI review requested due to automatic review settings June 25, 2026 06:12

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Comment on lines +119 to +126
int currentThreadID = ManagedThreadId.Current;
if ((uint)currentThreadID <= (uint)SBLK_MASK_LOCK_THREADID)
{
if (Interlocked.CompareExchange(pHeader, currentThreadID, oldBits) == oldBits)
{
return HeaderLockResult.Success;
}
}

Member Author

On CoreCLR the runtime sets this field to a nonzero value before a thread can observe it.
