[EventPipe] Add Non-Lossy EventPipe Mode for gcdump#129457
Introduce EventPipeBufferingMode (Drop/Block) and carry it as a per-session opt-in on EventPipeSessionOptions (default Drop), plumbed through enable() and ep_session_alloc onto the buffer manager. Block is wired up here but not yet acted on; producers start parking on full buffers in a later change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR extends native EventPipe to support an opt-in non-lossy (“blocking”) buffering mode intended for GC heap snapshot (gcdump) scenarios, and reworks provider-enable callback dispatch to avoid blocking producers before a drain thread exists.
Changes:
- Introduces EventPipeBufferingMode (DROP/BLOCK) and EventPipeWriteEventResult (WRITTEN/DROPPED/BLOCKED), plumbing the buffering mode through session options to the buffer manager.
- Adds a block-and-retry path in the write pipeline: writers may park on buffer exhaustion in BLOCK mode, and teardown can abort/wake parked writers.
- Defers provider enable-callback invocation to ep_start_streaming, storing callbacks on the session until streaming begins.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/native/eventpipe/ep.h | Adds buffering_mode to EventPipeSessionOptions. |
| src/native/eventpipe/ep.c | Implements blocking retry loop, abort-on-disable, and deferred provider callback dispatch. |
| src/native/eventpipe/ep-types-forward.h | Adds new enums for buffering mode and write results. |
| src/native/eventpipe/ep-thread.h | Adds EP_SESSION_USE_WRITE_BUFFER_IN_USE bit for writer/reader coordination. |
| src/native/eventpipe/ep-thread.c | Adjusts assertions to account for the new session-use bit. |
| src/native/eventpipe/ep-session.h | Stores provider callback queue on session; changes ep_session_write_event return type. |
| src/native/eventpipe/ep-session.c | Plumbs buffering mode into buffer manager; updates inflight-wait logic; updates write path to return EventPipeWriteEventResult. |
| src/native/eventpipe/ep-buffer-manager.h | Adds BLOCK-mode fields/APIs; changes write API to return EventPipeWriteEventResult. |
| src/native/eventpipe/ep-buffer-manager.c | Implements BLOCK-mode signaling/aborting and returns BLOCKED on buffer exhaustion when appropriate. |
| src/mono/mono/eventpipe/test/ep-tests.c | Updates tests for new ep_session_write_event return type (currently has incorrect enum constant usages). |
| src/mono/mono/eventpipe/test/ep-buffer-manager-tests.c | Updates tests for new ep_buffer_manager_write_event return type (currently has incorrect enum constant usage). |
In Block mode a producer that finds the buffers full parks until the reader frees capacity and then retries, instead of dropping the event. The reader signals a per-buffer-manager auto-reset event on each capacity release; a parked producer clears the WRITE_BUFFER_IN_USE bit of its session_use_in_progress (keeping the session index) so the reader can drain its full buffer while the retained index pins the buffer manager. Teardown raises an abort flag and reuses the existing in-flight-write wait to release parked producers, so no separate parked-writer count is needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
enable()/ep_enable_3 only sets the session up and collects its provider-enable callbacks onto the session; ep_start_streaming is now the single site that invokes them, for every session. This is required for a blocking GCHeapSnapshot session: its provider-enable callback triggers a GC heap walk that parks producers when the buffer fills, so it must run only after the drain thread is live to consume - otherwise the walk parks with no reader and deadlocks. ep_start_streaming waits for the drain thread to reach its preemptive drain loop before invoking the callbacks; when streaming is parked for ep_finish_init at early startup it invokes inline, which is safe because the heap walk no-ops until the GC is fully initialized. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ep_start_streaming invoked the provider-enable callbacks that enable() collected onto the session *after* leaving the EventPipe lock, reading, clearing, and freeing session->provider_callbacks with no synchronization. A concurrent disable - most reachably the session's own streaming thread self-disabling on a write failure - runs ep_session_dec_ref under the lock and frees that same queue, so the two paths could race into a use-after-free / double-free. Detach the queue under the lock instead (grab the pointer and NULL the field), then invoke and free that private copy outside the lock. Ownership is now unambiguous: a concurrent disable's ep_session_dec_ref sees NULL and leaves the queue alone, and its defensive drain/free still covers the case where a session is disabled before ep_start_streaming runs. The single-use dispatch helper is inlined now that it just drains a local queue. The has_started spin still reads the session outside the lock (it must - waiting under the lock could deadlock the drain thread startup against a GC holding the thread-store lock), but that read is safe: disable joins the drain thread before freeing the session, and the thread publishes "started" before it can exit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
c7a287d to f15cb28
- Reclaim a buffer's memory before releasing its budget (now a single helper), so the Block-mode capacity signal is only observed once the memory is freed - matching the buffer-allocation error path.
- ep_buffer_manager_abort_blocked_writers only raises the abort flag. Waking parked producers is owned by ep_session_wait_for_inflight_thread_ops, which signals capacity each iteration until every producer observes the abort, so the previous lone signal was redundant and racy and is removed.
- ep_session_wait_for_inflight_thread_ops selects the Block-mode wait from the buffer manager's immutable buffering mode and asserts the abort flag is set, instead of branching on a racy field.
- Fold write_event_2's first send and its Block-mode park/retry into one loop with a single ep_session_write_event call.
- When a session is disabled before ep_start_streaming dispatched its provider-enable callbacks, move them into the disable callback queue so they still fire, balanced with the disable notifications, and so each provider's callbacks_pending is decremented - otherwise ep_delete_provider can wait on it forever.
- Assert the Block-only buffer_available_event is valid where it is waited on and signaled.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds test_buffer_manager_block_mode_abort_and_disable to the native EventPipe buffer-manager unit tests, covering the non-lossy Block buffering path:
- fill a 1 MB buffer pool until ep_buffer_manager_write_event returns EP_WRITE_EVENT_RESULT_BLOCKED, verifying a full buffer parks the producer (park-and-retry) instead of dropping the event;
- abort the blocked writers and verify a subsequent write gives up (no longer BLOCKED) rather than parking;
- verify the capacity signal/wait wake path returns without hanging;
- tear the Block-mode session down via ep_session_dec_ref.
Supporting changes in the same test file:
- Refactor buffer_manager_init into buffer_manager_init_mode (parameterized by buffering mode); buffer_manager_init forwards EP_BUFFERING_MODE_DROP so existing callers are unchanged. This also corrects a pre-existing arg misalignment in the shared ep_session_alloc call (it was missing circular_buffer_size_in_mb and user_events_data_fd).
- write_events now publishes the session index via ep_thread_set_session_use_in_progress before writing, satisfying the producer-index assert that Block mode added to ep_buffer_manager_write_event.
NOTE: this native EventPipe unit-test target is not compiled in CI, so the test is not compile- or run-verified here.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| @@ -83,6 +83,16 @@ buffer_manager_release_buffer ( | |||
| EventPipeBufferManager *buffer_manager, | |||
| uint32_t size); | |||
|
|
|||
| // Free a buffer's memory and then release its reserved budget. The order matters: the Block-mode | |||
Nit: Comments about the implementation details are usually more relevant at the implementation site rather than on the method declaration.
| { | ||
| EP_ASSERT (buffer_manager != NULL); | ||
| EP_ASSERT (ep_rt_wait_event_is_valid (&buffer_manager->buffer_available_event)); | ||
| ep_rt_wait_event_set (&buffer_manager->buffer_available_event); |
I don't think a single auto reset event will offer any fairness guarantees and under high contention we might need them to ensure threads don't get starved. I'd suggest adding an explicit queue. Each thread waiting for a buffer can enqueue itself and the per-thread queue node holds a wait event rather than a shared global one so that only the thread at the front of the queue is woken when space becomes available.
Agree, there are no fairness guarantees on events, and how threads get signaled is an implementation detail (on both Win32 and POSIX). We could have a lazily allocated event on the ep_thread object that can be queued and awaited. The queue could be a dn_queue owned by the buffer manager and guarded by the buffer manager lock.
| ep_thread_session_state_increment_sequence_number (session_state); | ||
|
|
||
| // In block mode, notify the caller that they should park and retry, unless the session is closing, | ||
| // or rundown is enabled, in which case we cannot block without deadlocking. |
This sounds fine for now (and for .NET 11), but there does appear to be a pre-existing limitation that any rundown events larger than the buffer size get dropped. This exists for non-blocking mode too, so it's nothing unique to this feature, just an odd limitation we've had for years. I'm surprised we haven't heard complaints about it.
| @@ -981,7 +1057,8 @@ ep_buffer_manager_write_event ( | |||
| EP_ASSERT (thread == ep_rt_thread_get_handle ()); | |||
|
|
|||
| // Before we pick a buffer, make sure the event is enabled. | |||
| ep_return_false_if_nok (ep_event_is_enabled (ep_event)); | |||
| if (!ep_event_is_enabled (ep_event)) | |||
| return EP_WRITE_EVENT_RESULT_DROPPED; | |||
For clarity I'd rename RESULT_DROPPED to RESULT_NOT_WRITTEN.
Everywhere else we use the term 'drop' it means we incremented the sequence number and one of the debug events_dropped variables so it has a more specific meaning than 'not written'.
| // Before: enable() built these on a stack-local queue and invoked them inline, so each provider's | ||
| // EventSource enable notification fired during enable(), unconditionally, before streaming began. | ||
| // | ||
| // Why deferred: a blocking GCHeapSnapshot enable callback forces a stop-the-world heap walk that parks |
The description makes this sound specific to GC details but I would state this in a more generalized way. Enable callbacks are allowed to generate an unbounded number of events which would block this thread if nothing is concurrently draining them.
| @@ -12,6 +12,14 @@ | |||
| #endif | |||
| #include "ep-getter-setter.h" | |||
|
|
|||
| // OR'd into EventPipeThread.session_use_in_progress alongside the session index to mean "actively | |||
| // writing into the buffer". The reader steals a buffer only while the bit is set; the index alone | |||
| // writing into the buffer". The reader steals a buffer only while the bit is set; the index alone | |
| // writing into the buffer". The reader steals a buffer only while the bit is clear; the index alone |
| // Why deferred: a blocking GCHeapSnapshot enable callback forces a stop-the-world heap walk that parks | ||
| // the cooperative producer on a full buffer, and only the session's drain thread frees that capacity; | ||
| // invoking it before that thread is live self-deadlocks. So enable() parks the callbacks here and a | ||
| // later site dispatches them once the drain thread runs. |
The callbacks are one thing that could generate events prior to the streaming thread running, but they aren't the only one. In theory, once we've enabled the allow_writes flag for a session and enabled the event configuration state, any thread could write events to the session and become deadlocked. Practically I don't think any callers do that today, or they don't do it enough to fill the buffer, but I'd feel better if we didn't open the door to that possibility. A suggestion on how to avoid it:
- Rename the session 'enable()' functions to something like 'session_init'. These APIs would do all the allocation they do today + registering the session in the global session list, but they would not do any of the steps that enable events to be received, send events, or invoke callbacks, such as:
  - ep_volatile_store_allow_write
  - ep_config_enable_disable
  - ep_sample_profiler_init
- Rename ep_start_streaming() to ep_session_enable(). This function currently has some logic where it either starts the streaming thread immediately or defers. We can have a helper enable_helper() that either gets invoked immediately in the non-deferred case, or gets invoked by finish_init in the deferred case. All of the callback invocation logic + event enabling logic that got moved out of the old enable() function would come to this new helper. The streaming thread start functionality would also be here. This ensures that creating the streaming thread happens before any events are received, regardless of deferral.
- Since all the functions that initialize the callback queue and invoke the callbacks are in the new enable_helper() function, there should be no need for a heap-allocated callback queue; it can stay on the stack as before, which should simplify the cleanup.
- Currently the config code that re-computes the global keyword/level information is a little eager to assume that any registered session (anything in _ep_sessions) should be included. Prior to calling enable_helper() we wouldn't want the session to show up in those global calculations, so we might want to check the allow_write flag or create some dedicated flag that indicates a session is enabled and should be included in those calculations.
- disable()/enable_helper() might need some adjustments to ensure enable_helper() doesn't try to do any work on a session that was already disabled and that disable() doesn't assume enable_helper() has already run. Both enable_helper() and disable() should be able to take the global EP lock, which makes synchronization straightforward.
Of course this is just a rough idea in my head and I might be missing things. If you foresee issues or have alternate ideas on how to resolve it, I'm glad to chat any time!
|
@pavelsavara, FYI. Will this PR have impact on single threaded WASM that needs to be considered? |
So I think this new feature should not compile under Alternatively it could write-thru (there is no other thread that would conflict). The underlying web-socket or JS buffer never blocks. But that would probably need more testing. |
|
/azp run runtime-wasm |
|
Azure Pipelines successfully started running 1 pipeline(s). |
- Rename EP_WRITE_EVENT_RESULT_DROPPED to EP_WRITE_EVENT_RESULT_NOT_WRITTEN: the result covers every not-written case (buffers full, oversized, suspended, or not enabled), not only dropping. - Move the buffer_manager_free_buffer_and_release_budget ordering comment onto the definition. - Clarify the EventPipeSession::provider_callbacks deferral rationale and fix the write-buffer-in-use bit-state comment in ep-thread.h. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Block mode previously woke a parked producer through a single shared auto-reset event, which gave no ordering (a producer could starve) and coalesced multiple capacity releases into one wake. Replace it with a strict-FIFO queue: each producer that cannot reserve buffer budget enqueues itself and parks on its own event, only the producer at the head may reserve, and the reader wakes just that one when it frees a buffer (handing the line off to the next while budget remains).
- The queue is a dn_queue of EventPipeThread* on the buffer manager. Each EventPipeThread carries an enqueued flag and a lazily-allocated auto-reset event (freed with the thread, so threads that never park pay no OS handle).
- buffer_manager_try_reserve_buffer_fair enforces the order: reserve only when at the head (already front, or the queue is empty so nobody is barged), otherwise enqueue at the tail and let the caller park; rundown/teardown writers never park.
- The queue is guarded by the buffer manager's existing rt_lock rather than a new lock: the reader frees budget and wakes the front while already holding rt_lock, and CoreCLR forbids nesting spin locks, so a dedicated lock could not be taken there. Setting an event under rt_lock is allowed.
- Teardown raises the abort flag then wakes and clears the whole queue, so ep_session_wait_for_inflight_thread_ops just waits the producers' indices out.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
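The head-of-line rule in this commit message can be illustrated with a minimal single-threaded model. This is only a sketch: every `sketch_` name is hypothetical, a fixed-size ring stands in for the dn_queue, and the locking (rt_lock) and per-thread wait events are left out so the ordering decision is the only thing shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_WAITERS 8

/* Hypothetical stand-in for the per-thread state (enqueued flag only). */
typedef struct {
    int  id;
    bool enqueued;
} sketch_thread_t;

/* Hypothetical stand-in for the buffer manager's wait queue and budget. */
typedef struct {
    sketch_thread_t *waiters[SKETCH_MAX_WAITERS]; /* ring buffer, front at head */
    size_t head;
    size_t count;
    int    free_buffers;
} sketch_pool_t;

/* Reserve only when at the head, or when nobody is waiting (no barging).
 * Otherwise join the tail once and report failure so the caller can park. */
static bool
sketch_try_reserve_fair (sketch_pool_t *pool, sketch_thread_t *thread)
{
    bool at_front = pool->count > 0 && pool->waiters[pool->head] == thread;
    bool may_reserve = at_front || pool->count == 0;

    if (may_reserve && pool->free_buffers > 0) {
        if (at_front) {
            /* Leave the line before taking the buffer. */
            pool->head = (pool->head + 1) % SKETCH_MAX_WAITERS;
            pool->count--;
            thread->enqueued = false;
        }
        pool->free_buffers--;
        return true;
    }

    if (!thread->enqueued) {
        assert (pool->count < SKETCH_MAX_WAITERS);
        pool->waiters[(pool->head + pool->count) % SKETCH_MAX_WAITERS] = thread;
        pool->count++;
        thread->enqueued = true;
    }
    return false;
}

/* Reader side: return one buffer and report which waiter (if any) to wake. */
static sketch_thread_t *
sketch_release_buffer (sketch_pool_t *pool)
{
    pool->free_buffers++;
    return pool->count > 0 ? pool->waiters[pool->head] : NULL;
}
```

In this model a waiter that is not at the front cannot take a freed buffer even if it retries first, which is the starvation case the shared auto-reset event could not rule out.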
Move EventPipe session enablement out of the under-lock enable() path into
ep_start_streaming so the provider-enable callbacks live on a stack-local queue
for one call instead of a heap queue parked on the session.
ep_enable_3 now calls session_init, which only allocates and publishes the
session inert (providers unconfigured, allow_write unset, uncounted). The
caller's subsequent ep_start_streaming runs enable_holding_lock under the lock
(config_enable_disable, allow_write, number_of_sessions, sample profiler),
starts the drain thread, waits for it, then dispatches the collected callbacks.
This removes EventPipeSession::provider_callbacks and the detach-under-lock
dance in ep_start_streaming and disable_holding_lock (the orphan drain), since
generation and dispatch now happen in one place. The enable-before-disable
callback ordering reverts to the long-standing upstream behavior (a non-IPC
disable racing start_streaming's outside-the-lock dispatch can reorder), which
is not a regression.
Tidy the teardown side to mirror creation: session_fini (the counterpart to
session_init - unpublish, drain-if-enabled, close, reclaim) and session_free
(the counterpart to ep_session_alloc) are extracted, and disable_helper is
renamed stop_session to name the lock-free disable driver.
Because session_init publishes an inert session before enable_holding_lock
runs, three things keep that window safe:
- config_compute_keyword_and_level / config_register_provider skip sessions
whose allow_write bit is clear, so an inert session is invisible to provider
config.
- disable_holding_lock tears down enablement state only if the session was
actually enabled (allow_write bit set); otherwise it just unpublishes and
frees. Its caller gate also accepts an inert-only session (present in the
collection even when number_of_sessions is 0) so a shutdown racing an
in-progress enable cleanly unpublishes it instead of orphaning it.
- ep_session_dec_ref closes the user_events data fd, covering an inert (or
alloc-error) USEREVENTS teardown that never runs ep_session_disable.
Behavior-preserving.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Create the SESSION drain thread as a raw native thread (like the diagnostics server thread) instead of a managed CLR Thread, so it needs no GC / Thread Store and can start during early startup. With a native drain thread there is no longer any reason to defer session streaming until ep_finish_init:
- ep_rt_thread_create routes EP_THREAD_TYPE_SESSION through ::CreateThread, carrying the session pointer and reusing ep_rt_thread_coreclr_start_func (which skips DestroyThread when there is no managed Thread). SAMPLING stays a managed thread - its callback needs a fully-initialized GC.
- streaming_thread tolerates a NULL managed-Thread handle and no longer wraps its drain loop in a GC-mode transition (a native thread has no GC mode).
- ep_start_streaming starts the drain thread immediately instead of parking the session id; _ep_deferred_enable_session_ids and the ep_finish_init enable-replay are removed. _ep_can_start_threads is kept for the sample profiler and the rundown-on-disable path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…E_THREADS) builds Block (non-lossy) mode parks a producer on a full buffer until the drain thread frees capacity. Under PERFTRACING_DISABLE_THREADS (single-threaded, e.g. browser WASM) there is no separate drain thread - the drain runs cooperatively on the same thread via the JS job queue - and ep_rt_wait_event_wait is a NOOP. A parked producer would therefore busy-spin forever while the drain it is waiting on can never run. Block cannot work single-threaded by design. Gate the Block park/abort/fair-reserve machinery under #ifndef PERFTRACING_DISABLE_THREADS so it is not compiled into single-threaded builds: the wait_queue field, buffer_manager_try_reserve_buffer_fair / buffer_manager_signal_front_waiter / ep_buffer_manager_writer_wait_for_capacity / ep_buffer_manager_is_aborting / ep_buffer_manager_abort_blocked_writers, the buffering_mode==BLOCK write-path branches, and the producer park-retry loop. A buffering_mode of BLOCK falls through to the existing DROP path there (a single write that drops on a full buffer). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Block (non-lossy) buffering parks a producer on a full buffer until the drain thread frees capacity, so it only works for session types that have a continuous native drain - the ones with a streaming thread: IPCSTREAM and FILESTREAM. FILE and LISTENER sessions use the buffer manager but have no streaming thread (a FILE session flushes only at disable; a LISTENER session is pumped by an in-proc managed poll via EventPipeEventDispatcher), so a parked producer would stall app threads until teardown. SYNCHRONOUS and USEREVENTS have no buffer manager at all. Degrade Block to Drop in ep_session_alloc for any session type without a streaming thread, so a request can never deadlock a session that has no drainer. Callers that genuinely need Block (the IPC CollectTracing6 opt-in, and the env-var FILESTREAM session) only request it for streaming sessions and are unaffected; this is a defensive central guard for all current and future session-creation paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
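The central guard described in this commit message amounts to a small mapping from session type to effective buffering mode. The following is a sketch under assumptions: the `sketch_` enums mirror the session types and modes named in the text, but their names, values, and the helper functions are hypothetical and are not the runtime's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the session types named in the commit message. */
typedef enum {
    SKETCH_SESSION_TYPE_FILE,
    SKETCH_SESSION_TYPE_LISTENER,
    SKETCH_SESSION_TYPE_IPCSTREAM,
    SKETCH_SESSION_TYPE_SYNCHRONOUS,
    SKETCH_SESSION_TYPE_FILESTREAM,
    SKETCH_SESSION_TYPE_USEREVENTS
} sketch_session_type_t;

typedef enum {
    SKETCH_BUFFERING_MODE_DROP,
    SKETCH_BUFFERING_MODE_BLOCK
} sketch_buffering_mode_t;

/* Only the streaming session types have a continuous native drain thread. */
static bool
sketch_has_streaming_thread (sketch_session_type_t type)
{
    return type == SKETCH_SESSION_TYPE_IPCSTREAM ||
           type == SKETCH_SESSION_TYPE_FILESTREAM;
}

/* Degrade Block to Drop when nothing would drain a parked producer. */
static sketch_buffering_mode_t
sketch_effective_mode (sketch_session_type_t type, sketch_buffering_mode_t requested)
{
    assert (requested == SKETCH_BUFFERING_MODE_DROP ||
            requested == SKETCH_BUFFERING_MODE_BLOCK);
    if (requested == SKETCH_BUFFERING_MODE_BLOCK && !sketch_has_streaming_thread (type))
        return SKETCH_BUFFERING_MODE_DROP;
    return requested;
}
```

Placing the check at allocation time, as the message describes, means no caller has to remember it.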
CollectTracing6 (0x0207) extends CollectTracing5 with a trailing uint32 buffering mode after the provider configs, for streaming (IPCSTREAM) sessions only. It encodes EventPipeBufferingMode: 0 = Drop (default, lossy), non-zero = Block (non-lossy). user_events sessions don't use the buffer manager, so they omit the field. The shared command payload now defaults buffering_mode to Drop explicitly (via the enum in ds_eventpipe_collect_tracing_command_payload_alloc), so every older CollectTracing version is unchanged. ep_session_options_init takes a buffering_mode parameter instead of hard-coding Drop, and the collect handler passes payload->buffering_mode straight through it - no post-init override - which flows the mode to the buffer manager via ep_enable_3. This is the client-facing opt-in for the non-lossy Block mode added earlier in this series: a client (e.g. dotnet-gcdump on a large heap) requests lossless collection by sending CollectTracing6 with Block. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
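The trailing field can be pictured as a four-byte read at the end of the payload. The sketch below is hypothetical throughout: the function and enum names are invented, the little-endian byte order is an assumption about the wire format, and treating a short payload as a failure is a choice made for the example rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SKETCH_BUFFERING_MODE_DROP  = 0,
    SKETCH_BUFFERING_MODE_BLOCK = 1
} sketch_buffering_mode_t;

/* Reads the trailing uint32 at `buf` (assumed little-endian) and maps it to a
 * mode: 0 = Drop, any non-zero value = Block. Returns false, leaving *mode at
 * the Drop default, when fewer than four bytes remain. */
static bool
sketch_read_buffering_mode (const uint8_t *buf, size_t len, sketch_buffering_mode_t *mode)
{
    assert (buf != NULL && mode != NULL);

    *mode = SKETCH_BUFFERING_MODE_DROP;
    if (len < sizeof (uint32_t))
        return false;

    uint32_t raw = (uint32_t)buf[0] |
                   ((uint32_t)buf[1] << 8) |
                   ((uint32_t)buf[2] << 16) |
                   ((uint32_t)buf[3] << 24);

    *mode = (raw != 0) ? SKETCH_BUFFERING_MODE_BLOCK : SKETCH_BUFFERING_MODE_DROP;
    return true;
}
```

Initializing the output to Drop before parsing mirrors the message's point that the payload defaults to Drop explicitly, so older command versions see no change.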
The startup EventPipe session (DOTNET_EnableEventPipe) had no way to request non-lossy Block buffering. Add DOTNET_EventPipeBufferingMode (0 = Drop, the default; non-zero = Block), read by a new ep_rt_config_value_get_buffering_mode on CoreCLR and NativeAOT plus a new CLRConfig entry. Mono stubs the getter to Drop (the function exists only to satisfy the ep-rt adapter contract); Block on Mono can be wired up later. ep_enable_2 gains a buffering_mode parameter (it already carries the other per-session knobs - format, rundown_keyword, etc.), so the startup session can request a mode while every other path passes EP_BUFFERING_MODE_DROP. Managed (EventListener), profiler, and gen-analysis sessions go through ep_enable and are unaffected. Block only takes effect for a streaming session (DOTNET_EventPipeOutputStreaming=1, i.e. FILESTREAM); on a non-streaming FILE session ep_session_alloc degrades it to Drop, as noted on the config value. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Either of these sounds like a good solution to me. 👍 |
Runtime side to fix dotnet/diagnostics#2404.
Problem
EventPipe's buffer manager is lossy by design: when a producer's per-thread buffer is full
and the circular pool is exhausted, the event is dropped (accounted via a sequence number)
rather than blocking the writer.
This corrupts dotnet-gcdump. A heap snapshot is rebuilt from the GCBulkNode/GCBulkEdge stream emitted during a stop-the-world heap walk; on a large heap that burst overruns the buffer,
edges are dropped, and the graph cannot be reconstructed - gcdump fails with
System.ApplicationException: RootIndex not set and writes no dump (dotnet/diagnostics#2404).
This PR adds an opt-in, per-session EventPipeBufferingMode that defaults to DROP (today's behavior) and can be set to BLOCK to make a streaming session non-lossy. Two opt-ins are included - the IPC CollectTracing command (used by dotnet-gcdump --non-lossy) and a DOTNET_EventPipeBufferingMode environment variable for the startup session.
Parking producers instead of dropping
The non-lossy guarantee comes down to one change in the write path: when a producer in Block mode
has filled its buffer and the pool cannot hand it another, instead of dropping the event it
parks until the reader frees capacity and then retries the same write. Parked producers wait
in a strict-FIFO queue; only the thread at the front may reserve the next freed buffer, each
thread waits on its own event, and the reader wakes the front waiter every time it returns a
drained buffer to the pool. A parked producer makes forward progress only once the event actually
lands.
Parking has to cooperate with the existing buffer hand-off, which is where a new pin flag comes
in. Each producer advertises, in its per-thread session_use_in_progress slot, the index of the
session it is writing to; teardown spins on that slot to keep the session and its buffer manager
alive until every in-flight writer is done. We widen that slot with a high
EP_SESSION_USE_WRITE_BUFFER_IN_USE bit meaning "I currently own a write buffer," and the reader
drains a producer's buffer only once that bit is clear. A parked producer therefore clears the
bit but keeps the index: clearing the bit lets the reader drain the very buffer the producer
just filled - the only way capacity is ever freed - while keeping the index leaves the session
pinned so it cannot be torn down from under the sleeping thread. On wake the producer re-sets the
bit, reloads the session pointer (disable may have cleared it), and retries.
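The bit-plus-index encoding described above can be sketched with plain integer operations. Assumptions are labeled in the code: the bit position, the 32-bit slot width, and all `sketch_` names are hypothetical, and the real slot is shared between threads, so it would be read and written with the runtime's volatile/atomic helpers rather than passed by value as here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the description only says the flag is a "high" bit. */
#define SKETCH_WRITE_BUFFER_IN_USE 0x80000000u

/* Producer begins a write: publish the session index and claim the buffer. */
static uint32_t
sketch_begin_write (uint32_t session_index)
{
    assert ((session_index & SKETCH_WRITE_BUFFER_IN_USE) == 0);
    return session_index | SKETCH_WRITE_BUFFER_IN_USE;
}

/* Producer parks: clear the bit but keep the index. */
static uint32_t
sketch_park (uint32_t slot)
{
    return slot & ~SKETCH_WRITE_BUFFER_IN_USE;
}

/* Producer wakes: set the bit again before retrying the write. */
static uint32_t
sketch_resume (uint32_t slot)
{
    return slot | SKETCH_WRITE_BUFFER_IN_USE;
}

/* Reader: a producer's buffer may be drained only while the bit is clear. */
static bool
sketch_reader_may_drain (uint32_t slot)
{
    return (slot & SKETCH_WRITE_BUFFER_IN_USE) == 0;
}

/* Teardown: the session stays pinned while the index matches, bit or no bit. */
static bool
sketch_session_pinned (uint32_t slot, uint32_t session_index)
{
    return (slot & ~SKETCH_WRITE_BUFFER_IN_USE) == session_index;
}
```

A parked slot therefore satisfies both conditions at once: the reader may drain the buffer, and teardown still sees the session as in use.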
Blocking is only ever allowed when it is actually safe to wait for a reader: the session is in
Block mode, it is not tearing down, and the caller is not the rundown thread. Teardown is
the first escape hatch - disable raises an aborting flag and wakes every parked producer, and
the producer, which re-checks aborting immediately before and after each wait, gives up and
drops rather than waiting on a reader that is going away. Rundown is the second - rundown events
are emitted from the drain side during disable, so a rundown writer that blocked would be waiting
on itself; it is excluded from blocking and falls back to the normal drop path.
Expressing these three outcomes cleanly is why the write path's old success/fail boolean became an
EventPipeWriteEventResult enum - WRITTEN (the event landed), DROPPED (discarded:
oversized, provider disabled, suspended, or a genuine Drop-mode overflow, all already tracked by
sequence numbers), and BLOCKED (Block mode, buffer full, session still live - park and retry).
Only BLOCKED drives the park loop.
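A toy model of the three-way result and the retry loop it drives may help. Everything here is a hypothetical simplification: the `sketch_` types are not the runtime's, capacity is a simple counter, and the "wait" is simulated by a stand-in reader that frees one slot, so the example stays single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three outcomes described above. */
typedef enum {
    SKETCH_RESULT_WRITTEN,
    SKETCH_RESULT_DROPPED,
    SKETCH_RESULT_BLOCKED
} sketch_write_result_t;

typedef struct {
    int  free_slots;  /* remaining buffer capacity */
    bool block_mode;  /* session opted in to Block */
    bool aborting;    /* teardown has started */
    int  written;
    int  dropped;
    int  parks;       /* how many times the producer parked */
} sketch_session_t;

/* One write attempt: BLOCKED only when waiting for a reader is safe. */
static sketch_write_result_t
sketch_try_write (sketch_session_t *session)
{
    assert (session != NULL);
    if (session->free_slots > 0) {
        session->free_slots--;
        session->written++;
        return SKETCH_RESULT_WRITTEN;
    }
    if (session->block_mode && !session->aborting)
        return SKETCH_RESULT_BLOCKED;
    session->dropped++;
    return SKETCH_RESULT_DROPPED;
}

/* Stand-in for parking: a simulated reader frees one slot while we "wait". */
static void
sketch_wait_for_capacity (sketch_session_t *session)
{
    session->parks++;
    session->free_slots++;
}

/* Producer loop: only BLOCKED causes a park and retry of the same write. */
static sketch_write_result_t
sketch_write_event (sketch_session_t *session)
{
    for (;;) {
        sketch_write_result_t result = sketch_try_write (session);
        if (result != SKETCH_RESULT_BLOCKED)
            return result;
        sketch_wait_for_capacity (session);
    }
}
```

With a single free slot, two writes in Block mode both land after one park, while the same sequence in Drop mode loses the second event; setting `aborting` turns a full-buffer write back into a drop.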
Starting the drain thread before the heap walk
Enabling a session invokes each provider's enable callback. For a gcdump (GCHeapSnapshot)
session that callback synchronously forces a full, stop-the-world heap walk that emits the entire
GCBulkNode/GCBulkEdge stream - which in Block mode is exactly what fills the buffer and parks
the producer, and the producer can only make progress if a drain thread already exists. Originally
the callbacks fired inline inside ep_enable_3, before streaming starts, so the walk ran during
SuspendEE with no drainer: the buffer filled, the heap-walk thread parked forever, and the
target deadlocked.
Two changes fix the ordering:
- session_init / enable split. ep_enable_3 now only publishes the session inert
(session_init) and callback dispatch moves to a single site in ep_start_streaming, which
collects the provider-enable callbacks under the lock, starts the drain thread, waits for it
to be running (EP_YIELD_WHILE (!ep_session_has_started)), and only then dispatches them.
A consumer is therefore guaranteed to be in place before the walk can fill the buffer, and the
drain loop runs in preemptive GC mode, so it keeps draining even while the heap walk holds
the EE suspended.
- Native, eager drain threads. The session drain thread is now a raw native thread (like the
diagnostics server thread) instead of a managed CLR thread, so it needs no GC / Thread Store and
starts eagerly at enable time. This removes the previous startup deferral (sessions enabled
early no longer wait until ep_finish_init to start streaming), so even a startup gcdump-style
session has a live drainer before its heap walk runs.
Where Block is allowed
Block parks a producer until a drain frees capacity, so it is only safe for session types that
have a continuous native drain thread: FILESTREAM and IPCSTREAM. FILE and LISTENER
sessions use the buffer manager but have no streaming thread (a FILE session flushes only at
disable; a LISTENER session is pumped by an in-proc managed poll), and SYNCHRONOUS/USEREVENTS
have no buffer manager. A central guard in ep_session_alloc therefore degrades
Block to Drop for any session type without a streaming thread, so a request can never deadlock a
session that has no drainer - regardless of how the mode was requested.
The two opt-ins:
- IPC - CollectTracing6 (0x0207). Extends CollectTracing5 with a trailing uint32
buffering mode for streaming (IPCSTREAM) sessions: 0 = Drop, non-zero = Block. Older command
versions are unchanged. dotnet-gcdump --non-lossy sends it (dotnet/diagnostics).
- Env var - DOTNET_EventPipeBufferingMode. Lets the DOTNET_EnableEventPipe startup session
opt in: 0 = Drop (default), non-zero = Block. It only takes effect for a streaming session
(DOTNET_EventPipeOutputStreaming=1, i.e. FILESTREAM); on a non-streaming FILE session the
guard above degrades it to Drop. Wired on CoreCLR and NativeAOT (Mono stubs the getter to Drop).
Managed (EventListener), profiler, and gen-analysis sessions go through paths that always pass
Drop and are unaffected.
Single-threaded builds
Under PERFTRACING_DISABLE_THREADS (single-threaded, e.g. browser WASM) there is no separate
drain thread - the drain runs cooperatively on the same thread - and ep_rt_wait_event_wait is a
no-op, so a parked producer could never be woken. The Block park/abort/fair-reserve machinery is
#ifndef-d out of those builds, and a BLOCK request degrades to the existing Drop path.
Testing
Validated on a Checked CoreCLR build (asserts on; none fired). CoreCLR Checked builds clean
(0 warnings, 0 errors).
End-to-end gcdump (the actual use case) - dotnet-gcdump --non-lossy (which sends
CollectTracing6 with Block) against a target holding a 2,000,000-node object graph: the command
is accepted, the heap walk completes, and the report reconstructs exactly 2,000,000 nodes with
exit 0 and no hang/deadlock.
Block vs Drop differential - a startup FILESTREAM session with a 1 MB circular buffer
(DOTNET_EnableEventPipe=1, DOTNET_EventPipeOutputStreaming=1, DOTNET_EventPipeBufferingMode),
target emits 1,000,000 events as fast as possible, compared across Drop (=0) and Block (=1).
Guard - a non-streaming FILE session (DOTNET_EventPipeOutputStreaming=0) with
DOTNET_EventPipeBufferingMode=1 completes normally with no hang: the guard degrades it to
Drop rather than parking a producer that has no drainer.