[xpupti] Emit GPU_USER_ANNOTATION events from user correlation map #1393

Draft
SlawomirLaba wants to merge 1 commit into pytorch:main from SlawomirLaba:dev/slabax/gpu_user_annotation

@SlawomirLaba

The XPU PTI plugin already wires user correlation IDs into PTI via ptiViewPushExternalCorrelationId / PTI_VIEW_EXTERNAL_KIND_CUSTOM_1 and populates userCorrelationMap_ in handleCorrelationActivity(). However, it never produced any GPU_USER_ANNOTATION events from that information, so user record_function() ranges did not appear on the device timeline.

After a GPU activity (kernel, memcpy, memset) is logged, look up its correlation ID in userCorrelationMap_, resolve the originating CPU activity via the linked-activity callback, and synthesize a GPU_USER_ANNOTATION GenericTraceActivity that spans the GPU activity's time range on the same device/stream. A small per-session set keyed by (device, stream, user_correlation_id) deduplicates annotations so that each record_function range is emitted at most once per stream.
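For reviewers, the lookup-and-deduplicate step described above might be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: `GpuActivity`, `Annotation`, `AnnotationSynthesizer`, `cpuRangeNames`, and `emitted_` are hypothetical stand-ins, and a plain map plays the role of the linked-activity callback. The actual change operates on Kineto's GenericTraceActivity.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <tuple>
#include <unordered_map>

// Hypothetical, simplified stand-in for a logged GPU activity
// (kernel, memcpy, or memset).
struct GpuActivity {
  int64_t device;
  int64_t stream;
  int64_t correlationId;
  int64_t startNs;
  int64_t endNs;
};

// Hypothetical stand-in for the synthesized GPU_USER_ANNOTATION event.
struct Annotation {
  std::string name;
  int64_t device;
  int64_t stream;
  int64_t startNs;
  int64_t endNs;
};

class AnnotationSynthesizer {
 public:
  // GPU activity correlation ID -> user correlation ID
  // (assumed shape of userCorrelationMap_).
  std::unordered_map<int64_t, int64_t> userCorrelationMap;
  // Stand-in for the linked-activity callback:
  // user correlation ID -> name of the originating CPU range.
  std::unordered_map<int64_t, std::string> cpuRangeNames;

  // Returns an annotation spanning the GPU activity's time range, or
  // nothing if the activity has no user correlation, the CPU activity
  // cannot be resolved, or the range was already emitted on this stream.
  std::optional<Annotation> onGpuActivity(const GpuActivity& act) {
    auto userIt = userCorrelationMap.find(act.correlationId);
    if (userIt == userCorrelationMap.end()) {
      return std::nullopt;
    }
    auto nameIt = cpuRangeNames.find(userIt->second);
    if (nameIt == cpuRangeNames.end()) {
      return std::nullopt;
    }
    // Deduplicate on (device, stream, user_correlation_id).
    auto key = std::make_tuple(act.device, act.stream, userIt->second);
    if (!emitted_.insert(key).second) {
      return std::nullopt;
    }
    return Annotation{
        nameIt->second, act.device, act.stream, act.startNs, act.endNs};
  }

 private:
  std::set<std::tuple<int64_t, int64_t, int64_t>> emitted_;
};
```

In this sketch, two kernels launched on the same stream inside one record_function range yield a single annotation, taken from the first kernel's time range, while the same range seen on a second stream yields a separate annotation.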

Only emit the synthesized event when the caller requested ActivityType::GPU_USER_ANNOTATION, to avoid producing extra events for clients that do not opt in.
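The opt-in check could be as small as the following sketch. The `ActivityType` enum here is a local, reduced stand-in for the Kineto enum of the same name, and `shouldEmitUserAnnotation` is a hypothetical helper name; how the plugin actually stores the requested activity types is not shown in this description.

```cpp
#include <set>

// Local, reduced stand-in for Kineto's ActivityType; illustration only.
enum class ActivityType {
  CONCURRENT_KERNEL,
  GPU_MEMCPY,
  GPU_MEMSET,
  GPU_USER_ANNOTATION,
};

// Hypothetical helper: synthesize annotations only when the caller
// explicitly requested GPU_USER_ANNOTATION for this session.
inline bool shouldEmitUserAnnotation(const std::set<ActivityType>& requested) {
  return requested.count(ActivityType::GPU_USER_ANNOTATION) > 0;
}
```

Clients that request only kernel/memcpy/memset activities would see no change in their traces under this check.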

Tests

Extend RunProfilerTest with two optional parameters (userCorrelationId, linkedCpuActivity) so that the existing test scaffolding can drive a session that pushes/pops a user correlation ID around the XPU workload and resolves it back to a CPU-side activity through the linked-activity callback. Existing tests are unaffected (defaults are 0 / nullptr).

Add XpuptiProfilerTest.GpuUserAnnotation, which enables GPU_USER_ANNOTATION (alongside the existing GPU/runtime activities), runs the XPU compute helper inside a user correlation range, and asserts that the expected synthesized "user_function" annotation events are emitted on each participating GPU stream.

@meta-cla

meta-cla Bot commented May 8, 2026

Hi @SlawomirLaba!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@scotts scotts assigned scotts and unassigned scotts May 11, 2026
@scotts
Contributor

scotts commented May 11, 2026

cc: @gujinghui, @chuanqi129
