[xpupti] Emit GPU_USER_ANNOTATION events from user correlation map #1393
SlawomirLaba wants to merge 1 commit into
Hi @SlawomirLaba! Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with …

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
cc: @gujinghui, @chuanqi129
The XPU PTI plugin already wires user correlation IDs into PTI via ptiViewPushExternalCorrelationId / PTI_VIEW_EXTERNAL_KIND_CUSTOM_1 and populates userCorrelationMap_ in handleCorrelationActivity(), but it never produced any GPU_USER_ANNOTATION events from that information, so user record_function() ranges did not appear on the device timeline.
After a GPU activity (kernel, memcpy, or memset) is logged, look up its correlation ID in userCorrelationMap_, resolve the originating CPU activity via the linked-activity callback, and synthesize a GPU_USER_ANNOTATION GenericTraceActivity that spans the GPU activity's time range on the same device and stream. A small per-session set keyed by (device, stream, user_correlation_id) deduplicates annotations, so each record_function range is emitted at most once per stream.
Only emit the synthesized event when the caller requested ActivityType::GPU_USER_ANNOTATION, to avoid producing extra events for clients that do not opt in.
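The lookup, CPU-side resolution, per-stream deduplication, and opt-in gate described above can be sketched in self-contained C++. This is a minimal illustration under stated assumptions, not the plugin's code: GpuActivity, CpuActivity, UserAnnotation, and UserAnnotationSynthesizer are hypothetical names, ActivityType is a trimmed stand-in for libkineto's enum, and the linked-activity callback is modeled as a plain std::function.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

// Hypothetical, trimmed stand-in for libkineto's ActivityType.
enum class ActivityType { CONCURRENT_KERNEL, GPU_MEMCPY, GPU_MEMSET, GPU_USER_ANNOTATION };

struct GpuActivity {        // a logged kernel, memcpy, or memset
  int64_t device;
  int64_t stream;
  uint64_t correlationId;   // PTI correlation id of the activity
  int64_t startNs;
  int64_t endNs;
};

struct CpuActivity {        // the CPU-side record_function() range
  std::string name;
};

struct UserAnnotation {     // stand-in for a GPU_USER_ANNOTATION GenericTraceActivity
  std::string name;
  int64_t device;
  int64_t stream;
  int64_t startNs;
  int64_t endNs;
  uint64_t userCorrelationId;
};

class UserAnnotationSynthesizer {
 public:
  using LinkedActivityFn = std::function<const CpuActivity*(uint64_t)>;

  UserAnnotationSynthesizer(const std::set<ActivityType>& requested,
                            std::unordered_map<uint64_t, uint64_t> userCorrelationMap,
                            LinkedActivityFn linkedActivity)
      : enabled_(requested.count(ActivityType::GPU_USER_ANNOTATION) > 0),
        userCorrelationMap_(std::move(userCorrelationMap)),
        linkedActivity_(std::move(linkedActivity)) {}

  // Call after a GPU activity is logged; returns an annotation to log, if any.
  std::optional<UserAnnotation> onGpuActivity(const GpuActivity& gpu) {
    if (!enabled_) return std::nullopt;  // client did not opt in
    auto it = userCorrelationMap_.find(gpu.correlationId);
    if (it == userCorrelationMap_.end()) return std::nullopt;  // not inside a user range
    const uint64_t userId = it->second;
    const CpuActivity* cpu = linkedActivity_ ? linkedActivity_(userId) : nullptr;
    if (cpu == nullptr) return std::nullopt;  // originating CPU activity not found
    // Emit at most once per (device, stream, user correlation id).
    if (!emitted_.emplace(gpu.device, gpu.stream, userId).second) return std::nullopt;
    // The annotation spans this GPU activity's time range on its device/stream.
    return UserAnnotation{cpu->name, gpu.device, gpu.stream, gpu.startNs, gpu.endNs, userId};
  }

 private:
  bool enabled_;
  std::unordered_map<uint64_t, uint64_t> userCorrelationMap_;
  LinkedActivityFn linkedActivity_;
  std::set<std::tuple<int64_t, int64_t, uint64_t>> emitted_;
};
```

Keying the set on the stream as well as the user correlation ID is what lets a range whose GPU work lands on two streams produce one annotation on each. In this sketch a key is recorded only after the CPU activity resolves, so a failed lookup does not suppress a later activity on the same stream.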
Tests
Extend RunProfilerTest with two optional parameters (userCorrelationId, linkedCpuActivity) so the existing test scaffolding can drive a session that pushes and pops a user correlation ID around the XPU workload and resolves it back to a CPU-side activity through the linked-activity callback. Existing tests are unaffected (the defaults are 0 and nullptr, respectively).
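The default-argument pattern can be sketched as follows, assuming a heavily simplified scaffold. FakeCorrelationStack stands in for PTI's external-correlation stack, and RunProfilerTestSketch, ScopedUserCorrelation, and SessionSketch are hypothetical names, not the test file's actual code. An RAII guard performs the push/pop so the ID is popped even if the workload throws, and an ID of 0 makes the guard a no-op, which is how the defaults leave existing callers unchanged.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct CpuActivity {
  std::string name;
};
using LinkedActivityFn = std::function<const CpuActivity*(uint64_t)>;

// Hypothetical stand-in for PTI's per-thread external-correlation stack.
struct FakeCorrelationStack {
  std::vector<uint64_t> ids;
};

// Pushes `id` for the guard's lifetime; an id of 0 (the default) does nothing.
class ScopedUserCorrelation {
 public:
  ScopedUserCorrelation(FakeCorrelationStack& stack, uint64_t id)
      : stack_(stack), active_(id != 0) {
    if (active_) stack_.ids.push_back(id);
  }
  ~ScopedUserCorrelation() {
    if (active_) stack_.ids.pop_back();
  }
  ScopedUserCorrelation(const ScopedUserCorrelation&) = delete;
  ScopedUserCorrelation& operator=(const ScopedUserCorrelation&) = delete;

 private:
  FakeCorrelationStack& stack_;
  bool active_;
};

struct SessionSketch {
  uint64_t idDuringWorkload = 0;    // top of the stack while the workload ran
  LinkedActivityFn linkedActivity;  // resolves a user id to its CPU activity
};

// Hypothetical analogue of the extended RunProfilerTest: the two trailing
// parameters default to 0 / nullptr.
SessionSketch RunProfilerTestSketch(FakeCorrelationStack& stack,
                                    const std::function<void()>& workload,
                                    uint64_t userCorrelationId = 0,
                                    const CpuActivity* linkedCpuActivity = nullptr) {
  SessionSketch session;
  session.linkedActivity = [userCorrelationId, linkedCpuActivity](uint64_t id) -> const CpuActivity* {
    return (id != 0 && id == userCorrelationId) ? linkedCpuActivity : nullptr;
  };
  {
    // Pushed here, popped when the guard leaves this scope.
    ScopedUserCorrelation guard(stack, userCorrelationId);
    session.idDuringWorkload = stack.ids.empty() ? 0 : stack.ids.back();
    workload();
  }
  return session;
}
```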
Add XpuptiProfilerTest.GpuUserAnnotation, which enables GPU_USER_ANNOTATION (alongside the existing GPU/runtime activities), runs the XPU compute helper inside a user correlation range, and asserts that the expected synthesized "user_function" annotation events are emitted on each participating GPU stream.
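One plausible shape for the per-stream check is sketched below. It is a hypothetical helper, not the assertion used in XpuptiProfilerTest.GpuUserAnnotation, and TraceEvent and Kind are assumed simplifications of the collected trace. It requires exactly one annotation per stream that carries GPU work, which is what the (device, stream, user_correlation_id) deduplication implies when the workload runs inside a single user range.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Kernel, Memcpy, Memset, UserAnnotation };

struct TraceEvent {
  Kind kind;
  std::string name;
  int64_t device;
  int64_t stream;
};

// True if every (device, stream) with kernel/memcpy/memset work carries exactly
// one annotation named `expectedName`, and no annotation sits on any other stream.
bool everyStreamHasOneAnnotation(const std::vector<TraceEvent>& events,
                                 const std::string& expectedName) {
  using StreamKey = std::pair<int64_t, int64_t>;
  std::set<StreamKey> gpuStreams;
  std::map<StreamKey, int> annotationCounts;
  for (const auto& e : events) {
    const StreamKey key{e.device, e.stream};
    if (e.kind == Kind::UserAnnotation) {
      if (e.name != expectedName) return false;
      ++annotationCounts[key];
    } else {
      gpuStreams.insert(key);
    }
  }
  if (annotationCounts.size() != gpuStreams.size()) return false;
  for (const auto& key : gpuStreams) {
    auto it = annotationCounts.find(key);
    if (it == annotationCounts.end() || it->second != 1) return false;
  }
  return true;
}
```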