Add XCCL collective communication activity tracing to XPU plugin #1396

Open
tsocha wants to merge 1 commit into pytorch:main from intel-staging:dev/tsocha/oneccl-3

tsocha commented May 11, 2026

This is part 3 of 3 of #1335.

Enable PTI_VIEW_COMMUNICATION collection in the XPU PTI plugin so oneCCL host-side collective operations show up in Kineto traces. Events are emitted as COLLECTIVE_COMM activities named with an "xccl::" prefix and carry the PTI communicator id as metadata.

  • Gate new code paths on PTI_VERSION_AT_LEAST(0, 17)
  • Wire enable/disable of PTI_VIEW_COMMUNICATION in XpuptiActivityApi
  • Add handleCommunicationActivity for pti_view_record_comms records
  • Add unit tests covering naming, field mapping, and out-of-range drop
  • Document INTEL_LIBITTNOTIFY64 requirement in libkineto/README.md
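
The mapping described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `CommRecordSketch`, `CollectiveActivitySketch`, `toActivity`, and the `communicator_id` metadata key are hypothetical stand-ins, since the real `pti_view_record_comms` layout and the Kineto activity type are not shown here. It also assumes that "out-of-range" means a record whose timestamps fall outside the capture window. Requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a PTI communication record.
// The real pti_view_record_comms fields may differ.
struct CommRecordSketch {
  std::string name;         // collective name as reported by the runtime
  uint64_t startNs;         // host-side start timestamp
  uint64_t endNs;           // host-side end timestamp
  uint64_t communicatorId;  // PTI communicator id
};

// Hypothetical stand-in for a Kineto COLLECTIVE_COMM activity.
struct CollectiveActivitySketch {
  std::string name;
  uint64_t startNs;
  uint64_t endNs;
  std::unordered_map<std::string, std::string> metadata;
};

// Convert one record into an activity.
// Assumption: a record is "out of range" when it is not fully inside
// [windowStartNs, windowEndNs]; such records are dropped (std::nullopt).
std::optional<CollectiveActivitySketch> toActivity(
    const CommRecordSketch& rec,
    uint64_t windowStartNs,
    uint64_t windowEndNs) {
  if (rec.startNs < windowStartNs || rec.endNs > windowEndNs) {
    return std::nullopt;
  }
  CollectiveActivitySketch act;
  act.name = "xccl::" + rec.name;  // prefix named in the PR description
  act.startNs = rec.startNs;
  act.endNs = rec.endNs;
  // Metadata key is illustrative; the PR only says the id is carried as metadata.
  act.metadata["communicator_id"] = std::to_string(rec.communicatorId);
  return act;
}
```

Returning `std::optional` keeps the drop decision next to the field mapping, so naming, field mapping, and the drop case can each be checked with a single call.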

@meta-cla meta-cla Bot added the cla signed label May 11, 2026

tsocha commented May 11, 2026

@gujinghui please review it.

#if PTI_VERSION_AT_LEAST(0, 17)
    case ActivityType::COLLECTIVE_COMM: {
      auto rc = ptiViewEnable(PTI_VIEW_COMMUNICATION);
      if (rc != PTI_SUCCESS) {

Why don't we follow the existing code style and use the XPUPTI_CALL macro here?
