Force workspace usage with MPI backends on MNNVL communicators when fabric allocated workspace is available. #108

Merged
romerojosh merged 3 commits into main from mpi_mnnvl_no_shortcut
Mar 4, 2026

@romerojosh romerojosh commented Mar 4, 2026

This PR detects and disables some of the transpose shortcut paths (e.g. alltoall directly from/to user input/output buffers bypassing the workspace) for MPI backends in situations where:

  1. The workspace is fabric allocated (e.g. CUDECOMP_ENABLE_CUMEM=1)
  2. The communicator is on an MNNVL-equipped system and contains multi-node NVLink connections.

The reason is that, at present, MPI communication over MNNVL involving non-fabric-allocated buffers uses a staging protocol that routes data through internal fabric-allocated buffers, similar to the treatment of managed memory allocations. This is somewhat less efficient than communicating with buffers that are already fabric allocated. The shortcut paths present MPI with non-fabric-allocated user input/output buffers, which triggers this less efficient path and negates the benefit of the shortcut.

Similar logic is applied to halo communication cases that would normally bypass workspace staging.
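
As a rough illustration of the condition described above, the check could be sketched as follows. This is a minimal sketch, not the code in this PR: `CommState`, `force_workspace`, and `use_shortcut` are hypothetical names, and the three flags are assumed to have been determined elsewhere (backend selection, workspace allocation mode, and MNNVL detection on the communicator).

```cpp
// Hypothetical sketch only: these names are illustrative and are not
// taken from the library's source.
struct CommState {
  bool mpi_backend;       // the selected communication backend uses MPI
  bool workspace_fabric;  // workspace is fabric allocated (e.g. CUDECOMP_ENABLE_CUMEM=1)
  bool has_mnnvl;         // communicator contains multi-node NVLink connections
};

// True when communication should be staged through the workspace even if a
// shortcut path (direct use of user input/output buffers) would otherwise apply.
constexpr bool force_workspace(const CommState& s) {
  return s.mpi_backend && s.workspace_fabric && s.has_mnnvl;
}

// A shortcut is taken only if it is otherwise eligible and not overridden.
constexpr bool use_shortcut(const CommState& s, bool shortcut_eligible) {
  return shortcut_eligible && !force_workspace(s);
}
```

All three conditions must hold for the override to apply, so in this sketch non-MPI backends, non-fabric workspaces, and communicators without multi-node NVLink connections keep their existing shortcut behavior.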

…ors when fabric allocated workspace is available.

Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh
Collaborator Author

/build

github-actions bot commented Mar 4, 2026

🚀 Build workflow triggered! View run

github-actions bot commented Mar 4, 2026

✅ Build workflow passed! View run

Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh
Collaborator Author

/build

github-actions bot commented Mar 4, 2026

🚀 Build workflow triggered! View run

github-actions bot commented Mar 4, 2026

✅ Build workflow passed! View run

@romerojosh romerojosh changed the title from "Disable transpose shortcut paths for MPI backends on MNNVL communicators when fabric allocated workspace is available." to "Force workspace usage with MPI backends on MNNVL communicators when fabric allocated workspace is available." Mar 4, 2026
@romerojosh romerojosh merged commit 3c68e4a into main Mar 4, 2026
4 checks passed