Use native NVSHMEM synchronization APIs in NVSHMEM backends #107
Merged
romerojosh merged 11 commits into main on Mar 4, 2026
…nchronization routines. Use better scoped signal-based synchronization in pipelined backend. Signed-off-by: Josh Romero <joshr@nvidia.com>
…rs to avoid quiet call. Signed-off-by: Josh Romero <joshr@nvidia.com>
…e signal wait directly in consumer stream in NVSHMEM pipelined backend. Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
…vshmem_sync_event. Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
…needed conditional barrier logic in NVSHMEM pipelined backend. Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
/build
🚀 Build workflow triggered!
❌ Build workflow failed!
Signed-off-by: Josh Romero <joshr@nvidia.com>
/build
🚀 Build workflow triggered!
✅ Build workflow passed!
…tween ops. Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
/build
🚀 Build workflow triggered!
✅ Build workflow passed!
romerojosh added a commit that referenced this pull request on Mar 8, 2026
romerojosh added a commit that referenced this pull request on Mar 11, 2026
* Revert "Revert "Use native NVSHMEM synchronization APIs in NVSHMEM backends (#107)" (#110)" This reverts commit e401242. Signed-off-by: Josh Romero <joshr@nvidia.com>
* Apply NVSHMEM_CUMEM_GRANULARITY workaround for older NVSHMEM versions. Signed-off-by: Josh Romero <joshr@nvidia.com>
* Increase NVSHMEM_CUMEM_GRANULARITY to maximum of 2 GiB. Signed-off-by: Josh Romero <joshr@nvidia.com>

---------

Signed-off-by: Josh Romero <joshr@nvidia.com>
This PR replaces the CPU-based MPI synchronization primitives currently used in the NVSHMEM backends with native NVSHMEM APIs. This has been found to improve performance in many cases, and it makes the NVSHMEM backends free of CPU synchronization.
In particular, the non-pipelined NVSHMEM backend replaces the existing CPU synchronization pattern (quiet -> stream synchronize -> MPI_Barrier) with a call to `nvshmemx_barrier_on_stream`. For the pipelined NVSHMEM backend, the existing barrier synchronization is reimplemented using signal-based APIs, enabling more targeted synchronization between the GPU pairs involved in each stage.
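For readers unfamiliar with these APIs, the following is a minimal sketch of the two patterns described above. It is not taken from the PR's diff: the function names (`sync_with_mpi`, `sync_native`, `pipelined_stage`), the per-source signal array layout, and the `epoch` counter are hypothetical and chosen only for illustration. It assumes NVSHMEM has already been initialized and that `recv_buf` and `signals` live in the symmetric heap.

```cuda
#include <cstddef>
#include <cstdint>

#include <cuda_runtime.h>
#include <mpi.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Previous pattern (as described in the PR text): the host blocks on the
// stream and then on an MPI barrier.
void sync_with_mpi(cudaStream_t stream, MPI_Comm comm) {
  nvshmemx_quiet_on_stream(stream);  // complete outstanding NVSHMEM operations
  cudaStreamSynchronize(stream);     // CPU waits for the GPU stream to drain
  MPI_Barrier(comm);                 // CPU-side barrier across ranks
}

// Replacement for the non-pipelined backend: the barrier is enqueued on the
// stream, so the host thread does not block.
void sync_native(nvshmem_team_t team, cudaStream_t stream) {
  nvshmemx_barrier_on_stream(team, stream);
}

// Illustrative pipelined stage (hypothetical layout): this PE sends a chunk
// to dst_pe and expects a chunk from src_pe. signals[] is assumed to be a
// symmetric array with one uint64_t slot per PE, zero-initialized, and
// epoch is assumed to increase by one on every use of a given slot.
void pipelined_stage(void* recv_buf, const void* send_buf, size_t nbytes,
                     uint64_t* signals, uint64_t epoch,
                     int dst_pe, int src_pe,
                     cudaStream_t comm_stream, cudaStream_t consumer_stream) {
  const int my_pe = nvshmem_my_pe();

  // Put the data and then set the slot reserved for this PE on the
  // destination; the signal update is delivered after the data.
  nvshmemx_putmem_signal_nbi_on_stream(recv_buf, send_buf, nbytes,
                                       &signals[my_pe], epoch,
                                       NVSHMEM_SIGNAL_SET, dst_pe, comm_stream);

  // Only the consumer stream waits, and only for the one PE it depends on.
  nvshmemx_signal_wait_until_on_stream(&signals[src_pe], NVSHMEM_CMP_GE, epoch,
                                       consumer_stream);
}
```

A caller would typically pass `NVSHMEM_TEAM_WORLD` or a sub-team to `sync_native`. The sketch deliberately omits buffer-reuse protection (ensuring the receiver has finished with `recv_buf` before the next put overwrites it), which a real pipelined backend would need to handle with additional synchronization.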