Improve support for very large inputs. #105

Merged

romerojosh merged 3 commits into main on Feb 17, 2026
Conversation
/build

🚀 Build workflow triggered!

✅ Build workflow passed!
Recent flagship GPU models have significantly increased HBM capacities (e.g., GB200 with 186 GB of HBM, GB300 with 279 GB of HBM), enabling users to run much larger per-GPU problem sizes with cuDecomp than in the past.
cuDecomp still relies on standard MPI APIs, which have count/offset arguments limited to the maximum value of an `int32_t`. We use appropriate MPI datatypes in the MPI communication routines so that this maximum applies to the number of elements rather than bytes. This results in the following problem size limitations, in terms of local pencil size:

- `MPI_Alltoall` with an `int32_t` count: `(2^31 - 1) * <nranks>` elements per local pencil
- `MPI_Alltoallv`-like patterns with `int32_t` counts and offsets: `(2^31 - 1)` elements per local pencil

This leads to a maximum local pencil size of 8 GiB for our smallest supported type, `float`, up to 32 GiB for `complex<double>`. With workspace requirements (2x the local pencil size), and assuming most workloads use GPU memory for other things as well, these limitations were generally not an issue for users on GPUs with 40-80 GiB capacities. That said, the code currently does not inform users when they have violated these limitations and will just silently fail, which is not ideal.

This PR remedies this situation by:
- Replacing `int32_t` with `int64_t` for internal count/offset handling, only downcasting to `int32_t` when needed for MPI APIs. This enables the NCCL- and NVSHMEM-based backends to correctly run on very large inputs without these MPI-specific limits.
- Checking whether counts/offsets exceed the `int32_t` maximum before downcasting and, if so, throwing a "not supported" error informing the user that the particular transpose/halo backend is not usable with their problem size.

When "big count" support is more widely available in MPI, we can adopt those APIs (e.g., `MPI_Alltoall_c`), but as of now, even the current OpenMPI 5.x release does not provide these functions.

This PR does not address communication backend autotuning with these large input sizes. Autotuning can potentially error out when testing the MPI-based backends, even if the NCCL and NVSHMEM backends are viable candidates. This will be addressed in a follow-up PR.