Add workaround for cuTENSOR 2.3.x - 2.5.x OOB host write bug impacting large problem sizes #113

Merged
romerojosh merged 3 commits into main from cutensor_large_input_fix on Mar 12, 2026

Conversation

@romerojosh
Collaborator

Fixes #112.

This PR adds a check that detects the affected cuTENSOR versions at runtime via cutensorGetVersion() and pre-splits the input/output tensors into chunks below the INT32_MAX/2 threshold before calling cuTENSOR, avoiding the problematic code path described in #112. No behavior change occurs for unaffected cuTENSOR versions or for tensors below the split threshold.
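The version gate and pre-split described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR. It assumes that the value returned by cutensorGetVersion() is encoded as major * 10000 + minor * 100 + patch, that the INT32_MAX/2 threshold is an element count, and that a flat 1D split is enough to show the idea. The names `isAffectedCutensorVersion`, `planChunks`, `Chunk`, and `kSplitThreshold` are hypothetical, and the version is passed in as a plain argument so the sketch compiles without cuTENSOR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: the version number is encoded as
// major * 10000 + minor * 100 + patch (e.g. 2.3.1 -> 20301).
bool isAffectedCutensorVersion(std::size_t version) {
  return version >= 20300 && version < 20600;  // 2.3.x through 2.5.x
}

// Threshold named in the PR description; treated here as an element count (assumption).
constexpr std::int64_t kSplitThreshold = INT32_MAX / 2;

// Hypothetical description of one contiguous piece of a flattened tensor.
struct Chunk {
  std::int64_t offset;  // first element of the chunk
  std::int64_t count;   // number of elements in the chunk
};

// Returns the chunks to process, in order. Unaffected versions and tensors
// below the threshold get a single chunk covering the whole tensor.
std::vector<Chunk> planChunks(std::int64_t total, std::size_t version) {
  assert(total >= 0);
  std::vector<Chunk> chunks;
  if (total == 0) return chunks;

  if (!isAffectedCutensorVersion(version) || total < kSplitThreshold) {
    chunks.push_back({0, total});
    return chunks;
  }

  // Keep every chunk strictly below the threshold.
  const std::int64_t maxCount = kSplitThreshold - 1;
  for (std::int64_t offset = 0; offset < total; offset += maxCount) {
    chunks.push_back({offset, std::min(maxCount, total - offset)});
  }
  return chunks;
}
```

A caller would pass the result of cutensorGetVersion() as `version` and issue one cuTENSOR call per returned chunk. How the actual change maps chunks onto tensor dimensions and strides is not shown in this conversation.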

…g large problem sizes.

Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
Signed-off-by: Josh Romero <joshr@nvidia.com>
@romerojosh
Collaborator Author

/build

@github-actions

🚀 Build workflow triggered! View run

@github-actions

✅ Build workflow passed! View run

@romerojosh romerojosh merged commit 3e0a669 into main Mar 12, 2026
4 checks passed
romerojosh added a commit that referenced this pull request Mar 12, 2026
…g large problem sizes (#113)

* Add workaround for cuTENSOR 2.3.x - 2.5.x OOB host write bug impacting large problem sizes.

Signed-off-by: Josh Romero <joshr@nvidia.com>

* Formatting.

Signed-off-by: Josh Romero <joshr@nvidia.com>

* Bump cuDecomp version to prep for hotfix release.

Signed-off-by: Josh Romero <joshr@nvidia.com>

---------

Signed-off-by: Josh Romero <joshr@nvidia.com>

Development

Successfully merging this pull request may close these issues.

cuTENSOR large-tensor permutation bug causes heap corruption in cudecompTranspose for tensors with >~1B elements