
[CUDA] implement Hadamard transform #3179

Open

Lyxot wants to merge 3 commits into ml-explore:main from Lyxot:cuda/hadamard

Conversation


@Lyxot Lyxot commented Feb 27, 2026

Proposed changes

This PR adds CUDA support for the Hadamard transform (mx.hadamard_transform), using the same staged decomposition strategy as the Metal backend.
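For readers unfamiliar with the operation, a minimal pure-Python fast Walsh–Hadamard sketch (reference only, not the CUDA code in this PR) shows the butterfly structure for the power-of-two case:

```python
# Minimal in-place fast Walsh-Hadamard transform for power-of-two n.
# Reference sketch only: the PR implements this on the GPU with staged
# kernels; this just illustrates the O(n log n) butterfly structure.
def fwht(x):
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

print(fwht([1.0, 0.0, 0.0, 0.0]))  # first basis vector -> [1.0, 1.0, 1.0, 1.0]
```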

Changed files

  • mlx/backend/cuda/hadamard.cu: implemented CUDA Hadamard::eval_gpu and JIT launch flow (n1/n2/m staged execution), reusing decompose_hadamard(...).
  • mlx/backend/cuda/device/hadamard.cuh: added JIT device kernels hadamard_n<...> and hadamard_m<...> plus radix helpers.
  • python/tests/cuda_skip.py: removed the CUDA skip entries for test_hadamard and test_hadamard_grad_vmap.

Validation

  • python -m pytest python/tests/test_ops.py -k test_hadamard -q passed.
  • python -m pytest python/tests/test_ops.py -k test_hadamard_grad_vmap -q passed.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Copilot AI review requested due to automatic review settings February 27, 2026 02:48

Copilot AI left a comment


Pull request overview

This PR implements CUDA support for the Hadamard transform (mx.hadamard_transform), following the same staged decomposition strategy as the Metal backend. The implementation decomposes the Hadamard transform into three stages (n1, n2, and m) to efficiently handle large transforms while respecting GPU memory constraints.
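As a sanity check on the staging idea (an illustration, not code from this PR): applying a Hadamard stage along one axis and then the other is equivalent to multiplying by the Kronecker product H_n1 ⊗ H_n2, since Sylvester's construction gives H_(2n) = H_2 ⊗ H_n. A small NumPy sketch with arbitrarily chosen sizes:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction for power-of-two n.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

x = np.random.default_rng(0).standard_normal(8)

full = hadamard(8) @ x           # one-shot length-8 transform
staged = x.reshape(2, 4)         # view as n1 x n2 (row-major)
staged = staged @ hadamard(4).T  # stage 1: transform along n2
staged = hadamard(2) @ staged    # stage 2: transform along n1
staged = staged.reshape(8)

assert np.allclose(full, staged)  # H_8 == H_2 (x) H_4 applied in stages
```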

Changes:

  • Added CUDA kernel implementation for Hadamard transform with JIT compilation support
  • Enabled CUDA tests by removing skip entries for test_hadamard and test_hadamard_grad_vmap
  • Integrated the implementation into the CUDA backend build system

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Summary per file:

  • mlx/backend/cuda/hadamard.cu: Implements the main CUDA evaluation logic with staged kernel launches and JIT code generation for non-power-of-two radices
  • mlx/backend/cuda/device/hadamard.cuh: Provides device-side kernel templates for n-stage and m-stage transforms with vectorized memory access
  • mlx/backend/cuda/primitives.cpp: Removes NO_GPU(Hadamard) to enable the GPU evaluation path
  • mlx/backend/cuda/jit_module.cpp: Registers the hadamard.cuh header for JIT compilation
  • mlx/backend/cuda/CMakeLists.txt: Adds hadamard.cu to the build sources
  • python/tests/cuda_skip.py: Removes CUDA skip entries to enable the Hadamard tests


@Lyxot Lyxot requested a review from zcbenz February 27, 2026 09:51
@nastya236 nastya236 self-requested a review February 27, 2026 11:34
@nastya236 nastya236 (Collaborator) commented Feb 27, 2026

Looks great, thanks for the contribution.

Could you please share bandwidth numbers for the proposed kernel across a range of shapes? I'm also particularly interested in the case where the Hadamard transform is applied to tiled inputs with N=16 or N=32. Something like:

x = mx.random.uniform(shape=(4096, 4096))
mx.hadamard_transform(x.reshape(4096, 4096 // N, N))

where N=16 or N=32.
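For anyone reproducing such numbers, one way to turn a measured time into an achieved-bandwidth figure is the following sketch; the read-plus-write traffic model, float32 dtype, and 0.5 ms timing are assumptions for illustration, not measurements from this PR:

```python
# Hypothetical helper for converting a measured kernel time into
# achieved bandwidth. Assumes each element is read once and written
# once; dtype size and elapsed time below are illustrative only.
def achieved_bandwidth_gbps(num_elements, itemsize_bytes, elapsed_s):
    bytes_moved = 2 * num_elements * itemsize_bytes  # one read + one write
    return bytes_moved / elapsed_s / 1e9

# e.g. a 4096 x 4096 float32 tensor transformed in 0.5 ms:
print(achieved_bandwidth_gbps(4096 * 4096, 4, 0.5e-3))  # ~268.4 GB/s
```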
