
Conversation

@vpietila-amd
Contributor

Proposed changes

Added factories and dispatching for the bwd-weight conv algorithms from old CK that are currently exposed by the narrow build of CK. As part of the factory implementation, I did the following refactoring:

  • Removed direction-specific conv specializations since they were mostly identical. Now there is only one specialization enum, ConvSpecialization, and different algorithms support different sets of specializations. This change simplifies the code because we get rid of various std::variants.
  • Separated the algorithm descriptions (concepts) from the dispatching logic and moved the concepts to a separate file. Added some base concepts that are shared between different algorithm specializations.
  • Combined all algorithm specializations (reference, large tensor, two-stage multiple-D) under an enum ConvAlgorithmSpecialization. This allows us to handle all specializations in the same way.

Ville Pietilä added 30 commits December 18, 2025 04:36
shumway previously approved these changes Jan 7, 2026
Collaborator

@shumway left a comment


This is a huge PR, but overall looks good.

My only concern is the static_assert in unit tests, which should really be a runtime error (a failing static_assert breaks the build, so no tests will run at all).

I'm doing major refactoring on conv_traits.hpp that conflicts with this, but the changes here look minimal; I'm guessing they are just to keep the build from breaking.

Contributor

@spolifroni-amd left a comment


The readme needs clarification.

since `gfx9` architectures do not support WMMA. Hence, to also compile the WMMA builders, add e.g.
`gfx1121` to the list of supported architectures or add the flag `-D CK_USE_WMMA=ON`. One still needs
a Navi card to execute the Builder tests that use the GPU.

Contributor


This is really confusing.

If I got this right:

  1. WMMA isn't supported on gfx9* architectures (which seems strange since the prebuilt rocWMMA is supported on gfx908, gfx90a, gfx942, and gfx950)
  2. Because of this, the WMMA builder won't automatically be included in the tests
  3. So to compile the WMMA builders you need to add a non-gfx9* architecture to the list?
  4. And you need a navi card?

Let me know if I got this right, and if I do, I'll rewrite this for you real quick.

Contributor Author


I clarified the comment; the original one was a bit offhanded. The bottom line is this:

Tests for the WMMA builder are compiled only when the CMake variable CK_USE_WMMA evaluates to true. This can be achieved in one of the following ways:

  1. Use one of the gfx11 or gfx12 architectures as the GPU target, in which case CMake turns this flag on.
  2. Turn the flag on explicitly by passing the CMake option -D CK_USE_WMMA=ON.

Most of the builder tests do not run the instances. Instead, they check that the instance traits match the expected traits. There are a few end-to-end tests that actually execute an instance on the GPU. I haven't yet added such tests for the WMMA builders, but when they are added, they will require an actual Navi card.

I hope this clarifies the Readme comment.

@vpietila-amd
Contributor Author

@shumway, regarding your comment about using static_assert in the unit tests: it was an obsolete unit-test file that tested functionality I didn't include in this PR. I had initially forgotten to remove the file, but it is removed now.
