@YangFan0918

🚀 Summary

This PR removes a redundant CP_ASYNC_WAIT_GROUP(0); instruction in kernels/sgemm/sgemm_async.cu and applies the required code formatting to pass the CI checks.

📝 Details

1. Dead Code Removal
In sgemm_t_8x8_sliced_k16_f32x4_bcf_dbuf_kernel, global-to-shared memory loads are currently synchronous: data is staged through registers (LDG followed by STS) or assigned directly, rather than copied with Ampere's asynchronous copy (cp.async).
Because no asynchronous copy group is ever committed via CP_ASYNC_COMMIT_GROUP, there is never a pending group, so the CP_ASYNC_WAIT_GROUP(0); at the end of the loop has nothing to wait for. It is effectively dead code and potentially misleading, since it suggests the kernel uses an asynchronous pipeline. I have removed it to improve code clarity.
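To illustrate the distinction, here is a minimal sketch (not the kernel from this PR) that contrasts the two load paths on a toy tile copy. The macro definitions, the kernel names copy_tile_async and copy_tile_sync, and the helper run_async_wait_demo are assumptions made for this example; the repository's own macros may be defined differently. It assumes an sm_80 or newer GPU and compilation with something like nvcc -arch=sm_80.

```cuda
#include <cassert>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Assumed macro definitions for this sketch; the project's versions may differ.
#define CP_ASYNC_CG(dst, src, bytes)                                     \
  asm volatile("cp.async.cg.shared.global [%0], [%1], %2;\n" ::"r"(dst), \
               "l"(src), "n"(bytes)                                      \
               : "memory")
#define CP_ASYNC_COMMIT_GROUP() \
  asm volatile("cp.async.commit_group;\n" ::: "memory")
#define CP_ASYNC_WAIT_GROUP(n) \
  asm volatile("cp.async.wait_group %0;\n" ::"n"(n) : "memory")

constexpr int kThreads = 64;           // each thread moves one float4
constexpr int kFloats = kThreads * 4;  // tile size in floats

// Asynchronous path: the copy is issued, committed as a group, then waited on.
__global__ void copy_tile_async(const float *in, float *out) {
  __shared__ __align__(16) float tile[kFloats];
  const int i = threadIdx.x * 4;
  const uint32_t dst =
      static_cast<uint32_t>(__cvta_generic_to_shared(&tile[i]));
  CP_ASYNC_CG(dst, &in[i], 16);  // global -> shared without a register stage
  CP_ASYNC_COMMIT_GROUP();       // close the group of copies issued so far
  CP_ASYNC_WAIT_GROUP(0);        // needed here: a committed group is pending
  __syncthreads();               // make all threads' copies visible
  const int j = ((threadIdx.x + 1) % kThreads) * 4;  // read a neighbour's data
  for (int k = 0; k < 4; ++k) out[i + k] = tile[j + k];
}

// Synchronous path: data is staged through a register, so no group exists
// and a CP_ASYNC_WAIT_GROUP(0) here would have nothing to wait for.
__global__ void copy_tile_sync(const float *in, float *out) {
  __shared__ __align__(16) float tile[kFloats];
  const int i = threadIdx.x * 4;
  const float4 r = reinterpret_cast<const float4 *>(in)[threadIdx.x];  // load
  reinterpret_cast<float4 *>(tile)[threadIdx.x] = r;                   // store
  __syncthreads();  // the barrier alone is sufficient
  const int j = ((threadIdx.x + 1) % kThreads) * 4;
  for (int k = 0; k < 4; ++k) out[i + k] = tile[j + k];
}

// Hypothetical host helper: runs both kernels and checks they agree.
bool run_async_wait_demo() {
  std::vector<float> h_in(kFloats), h_async(kFloats), h_sync(kFloats);
  for (int i = 0; i < kFloats; ++i) h_in[i] = static_cast<float>(i);

  const size_t bytes = kFloats * sizeof(float);
  float *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  bool ok = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice) ==
            cudaSuccess;

  copy_tile_async<<<1, kThreads>>>(d_in, d_out);
  ok = ok && cudaMemcpy(h_async.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
  copy_tile_sync<<<1, kThreads>>>(d_in, d_out);
  ok = ok && cudaMemcpy(h_sync.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;

  // Each output element should equal the neighbouring thread's input element.
  for (int i = 0; ok && i < kFloats; ++i) {
    const float expected = h_in[(i + 4) % kFloats];
    ok = (h_async[i] == expected) && (h_sync[i] == expected);
  }
  cudaFree(d_in);
  cudaFree(d_out);
  return ok;
}
```

Calling run_async_wait_demo() from a main function should return true on supported hardware. The point of the sketch is that the wait only has a purpose in copy_tile_async, where a group was committed; in the register-staged variant, __syncthreads() is all that is required.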

2. Formatting Changes (Important)
You will notice a significant diff in this file. This is intentional.
I ran the project's pre-commit hooks before committing. The repository configures clang-format with --style=file via pre-commit, but there is no .clang-format file in the root directory. As a result, clang-format falls back to the default LLVM style, reformatting the entire file.
I have included these formatting changes to ensure this PR passes the project's automated CI/pre-commit checks.
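For reference, a minimal sketch of a .clang-format file that would make the style explicit rather than relying on the fallback. This file is hypothetical and is not part of this PR; with only this content, the formatting result should match the LLVM fallback described above, so it would pin the behavior rather than shrink the diff.

```yaml
# .clang-format (hypothetical; no such file exists in the repository root today)
BasedOnStyle: LLVM
```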

🤝 Note

Since this is my first contribution, I wasn't sure whether committing the full-file reformatting (triggered by the project's pre-commit hooks) is standard practice here. If this large diff is not desired, please let me know and I will be happy to adjust the PR accordingly!
