-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[clang] Avoid accounting for tail padding in kernel arguments #156229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -327,9 +327,10 @@ void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF, | |
| /// (void*, short, void*) is passed as {void **, short *, void **} to the launch | ||
| /// function. For the LLVM/offload launch we flatten the arguments into the | ||
| /// struct directly. In addition, we include the size of the arguments, thus | ||
| /// pass {sizeof({void *, short, void *}), ptr to {void *, short, void *}, | ||
| /// nullptr}. The last nullptr needs to be initialized to an array of pointers | ||
| /// pointing to the arguments if we want to offload to the host. | ||
| /// pass {size of ({void *, short, void *}) without tail padding, ptr to {void | ||
| /// *, short, void *}, nullptr}. The last nullptr needs to be initialized to an | ||
| /// array of pointers pointing to the arguments if we want to offload to the | ||
| /// host. | ||
| Address CGNVCUDARuntime::prepareKernelArgsLLVMOffload(CodeGenFunction &CGF, | ||
| FunctionArgList &Args) { | ||
| SmallVector<llvm::Type *> ArgTypes, KernelLaunchParamsTypes; | ||
|
|
@@ -350,7 +351,15 @@ Address CGNVCUDARuntime::prepareKernelArgsLLVMOffload(CodeGenFunction &CGF, | |
| KernelLaunchParamsTy, CharUnits::fromQuantity(16), | ||
| "kernel_launch_params"); | ||
|
|
||
| auto KernelArgsSize = CGM.getDataLayout().getTypeAllocSize(KernelArgsTy); | ||
| // Avoid accounting the tail padding for the kernel arguments. | ||
| auto KernelArgsSize = llvm::TypeSize::getZero(); | ||
| if (auto N = KernelArgsTy->getNumElements()) { | ||
| auto *SL = CGM.getDataLayout().getStructLayout(KernelArgsTy); | ||
| KernelArgsSize += SL->getElementOffset(N - 1); | ||
| KernelArgsSize += CGM.getDataLayout().getTypeAllocSize( | ||
| KernelArgsTy->getElementType(N - 1)); | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there is a cleaner way of getting this information, please let me know. |
||
| } | ||
|
|
||
| CGF.Builder.CreateStore(llvm::ConstantInt::get(Int64Ty, KernelArgsSize), | ||
| CGF.Builder.CreateStructGEP(KernelLaunchParams, 0)); | ||
| CGF.Builder.CreateStore(KernelArgs.emitRawPointer(CGF), | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3658,11 +3658,6 @@ Error AMDGPUKernelTy::launchImpl(GenericDeviceTy &GenericDevice, | |
| KernelArgsTy &KernelArgs, | ||
| KernelLaunchParamsTy LaunchParams, | ||
| AsyncInfoWrapperTy &AsyncInfoWrapper) const { | ||
| if (ArgsSize != LaunchParams.Size && | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is concerning, the argument size here is directly returned by the runtime. If this is false then I'd assume the ABI is broken unless I'm missing something.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The first condition will be true if there are implicit kernel arguments. The second condition can't be checked anymore because the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I could compute the padded size (to a pointer alignment, which is the alignment of implicit args) and keep the check. But I'm not sure if it's worthy.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The implicit arguments are placed directly after what HSA reports as the argument size, does this change that?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It shouldn't change that. There is the |
||
| ArgsSize > LaunchParams.Size + getImplicitArgsSize()) | ||
| return Plugin::error(ErrorCode::INVALID_ARGUMENT, | ||
| "invalid kernel arguments size"); | ||
|
|
||
| AMDGPUPluginTy &AMDGPUPlugin = | ||
| static_cast<AMDGPUPluginTy &>(GenericDevice.Plugin); | ||
| AMDHostDeviceTy &HostDevice = AMDGPUPlugin.getHostDevice(); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little iffy on this, what's the reason that we can't just mimic CUDA's handling? IIUC this is the old version where we passed these things through the
liboimptargetinterface. Wasn't there some effort to make a different one on top of libcudart?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with mimicking "CUDA's handling"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this "Offload on LLVM" thing is using the OpenMP API in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the title of the PR should be more generic. I don't think this issue is only affecting LLVM offloading. It could potentially affect any kernel in OpenMP offloading with parameters that have different alignments.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That never occurs in OpenMP because we pad everything to u64, but I get what you mean. I need to look deeper into how argument passing works, hopefully make some improvements in offload and pull some more things into OpenMP so we can make the argument handling generic and leave the OpenMP specific assumptions in
libomptarget.