Skip to content

[Coroutines] Change llvm.coro.noop to accept llvm_anyptr_ty instead #102096

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Aug 6, 2024

Summary:
These intrinsics return pointers to global values, which currently does not respect the target's default address space. This patch does the first step by modifying the llvm.coro.noop intrinsic to accept different address spaces. Porting the others should go into different patches.

Fixes: #102095

@llvmbot
Copy link
Member

llvmbot commented Aug 6, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-coroutines

@llvm/pr-subscribers-llvm-transforms

Author: Joseph Huber (jhuber6)

Changes

Summary:
This pass tries to get a pointer from the global. Some targets, like
AMDGPU, emit globals to a qualified address space. This would result in
an invalid cast. To fix this, we simply apply an addrspace cast, which
is a no-op if none exit.

Fixes: #102095


Full diff: https://github.com/llvm/llvm-project/pull/102096.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Coroutines/CoroEarly.cpp (+4-2)
  • (modified) llvm/test/Transforms/Coroutines/coro-noop.ll (+4-1)
diff --git a/llvm/lib/Transforms/Coroutines/CoroEarly.cpp b/llvm/lib/Transforms/Coroutines/CoroEarly.cpp
index d8e827e9cebcf..53774fd5a35d5 100644
--- a/llvm/lib/Transforms/Coroutines/CoroEarly.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroEarly.cpp
@@ -121,8 +121,8 @@ static void buildDebugInfoForNoopResumeDestroyFunc(Function *NoopFn) {
 }
 
 void Lowerer::lowerCoroNoop(IntrinsicInst *II) {
+  LLVMContext &C = Builder.getContext();
   if (!NoopCoro) {
-    LLVMContext &C = Builder.getContext();
     Module &M = *II->getModule();
 
     // Create a noop.frame struct type.
@@ -152,7 +152,9 @@ void Lowerer::lowerCoroNoop(IntrinsicInst *II) {
   }
 
   Builder.SetInsertPoint(II);
-  auto *NoopCoroVoidPtr = Builder.CreateBitCast(NoopCoro, Int8Ptr);
+  auto *NoopCoroVoidPtr = Builder.CreateBitCast(
+      Builder.CreateAddrSpaceCast(NoopCoro, PointerType::getUnqual(C)),
+      PointerType::getUnqual(C));
   II->replaceAllUsesWith(NoopCoroVoidPtr);
   II->eraseFromParent();
 }
diff --git a/llvm/test/Transforms/Coroutines/coro-noop.ll b/llvm/test/Transforms/Coroutines/coro-noop.ll
index 1e4f19a2ef66e..cc2aa1d974355 100644
--- a/llvm/test/Transforms/Coroutines/coro-noop.ll
+++ b/llvm/test/Transforms/Coroutines/coro-noop.ll
@@ -1,8 +1,10 @@
 ; Tests that CoroEarly pass correctly lowers coro.noop
-; RUN: opt < %s -S -passes=coro-early | FileCheck %s
+; RUN: opt -S -passes=coro-early < %s | FileCheck %s
+; RUN: opt -mtriple=amdgcn-- -S -passes=coro-early < %s | FileCheck %s --check-prefix=AMDGPU
 
 ; CHECK: %NoopCoro.Frame = type { ptr, ptr }
 ; CHECK: @NoopCoro.Frame.Const = private constant %NoopCoro.Frame { ptr @__NoopCoro_ResumeDestroy, ptr @__NoopCoro_ResumeDestroy }
+; AMDGPU: @NoopCoro.Frame.Const = private addrspace(1) constant %NoopCoro.Frame { ptr @__NoopCoro_ResumeDestroy, ptr @__NoopCoro_ResumeDestroy }
 
 
 ; CHECK-LABEL: @noop(
@@ -10,6 +12,7 @@ define ptr @noop() {
 ; CHECK-NEXT: entry
 entry:
 ; CHECK-NEXT: ret ptr @NoopCoro.Frame.Const
+; AMDGPU: ret ptr addrspacecast (ptr addrspace(1) @NoopCoro.Frame.Const to ptr)
   %n = call ptr @llvm.coro.noop()
   ret ptr %n
 }

@jrtc27
Copy link
Collaborator

jrtc27 commented Aug 6, 2024

I imagine the better fix would be to either overload the intrinsic or make it return a pointer in the global address space, depending on whether its semantics are tied to globals or this is just an implementation choice.

@@ -152,7 +152,8 @@ void Lowerer::lowerCoroNoop(IntrinsicInst *II) {
}

Builder.SetInsertPoint(II);
auto *NoopCoroVoidPtr = Builder.CreateBitCast(NoopCoro, Int8Ptr);
auto *NoopCoroVoidPtr =
Builder.CreateAddrSpaceCast(NoopCoro, PointerType::getUnqual(C));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't insert any addrspacecast you want. Really these intrinsics should be fixed to have an address space mangled pointer. We should not have any intrinsics defined with llvm_ptr_ty

@arsenm
Copy link
Contributor

arsenm commented Aug 6, 2024

Don't describe it as address space qualified. There is no such thing as a value with no address space

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:ir labels Aug 6, 2024
@jhuber6 jhuber6 changed the title [Transforms] Fix Coroutine transform on non-default addressspaces [Coroutines] Change llvm.coro.noop to accept llvm_anyptr_ty instead Aug 6, 2024
Summary:
This pass tries to get a pointer from the global. Some targets, like
AMDGPU, emit globals to a qualified address space. This would result in
an invalid cast. To fix this, we simply apply an addrspace cast, which
is a no-op if none exit.
@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 6, 2024

I imagine the better fix would be to either overload the intrinsic or make it return a pointer in the global address space, depending on whether it's semantics tied to globals or this is just an implementation choice.

It's definitely tied to globals for noop in this case, so I did what you suggested. I think we should apply this to a lot of the other ones, but they should be separate patches and this solves my immediate problem.

bool coro::declaresIntrinsics(const Module &M,
const DenseSet<Intrinsic::ID> &Identifiers) {
for (const Function &F : M.functions()) {
if (Identifiers.contains(F.getIntrinsicID()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just make this a switch over the intrinsic ID instead of this weird intermediate set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how the code was originally put together, since it seems to be a subset of coroutine functions that are used here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right so switch over that set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that's what this function does, takes a subset and checks the module for any matching intrinsic IDs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's doing it with this intermediate set. As in, this is an overly fancy, heavy API. I mean literally loop over function declarations with a switch in it, no DenseSet

@jhuber6 jhuber6 closed this Sep 23, 2024
@jhuber6 jhuber6 deleted the coro branch September 23, 2024 13:26
@jhuber6 jhuber6 restored the coro branch September 23, 2024 13:26
@jhuber6 jhuber6 reopened this Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category coroutines C++20 coroutines llvm:ir llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Clang][AMDGPU] Clang crashes on __builtin_coro_noop when targeting AMDGPU
4 participants