Skip to content

[clang][ARM][AArch64] Define intrinsics guarded by __has_builtin on all platforms #128222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 21, 2025

Conversation

sarnex
Copy link
Member

@sarnex sarnex commented Feb 21, 2025

Instead of defining ARM ACLE intrinsics only on MSVC and guarding wrapper functions in headers with __has_builtin, universally define the intrinsics as target header builtins.

@sarnex sarnex marked this pull request as ready for review February 21, 2025 21:34
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Feb 21, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 21, 2025

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-backend-x86

Author: Nick Sarnie (sarnex)

Changes

Apply a preemptive workaround for a possible behavior change in __has_builtin as suggested in #121839 and #126324.


Full diff: https://github.com/llvm/llvm-project/pull/128222.diff

1 Files Affected:

  • (modified) clang/lib/Headers/arm_acle.h (+3)
diff --git a/clang/lib/Headers/arm_acle.h b/clang/lib/Headers/arm_acle.h
index b1dc90f84ad36..fbd0020258722 100644
--- a/clang/lib/Headers/arm_acle.h
+++ b/clang/lib/Headers/arm_acle.h
@@ -27,6 +27,8 @@
 extern "C" {
 #endif
 
+#if !defined(__CUDA__)
+
 /* 7 SYNCHRONIZATION, BARRIER AND HINT INTRINSICS */
 /* 7.3 Memory barriers */
 #if !__has_builtin(__dmb)
@@ -70,6 +72,7 @@ static __inline__ void __attribute__((__always_inline__, __nodebug__)) __yield(v
   __builtin_arm_yield();
 }
 #endif
+#endif // #if !defined(__CUDA__)
 
 #if defined(__ARM_32BIT_STATE) && __ARM_32BIT_STATE
 #define __dbg(t) __builtin_arm_dbg(t)

@llvmbot
Copy link
Member

llvmbot commented Feb 21, 2025

@llvm/pr-subscribers-clang

Author: Nick Sarnie (sarnex)

Changes

Apply a preemptive workaround for a possible behavior change in __has_builtin as suggested in #121839 and #126324.


Full diff: https://github.com/llvm/llvm-project/pull/128222.diff

1 Files Affected:

  • (modified) clang/lib/Headers/arm_acle.h (+3)
diff --git a/clang/lib/Headers/arm_acle.h b/clang/lib/Headers/arm_acle.h
index b1dc90f84ad36..fbd0020258722 100644
--- a/clang/lib/Headers/arm_acle.h
+++ b/clang/lib/Headers/arm_acle.h
@@ -27,6 +27,8 @@
 extern "C" {
 #endif
 
+#if !defined(__CUDA__)
+
 /* 7 SYNCHRONIZATION, BARRIER AND HINT INTRINSICS */
 /* 7.3 Memory barriers */
 #if !__has_builtin(__dmb)
@@ -70,6 +72,7 @@ static __inline__ void __attribute__((__always_inline__, __nodebug__)) __yield(v
   __builtin_arm_yield();
 }
 #endif
+#endif // #if !defined(__CUDA__)
 
 #if defined(__ARM_32BIT_STATE) && __ARM_32BIT_STATE
 #define __dbg(t) __builtin_arm_dbg(t)

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment for why we need to check for the CUDA arch, but I'll defer to @Artem-B for the final +1.

Comment on lines 30 to 31
#if !defined(__CUDA_ARCH__)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this will be observable to any code that happens to need those builtins, and they are actually not provided by the compiler (i.e. __has_builtin(__wfi) would be host on the host).

In that case, any use of __wfi() in the host code will break compilation because host-side code will not be able to find it.

I think a more robust fix would be to provide only declarations for these maybe-builtin functions, so the compiler always sees the correct signature of the functions that are conditionally provided depending on __has_builtin().

Copy link
Member Author

@sarnex sarnex Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we unconditionally declare the functions, regardless of compilation mode/host/device/__has_builtin result?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#if __CUDA_ARCH__
// Make sure the declarations here are in sync with the functions provided under #else.
<declarations>
#else
<existing code>
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually wondering if the header needs __has_builtin()... They are coming from the same compiler, and, if they are missing, they are declared in terms of __builtin*() functions.

Looking at their definitions, it appears that those builtins are defined when we're targeting windows:

LANGBUILTIN(__yield, "v", "", ALL_MS_LANGUAGES)
LANGBUILTIN(__wfe, "v", "", ALL_MS_LANGUAGES)
LANGBUILTIN(__wfi, "v", "", ALL_MS_LANGUAGES)
LANGBUILTIN(__sev, "v", "", ALL_MS_LANGUAGES)
LANGBUILTIN(__sevl, "v", "", ALL_MS_LANGUAGES)
LANGBUILTIN(__dmb, "vUi", "nc", ALL_MS_LANGUAGES)
LANGBUILTIN(__dsb, "vUi", "nc", ALL_MS_LANGUAGES)
LANGBUILTIN(__isb, "vUi", "nc", ALL_MS_LANGUAGES)

If we have to define them on non-windows targets anyways, perhaps a better fix is to let clang define them everywhere, and avoid this special case in the headers.

@compnerd -- do you recall what's the story with these non-builtin builtins here?
They were added specifically for MS targets in 4bddd9d

But five years later, @Bigcheese added these wrappers in the header:
16af23f

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@compnerd @Bigcheese Do you mind taking a look at Artem's questions above? Thx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnk Reid, would you happen to have an idea what's up with these builtins on Windows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rnk Ping on this, thx!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have to define them on non-windows targets anyways, perhaps a better fix is to let clang define them everywhere, and avoid this special case in the headers.

Yes, we should go this direction. There are some examples of this for x86 builtins. Here's an example of one I did recently for x86:
ee92122

In this example, we leave behind an _m_prefetchw declaration, and the builtin is defined as a "library builtin" which has an associated header. This forces callers to include the relevant header, in this case arm_acle.h, to make the builtin callable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, ill try to implement this assuming it's relatively easy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just pushed a commit which attempts to implement this, please take a look, thanks!

@llvmbot llvmbot added backend:ARM backend:AArch64 clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 17, 2025
@sarnex sarnex changed the title [Clang][ARM] Only try to redefine builtins for non-CUDA [clang][ARM][AArch64] Define intrinsics guarded by __has_builtin on all platforms Apr 17, 2025
@sarnex sarnex requested review from jhuber6, Artem-B and rnk April 17, 2025 19:42
Copy link

github-actions bot commented Apr 17, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@@ -36,6 +36,28 @@ typedef __SIZE_TYPE__ size_t;

#include <intrin.h>

#ifdef __ARM_ACLE
// arm_acle.h needs some stdint types, but -ffreestanding prevents us from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be fixed in arm_acle.h itself so it includes the headers with the types it needs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arm_acle.h already includes the correct header, stdint.h, the problem here is this test uses a custom stdint.h in the Inputs/include directory that doesn't have the required types, a better fix is to just add this code to that custom stdint.h, I'll do that in the next commit, thanks.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex requested a review from Artem-B April 18, 2025 15:43
Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this is a good direction.

@sarnex sarnex merged commit de0153d into llvm:main Apr 21, 2025
11 checks passed
@mstorsjo
Copy link
Member

This breaks real-world code with MSVC headers.

Testcase:

#include <windows.h>
#include <winnt.h>
void func(void) { _InstructionSynchronizationBarrier(); }
$ bin/clang-cl --target=aarch64-windows-msvc -c isb.c
isb.c(3,19): error: call to undeclared library function '__isb' with type 'void (unsigned int)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    3 | void func(void) { _InstructionSynchronizationBarrier(); }
      |                   ^
/home/martin/msvc2022-17.13/kits/10/include/10.0.22621.0/um/winnt.h(5717,46): note: expanded from macro '_InstructionSynchronizationBarrier'
 5717 | #define _InstructionSynchronizationBarrier() __isb(_ARM64_BARRIER_SY)
      |                                              ^
isb.c(3,19): note: include the header <arm_acle.h> or explicitly provide a declaration for '__isb'
/home/martin/msvc2022-17.13/kits/10/include/10.0.22621.0/um/winnt.h(5717,46): note: expanded from macro '_InstructionSynchronizationBarrier'
 5717 | #define _InstructionSynchronizationBarrier() __isb(_ARM64_BARRIER_SY)
      |                                              ^
1 error generated.

@sarnex
Copy link
Member Author

sarnex commented Apr 22, 2025

Investigating, thanks

@sarnex
Copy link
Member Author

sarnex commented Apr 22, 2025

Fix #136742

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 backend:ARM backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants