[Headers] Implement spirvamdgcnintrin.h #131164

Open
wants to merge 1 commit into main from jc_header_spirv_stub

Conversation

JonChesterfield
Collaborator

@JonChesterfield JonChesterfield commented Mar 13, 2025

Like spirvintrin.h, except hard-coded to work only on the spirv64-amd-amdhsa triple, as preferred by reviewers.

Does not use SPIR-V intrinsics in the first instance, as the builtins don't exist, and neither do all the intrinsics one needs.

Hopefully over time we can slowly migrate this to use SPIR-V intrinsics until eventually all the amdgcn-specific stuff is gone and we can move it to spirvintrin.h. Until then, this should be functional and very similar to how HIP has been implemented.

@JonChesterfield JonChesterfield requested a review from jhuber6 March 13, 2025 15:59
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Mar 13, 2025
@llvmbot
Member

llvmbot commented Mar 13, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-amdgpu

Author: Jon Chesterfield (JonChesterfield)

Changes

Structure follows amdgcnintrin.h, but with declarations where compiler intrinsics are not yet available.

Address space numbers, the kernel attribute, and checking how this interacts with OpenMP are left for later patches.


Full diff: https://github.com/llvm/llvm-project/pull/131164.diff

4 Files Affected:

  • (modified) clang/lib/Headers/amdgpuintrin.h (+1-1)
  • (modified) clang/lib/Headers/gpuintrin.h (+2)
  • (added) clang/lib/Headers/spirv64intrin.h (+131)
  • (modified) clang/test/Headers/gpuintrin.c (+223)
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index f7fb8e2814180..817cfeec896c4 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -1,4 +1,4 @@
-//===-- amdgpuintrin.h - AMDPGU intrinsic functions -----------------------===//
+//===-- amdgpuintrin.h - AMDGPU intrinsic functions -----------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h
index 0fb3916acac61..cf1cfd41a6788 100644
--- a/clang/lib/Headers/gpuintrin.h
+++ b/clang/lib/Headers/gpuintrin.h
@@ -60,6 +60,8 @@ _Pragma("omp end declare target");
 #include <nvptxintrin.h>
 #elif defined(__AMDGPU__)
 #include <amdgpuintrin.h>
+#elif defined(__SPIRV64__)
+#include <spirv64intrin.h>
 #elif !defined(_OPENMP)
 #error "This header is only meant to be used on GPU architectures."
 #endif
diff --git a/clang/lib/Headers/spirv64intrin.h b/clang/lib/Headers/spirv64intrin.h
new file mode 100644
index 0000000000000..2b9157544f170
--- /dev/null
+++ b/clang/lib/Headers/spirv64intrin.h
@@ -0,0 +1,131 @@
+//===-- spirv64intrin.h - SPIRV64 intrinsic functions --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __SPIRV64INTRIN_H
+#define __SPIRV64INTRIN_H
+
+#ifndef __SPIRV64__
+#error "This file is intended for SPIRV64 targets or offloading to SPIRV64"
+#endif
+
+#ifndef __GPUINTRIN_H
+#error "Never use <spirv64intrin.h> directly; include <gpuintrin.h> instead"
+#endif
+
+// This is the skeleton of the spirv64 implementation for the gpuintrin.h
+// interface. Address spaces and the kernel attribute are not yet implemented.
+// The target-specific functions are declarations waiting for clang support.
+
+#if defined(_OPENMP)
+#error "OpenMP is not yet available on spirv64 through the gpuintrin header"
+#endif
+
+// Type aliases to the address spaces used by the SPIRV backend.
+#define __gpu_private
+#define __gpu_constant
+#define __gpu_local
+#define __gpu_global
+#define __gpu_generic
+
+// Attribute to declare a function as a kernel.
+#define __gpu_kernel
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_FN_ATTRS uint32_t __gpu_num_blocks_x(void);
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_FN_ATTRS uint32_t __gpu_num_blocks_y(void);
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_FN_ATTRS uint32_t __gpu_num_blocks_z(void);
+
+// Returns the 'x' dimension of the current AMD workgroup's id.
+_DEFAULT_FN_ATTRS uint32_t __gpu_block_id_x(void);
+
+// Returns the 'y' dimension of the current AMD workgroup's id.
+_DEFAULT_FN_ATTRS uint32_t __gpu_block_id_y(void);
+
+// Returns the 'z' dimension of the current AMD workgroup's id.
+_DEFAULT_FN_ATTRS uint32_t __gpu_block_id_z(void);
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_FN_ATTRS uint32_t __gpu_num_threads_x(void);
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_FN_ATTRS uint32_t __gpu_num_threads_y(void);
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_FN_ATTRS uint32_t __gpu_num_threads_z(void);
+
+// Returns the 'x' dimension id of the workitem in the current workgroup.
+_DEFAULT_FN_ATTRS uint32_t __gpu_thread_id_x(void);
+
+// Returns the 'y' dimension id of the workitem in the current workgroup.
+_DEFAULT_FN_ATTRS uint32_t __gpu_thread_id_y(void);
+
+// Returns the 'z' dimension id of the workitem in the current workgroup.
+_DEFAULT_FN_ATTRS uint32_t __gpu_thread_id_z(void);
+
+// Returns the size of the wave.
+_DEFAULT_FN_ATTRS uint32_t __gpu_num_lanes(void);
+
+// Returns the id of the thread inside of a wave executing together.
+_DEFAULT_FN_ATTRS uint32_t __gpu_lane_id(void);
+
+// Returns the bit-mask of active threads in the current wave.
+_DEFAULT_FN_ATTRS uint64_t __gpu_lane_mask(void);
+
+// Copies the value from the first active thread in the wave to the rest.
+_DEFAULT_FN_ATTRS uint32_t __gpu_read_first_lane_u32(uint64_t __lane_mask,
+                                                     uint32_t __x);
+
+// Returns a bitmask of threads in the current lane for which \p x is true.
+_DEFAULT_FN_ATTRS uint64_t __gpu_ballot(uint64_t __lane_mask, bool __x);
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_FN_ATTRS void __gpu_sync_threads(void);
+
+// Waits for all the threads in the wave to converge.
+_DEFAULT_FN_ATTRS void __gpu_sync_lane(uint64_t __lane_mask);
+
+// Shuffles the lanes inside the wave according to the given index.
+_DEFAULT_FN_ATTRS uint32_t __gpu_shuffle_idx_u32(uint64_t __lane_mask,
+                                                 uint32_t __idx, uint32_t __x,
+                                                 uint32_t __width);
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  return __gpu_match_any_u32_impl(__lane_mask, __x);
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  return __gpu_match_any_u64_impl(__lane_mask, __x);
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
+  return __gpu_match_all_u32_impl(__lane_mask, __x);
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
+  return __gpu_match_all_u64_impl(__lane_mask, __x);
+}
+
+// Terminates execution of the associated wave.
+_DEFAULT_FN_ATTRS [[noreturn]] void __gpu_exit(void);
+
+// Suspend the thread briefly to assist the scheduler during busy loops.
+_DEFAULT_FN_ATTRS void __gpu_thread_suspend(void);
+
+#endif // __SPIRV64INTRIN_H
diff --git a/clang/test/Headers/gpuintrin.c b/clang/test/Headers/gpuintrin.c
index 9a15ce277ba87..eaf001be19ac9 100644
--- a/clang/test/Headers/gpuintrin.c
+++ b/clang/test/Headers/gpuintrin.c
@@ -9,6 +9,11 @@
 // RUN:   -target-feature +ptx62 \
 // RUN:   -triple nvptx64-nvidia-cuda -emit-llvm %s -o - \
 // RUN: | FileCheck %s --check-prefix=NVPTX
+//
+// RUN: %clang_cc1 -internal-isystem %S/Inputs/include  \
+// RUN:   -internal-isystem %S/../../lib/Headers/ \
+// RUN:   -triple spirv64-- -emit-llvm %s -o - \
+// RUN: | FileCheck %s --check-prefix=SPIRV64
 
 #include <gpuintrin.h>
 
@@ -978,6 +983,224 @@ __gpu_kernel void foo() {
 // NVPTX-NEXT:    call void @llvm.nvvm.exit()
 // NVPTX-NEXT:    ret void
 //
+//
+// SPIRV64-LABEL: define spir_func void @foo(
+// SPIRV64-SAME: ) #[[ATTR0:[0-9]+]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[CALL:%.*]] = call spir_func i32 @__gpu_num_blocks_x()
+// SPIRV64-NEXT:    [[CALL1:%.*]] = call spir_func i32 @__gpu_num_blocks_y()
+// SPIRV64-NEXT:    [[CALL2:%.*]] = call spir_func i32 @__gpu_num_blocks_z()
+// SPIRV64-NEXT:    [[CALL3:%.*]] = call spir_func i32 @__gpu_num_blocks(i32 noundef 0)
+// SPIRV64-NEXT:    [[CALL4:%.*]] = call spir_func i32 @__gpu_block_id_x()
+// SPIRV64-NEXT:    [[CALL5:%.*]] = call spir_func i32 @__gpu_block_id_y()
+// SPIRV64-NEXT:    [[CALL6:%.*]] = call spir_func i32 @__gpu_block_id_z()
+// SPIRV64-NEXT:    [[CALL7:%.*]] = call spir_func i32 @__gpu_block_id(i32 noundef 0)
+// SPIRV64-NEXT:    [[CALL8:%.*]] = call spir_func i32 @__gpu_num_threads_x()
+// SPIRV64-NEXT:    [[CALL9:%.*]] = call spir_func i32 @__gpu_num_threads_y()
+// SPIRV64-NEXT:    [[CALL10:%.*]] = call spir_func i32 @__gpu_num_threads_z()
+// SPIRV64-NEXT:    [[CALL11:%.*]] = call spir_func i32 @__gpu_num_threads(i32 noundef 0)
+// SPIRV64-NEXT:    [[CALL12:%.*]] = call spir_func i32 @__gpu_thread_id_x()
+// SPIRV64-NEXT:    [[CALL13:%.*]] = call spir_func i32 @__gpu_thread_id_y()
+// SPIRV64-NEXT:    [[CALL14:%.*]] = call spir_func i32 @__gpu_thread_id_z()
+// SPIRV64-NEXT:    [[CALL15:%.*]] = call spir_func i32 @__gpu_thread_id(i32 noundef 0)
+// SPIRV64-NEXT:    [[CALL16:%.*]] = call spir_func i32 @__gpu_num_lanes()
+// SPIRV64-NEXT:    [[CALL17:%.*]] = call spir_func i32 @__gpu_lane_id()
+// SPIRV64-NEXT:    [[CALL18:%.*]] = call spir_func i64 @__gpu_lane_mask()
+// SPIRV64-NEXT:    [[CALL19:%.*]] = call spir_func i32 @__gpu_read_first_lane_u32(i64 noundef -1, i32 noundef -1)
+// SPIRV64-NEXT:    [[CALL20:%.*]] = call spir_func i64 @__gpu_read_first_lane_u64(i64 noundef -1, i64 noundef -1)
+// SPIRV64-NEXT:    [[CALL21:%.*]] = call spir_func i64 @__gpu_ballot(i64 noundef -1, i1 noundef zeroext true)
+// SPIRV64-NEXT:    call spir_func void @__gpu_sync_threads()
+// SPIRV64-NEXT:    call spir_func void @__gpu_sync_lane(i64 noundef -1)
+// SPIRV64-NEXT:    [[CALL22:%.*]] = call spir_func i32 @__gpu_shuffle_idx_u32(i64 noundef -1, i32 noundef -1, i32 noundef -1, i32 noundef 0)
+// SPIRV64-NEXT:    [[CALL23:%.*]] = call spir_func i64 @__gpu_first_lane_id(i64 noundef -1)
+// SPIRV64-NEXT:    [[CALL24:%.*]] = call spir_func zeroext i1 @__gpu_is_first_in_lane(i64 noundef -1)
+// SPIRV64-NEXT:    call spir_func void @__gpu_exit() #[[ATTR4:[0-9]+]]
+// SPIRV64-NEXT:    unreachable
+//
+//
+// SPIRV64-LABEL: define internal spir_func i32 @__gpu_num_blocks(
+// SPIRV64-SAME: i32 noundef [[__DIM:%.*]]) #[[ATTR0]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    [[__DIM_ADDR:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    store i32 [[__DIM]], ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    [[TMP0:%.*]] = load i32, ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    switch i32 [[TMP0]], label %[[SW_DEFAULT:.*]] [
+// SPIRV64-NEXT:      i32 0, label %[[SW_BB:.*]]
+// SPIRV64-NEXT:      i32 1, label %[[SW_BB1:.*]]
+// SPIRV64-NEXT:      i32 2, label %[[SW_BB3:.*]]
+// SPIRV64-NEXT:    ]
+// SPIRV64:       [[SW_BB]]:
+// SPIRV64-NEXT:    [[CALL:%.*]] = call spir_func i32 @__gpu_num_blocks_x()
+// SPIRV64-NEXT:    store i32 [[CALL]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN:.*]]
+// SPIRV64:       [[SW_BB1]]:
+// SPIRV64-NEXT:    [[CALL2:%.*]] = call spir_func i32 @__gpu_num_blocks_y()
+// SPIRV64-NEXT:    store i32 [[CALL2]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_BB3]]:
+// SPIRV64-NEXT:    [[CALL4:%.*]] = call spir_func i32 @__gpu_num_blocks_z()
+// SPIRV64-NEXT:    store i32 [[CALL4]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_DEFAULT]]:
+// SPIRV64-NEXT:    unreachable
+// SPIRV64:       [[RETURN]]:
+// SPIRV64-NEXT:    [[TMP1:%.*]] = load i32, ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    ret i32 [[TMP1]]
+//
+//
+// SPIRV64-LABEL: define internal spir_func i32 @__gpu_block_id(
+// SPIRV64-SAME: i32 noundef [[__DIM:%.*]]) #[[ATTR0]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    [[__DIM_ADDR:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    store i32 [[__DIM]], ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    [[TMP0:%.*]] = load i32, ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    switch i32 [[TMP0]], label %[[SW_DEFAULT:.*]] [
+// SPIRV64-NEXT:      i32 0, label %[[SW_BB:.*]]
+// SPIRV64-NEXT:      i32 1, label %[[SW_BB1:.*]]
+// SPIRV64-NEXT:      i32 2, label %[[SW_BB3:.*]]
+// SPIRV64-NEXT:    ]
+// SPIRV64:       [[SW_BB]]:
+// SPIRV64-NEXT:    [[CALL:%.*]] = call spir_func i32 @__gpu_block_id_x()
+// SPIRV64-NEXT:    store i32 [[CALL]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN:.*]]
+// SPIRV64:       [[SW_BB1]]:
+// SPIRV64-NEXT:    [[CALL2:%.*]] = call spir_func i32 @__gpu_block_id_y()
+// SPIRV64-NEXT:    store i32 [[CALL2]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_BB3]]:
+// SPIRV64-NEXT:    [[CALL4:%.*]] = call spir_func i32 @__gpu_block_id_z()
+// SPIRV64-NEXT:    store i32 [[CALL4]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_DEFAULT]]:
+// SPIRV64-NEXT:    unreachable
+// SPIRV64:       [[RETURN]]:
+// SPIRV64-NEXT:    [[TMP1:%.*]] = load i32, ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    ret i32 [[TMP1]]
+//
+//
+// SPIRV64-LABEL: define internal spir_func i32 @__gpu_num_threads(
+// SPIRV64-SAME: i32 noundef [[__DIM:%.*]]) #[[ATTR0]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    [[__DIM_ADDR:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    store i32 [[__DIM]], ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    [[TMP0:%.*]] = load i32, ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    switch i32 [[TMP0]], label %[[SW_DEFAULT:.*]] [
+// SPIRV64-NEXT:      i32 0, label %[[SW_BB:.*]]
+// SPIRV64-NEXT:      i32 1, label %[[SW_BB1:.*]]
+// SPIRV64-NEXT:      i32 2, label %[[SW_BB3:.*]]
+// SPIRV64-NEXT:    ]
+// SPIRV64:       [[SW_BB]]:
+// SPIRV64-NEXT:    [[CALL:%.*]] = call spir_func i32 @__gpu_num_threads_x()
+// SPIRV64-NEXT:    store i32 [[CALL]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN:.*]]
+// SPIRV64:       [[SW_BB1]]:
+// SPIRV64-NEXT:    [[CALL2:%.*]] = call spir_func i32 @__gpu_num_threads_y()
+// SPIRV64-NEXT:    store i32 [[CALL2]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_BB3]]:
+// SPIRV64-NEXT:    [[CALL4:%.*]] = call spir_func i32 @__gpu_num_threads_z()
+// SPIRV64-NEXT:    store i32 [[CALL4]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_DEFAULT]]:
+// SPIRV64-NEXT:    unreachable
+// SPIRV64:       [[RETURN]]:
+// SPIRV64-NEXT:    [[TMP1:%.*]] = load i32, ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    ret i32 [[TMP1]]
+//
+//
+// SPIRV64-LABEL: define internal spir_func i32 @__gpu_thread_id(
+// SPIRV64-SAME: i32 noundef [[__DIM:%.*]]) #[[ATTR0]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    [[__DIM_ADDR:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    store i32 [[__DIM]], ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    [[TMP0:%.*]] = load i32, ptr [[__DIM_ADDR]], align 4
+// SPIRV64-NEXT:    switch i32 [[TMP0]], label %[[SW_DEFAULT:.*]] [
+// SPIRV64-NEXT:      i32 0, label %[[SW_BB:.*]]
+// SPIRV64-NEXT:      i32 1, label %[[SW_BB1:.*]]
+// SPIRV64-NEXT:      i32 2, label %[[SW_BB3:.*]]
+// SPIRV64-NEXT:    ]
+// SPIRV64:       [[SW_BB]]:
+// SPIRV64-NEXT:    [[CALL:%.*]] = call spir_func i32 @__gpu_thread_id_x()
+// SPIRV64-NEXT:    store i32 [[CALL]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN:.*]]
+// SPIRV64:       [[SW_BB1]]:
+// SPIRV64-NEXT:    [[CALL2:%.*]] = call spir_func i32 @__gpu_thread_id_y()
+// SPIRV64-NEXT:    store i32 [[CALL2]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_BB3]]:
+// SPIRV64-NEXT:    [[CALL4:%.*]] = call spir_func i32 @__gpu_thread_id_z()
+// SPIRV64-NEXT:    store i32 [[CALL4]], ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    br label %[[RETURN]]
+// SPIRV64:       [[SW_DEFAULT]]:
+// SPIRV64-NEXT:    unreachable
+// SPIRV64:       [[RETURN]]:
+// SPIRV64-NEXT:    [[TMP1:%.*]] = load i32, ptr [[RETVAL]], align 4
+// SPIRV64-NEXT:    ret i32 [[TMP1]]
+//
+//
+// SPIRV64-LABEL: define internal spir_func i64 @__gpu_read_first_lane_u64(
+// SPIRV64-SAME: i64 noundef [[__LANE_MASK:%.*]], i64 noundef [[__X:%.*]]) #[[ATTR0]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[__LANE_MASK_ADDR:%.*]] = alloca i64, align 8
+// SPIRV64-NEXT:    [[__X_ADDR:%.*]] = alloca i64, align 8
+// SPIRV64-NEXT:    [[__HI:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    [[__LO:%.*]] = alloca i32, align 4
+// SPIRV64-NEXT:    store i64 [[__LANE_MASK]], ptr [[__LANE_MASK_ADDR]], align 8
+// SPIRV64-NEXT:    store i64 [[__X]], ptr [[__X_ADDR]], align 8
+// SPIRV64-NEXT:    [[TMP0:%.*]] = load i64, ptr [[__X_ADDR]], align 8
+// SPIRV64-NEXT:    [[SHR:%.*]] = lshr i64 [[TMP0]], 32
+// SPIRV64-NEXT:    [[CONV:%.*]] = trunc i64 [[SHR]] to i32
+// SPIRV64-NEXT:    store i32 [[CONV]], ptr [[__HI]], align 4
+// SPIRV64-NEXT:    [[TMP1:%.*]] = load i64, ptr [[__X_ADDR]], align 8
+// SPIRV64-NEXT:    [[AND:%.*]] = and i64 [[TMP1]], 4294967295
+// SPIRV64-NEXT:    [[CONV1:%.*]] = trunc i64 [[AND]] to i32
+// SPIRV64-NEXT:    store i32 [[CONV1]], ptr [[__LO]], align 4
+// SPIRV64-NEXT:    [[TMP2:%.*]] = load i64, ptr [[__LANE_MASK_ADDR]], align 8
+// SPIRV64-NEXT:    [[TMP3:%.*]] = load i32, ptr [[__HI]], align 4
+// SPIRV64-NEXT:    [[CALL:%.*]] = call spir_func i32 @__gpu_read_first_lane_u32(i64 noundef [[TMP2]], i32 noundef [[TMP3]])
+// SPIRV64-NEXT:    [[CONV2:%.*]] = zext i32 [[CALL]] to i64
+// SPIRV64-NEXT:    [[SHL:%.*]] = shl i64 [[CONV2]], 32
+// SPIRV64-NEXT:    [[TMP4:%.*]] = load i64, ptr [[__LANE_MASK_ADDR]], align 8
+// SPIRV64-NEXT:    [[TMP5:%.*]] = load i32, ptr [[__LO]], align 4
+// SPIRV64-NEXT:    [[CALL3:%.*]] = call spir_func i32 @__gpu_read_first_lane_u32(i64 noundef [[TMP4]], i32 noundef [[TMP5]])
+// SPIRV64-NEXT:    [[CONV4:%.*]] = zext i32 [[CALL3]] to i64
+// SPIRV64-NEXT:    [[AND5:%.*]] = and i64 [[CONV4]], 4294967295
+// SPIRV64-NEXT:    [[OR:%.*]] = or i64 [[SHL]], [[AND5]]
+// SPIRV64-NEXT:    ret i64 [[OR]]
+//
+//
+// SPIRV64-LABEL: define internal spir_func i64 @__gpu_first_lane_id(
+// SPIRV64-SAME: i64 noundef [[__LANE_MASK:%.*]]) #[[ATTR0]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[__LANE_MASK_ADDR:%.*]] = alloca i64, align 8
+// SPIRV64-NEXT:    store i64 [[__LANE_MASK]], ptr [[__LANE_MASK_ADDR]], align 8
+// SPIRV64-NEXT:    [[TMP0:%.*]] = load i64, ptr [[__LANE_MASK_ADDR]], align 8
+// SPIRV64-NEXT:    [[TMP1:%.*]] = call i64 @llvm.cttz.i64(i64 [[TMP0]], i1 true)
+// SPIRV64-NEXT:    [[TMP2:%.*]] = add i64 [[TMP1]], 1
+// SPIRV64-NEXT:    [[ISZERO:%.*]] = icmp eq i64 [[TMP0]], 0
+// SPIRV64-NEXT:    [[FFS:%.*]] = select i1 [[ISZERO]], i64 0, i64 [[TMP2]]
+// SPIRV64-NEXT:    [[CAST:%.*]] = trunc i64 [[FFS]] to i32
+// SPIRV64-NEXT:    [[SUB:%.*]] = sub nsw i32 [[CAST]], 1
+// SPIRV64-NEXT:    [[CONV:%.*]] = sext i32 [[SUB]] to i64
+// SPIRV64-NEXT:    ret i64 [[CONV]]
+//
+//
+// SPIRV64-LABEL: define internal spir_func zeroext i1 @__gpu_is_first_in_lane(
+// SPIRV64-SAME: i64 noundef [[__LANE_MASK:%.*]]) #[[ATTR0]] {
+// SPIRV64-NEXT:  [[ENTRY:.*:]]
+// SPIRV64-NEXT:    [[__LANE_MASK_ADDR:%.*]] = alloca i64, align 8
+// SPIRV64-NEXT:    store i64 [[__LANE_MASK]], ptr [[__LANE_MASK_ADDR]], align 8
+// SPIRV64-NEXT:    [[CALL:%.*]] = call spir_func i32 @__gpu_lane_id()
+// SPIRV64-NEXT:    [[CONV:%.*]] = zext i32 [[CALL]] to i64
+// SPIRV64-NEXT:    [[TMP0:%.*]] = load i64, ptr [[__LANE_MASK_ADDR]], align 8
+// SPIRV64-NEXT:    [[CALL1:%.*]] = call spir_func i64 @__gpu_first_lane_id(i64 noundef [[TMP0]])
+// SPIRV64-NEXT:    [[CMP:%.*]] = icmp eq i64 [[CONV]], [[CALL1]]
+// SPIRV64-NEXT:    ret i1 [[CMP]]
+//
 //.
 // AMDGPU: [[RNG3]] = !{i32 1, i32 0}
 // AMDGPU: [[META4]] = !{}

Contributor

@jhuber6 jhuber6 left a comment

Should probably just be called spirvintrin.h unless there's something that significantly distinguishes this from the 32-bit counterpart. (Same reason it's nvptxintrin and not nvptx64intrin.)

@JonChesterfield
Collaborator Author

Clang raises a lot of exciting errors for spirv-- about Vulkan environments, and I don't really know the distinction between the two. If 32-bit spirv turns out to be a workable thing, it should go down the same code path, with an ifdef on SPIRV || SPIRV64 and a file rename. I don't know enough about the ecosystem to judge at this point.

@jhuber6
Contributor

jhuber6 commented Mar 13, 2025

> Clang raises a lot of exciting errors for spirv-- about Vulkan environments, and I don't really know the distinction between the two. If 32-bit spirv turns out to be a workable thing, it should go down the same code path, with an ifdef on SPIRV || SPIRV64 and a file rename. I don't know enough about the ecosystem to judge at this point.

Name should be spirvintrin.h, but if we can't support 32-bit for now, just error in the preprocessor.

@JonChesterfield
Collaborator Author

> Name should be spirvintrin.h, but if we can't support 32-bit for now, just error in the preprocessor.

Yep, you're right. It'll be caught by only checking for the SPIRV64 macro, but nothing in this file is 32- vs 64-bit dependent, as that's part of what gpuintrin.h gives us. Renamed.

@JonChesterfield JonChesterfield changed the title [Headers] Create stub spirv64intrin.h [Headers] Create stub spirvintrin.h Mar 13, 2025
@JonChesterfield
Collaborator Author

Effectively a subset of #131190. I'd still like to land this and rebase #131190 on top of it, for signal-to-noise reasons.

Member

@sarnex sarnex left a comment

Thanks!

@JonChesterfield
Collaborator Author

@jhuber6 I think we should have this despite the rejected #131190.

Maybe we'll get some clang builtins for spirv. Otherwise some things can be done with the asm label hack, and some can just be left as a no-op, e.g. a suspend that does nothing. Regardless, this is the structure we want for the file, and it gives us a place to add functions incrementally as the corresponding reviews work themselves out.

@jhuber6
Contributor

jhuber6 commented Mar 17, 2025

We should probably wait until we have builtins, since there's no utility in just having stubs right now.

@JonChesterfield
Collaborator Author

The utility is having a place to fill things in incrementally as we get them working, and thus using libc to drive the implementation of enough of spirv to get things working. If you decline to have any spirv code until everything is working, I'll have to do the testing somewhere else.

@JonChesterfield
Collaborator Author

OK, we can't have spirv64-unknown-unknown at this point. But we could get this compiling using a mixture of spirv intrinsics (where they exist) and amdgpu intrinsics for spirv64-amd-amdhsa, leaving a preprocessor error for other targets. I'll take a stab at that.

@JonChesterfield
Collaborator Author

Well, it's not pretty, but spirv64-amd-amdhsa sets both the AMDGPU and SPIRV64 macros. Added a commit with an example that dispatches to amdgpu intrinsics on the happy path and raises a preprocessor error otherwise.

If you let that get to the spirv backend it falls over with "Intrinsic selection not implemented", but you can have IR with the right amdgpu intrinsics in it.


github-actions bot commented Mar 17, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff c9d7f707c101896294d2f4ccbf26329894295dc9 402a091ac6eac8a50ce54a519acce5bfa4de1c88 --extensions h,c -- clang/lib/Headers/spirvamdgpuintrin.h clang/lib/Headers/amdgpuintrin.h clang/lib/Headers/gpuintrin.h clang/test/Headers/gpuintrin.c
View the diff from clang-format here.
diff --git a/clang/lib/Headers/spirvamdgpuintrin.h b/clang/lib/Headers/spirvamdgpuintrin.h
index 1d123d3965..7337f18320 100644
--- a/clang/lib/Headers/spirvamdgpuintrin.h
+++ b/clang/lib/Headers/spirvamdgpuintrin.h
@@ -9,7 +9,7 @@
 #ifndef __SPIRVAMDGPUINTRIN_H
 #define __SPIRVAMDGPUINTRIN_H
 
-#if !defined( __SPIRV64__) || !defined(__AMDGPU__)
+#if !defined(__SPIRV64__) || !defined(__AMDGPU__)
 #error "This file is intended for the spirv64-amd-amdhsa target"
 #endif
 

@JonChesterfield JonChesterfield force-pushed the jc_header_spirv_stub branch 2 times, most recently from d7eefa7 to 092024b Compare March 18, 2025 15:57
@JonChesterfield JonChesterfield changed the title [Headers] Create stub spirvintrin.h [Headers] Implement spirvamdgcnintrin.h Mar 18, 2025
@JonChesterfield
Collaborator Author

JonChesterfield commented Mar 18, 2025

@sarnex I'm deeply sorry about this sequence of events. The single spirv64 header that lowered to intrinsics that amdgpu or intel map onto their own world would have removed a swathe of spurious variation.

What we're going to have to do in the interim is have spirvamdgcnintrin.h and spirvintelintrin.h, each of which slowly picks up spirv intrinsics and in the meantime uses amdgpu or intel ones, and eventually maybe we'll be able to collapse them both into the same spirvintrin.h.

@sarnex
Member

sarnex commented Mar 18, 2025

No problem. As long as we have a somewhat reasonable way to eventually get the DeviceRTL working, I'm happy. Thanks for working on this.

@jhuber6
Contributor

jhuber6 commented Mar 18, 2025

@sarnex I'm assuming you're somewhat familiar with SPIR-V and the backend. We really need a way to call SPIR-V builtins directly, i.e. some way to get clang to lower something like __builtin_spirv_ballot() to OpGroupNonUniformBallot. I don't know how much SPIR-V exposes for actual 'subgroup' level operations.

@sarnex
Member

sarnex commented Mar 18, 2025

> @sarnex I'm assuming you're somewhat familiar with SPIR-V and the backend. We really need a way to call SPIR-V builtins directly, i.e. some way to get clang to lower something like __builtin_spirv_ballot() to OpGroupNonUniformBallot. I don't know how much SPIR-V exposes for actual 'subgroup' level operations.

I haven't worked on the backend itself. @michalpaszkowski @VyacheslavLevytskyy, any ideas on what we can do?

@michalpaszkowski michalpaszkowski self-requested a review March 18, 2025 18:35
@JonChesterfield
Collaborator Author

We don't need a way to call the builtins. See for example this pull request: Intel have done the spirv64-intel- thing, similar to spirv64-amd-amdhsa, so they can (presumably?) use whatever intrinsics they like, using another header quite like this one.

Contributor

@jhuber6 jhuber6 left a comment

If all this does is call the amdgpu builtins, is it possible to just include that one? I think the only thing that differs is the address spaces, so we could probably just undef those.

@JonChesterfield
Collaborator Author

JonChesterfield commented Mar 18, 2025

Feel free to rehash this however you see fit. I don't see a way to make it pretty, so embracing straightforward seems fine. If we do something clever like including the same header twice, or other preprocessor hackery, it'll make changing any of these over to spirv intrinsics a nuisance.

At least the any/all style functions were factored out, so it's only really the core implementation that gets duplicated: the bit that the compiler should take care of, but won't at present.

Expressed differently: this is semantically ugly, so it should be syntactically ugly too.
