Implement operand bundles for floating-point operations #109798

Open: spavloff wants to merge 11 commits into main

Conversation

spavloff (Collaborator)

Currently, floating-point operations in their general form (beyond the default mode) are always represented by calls to constrained intrinsics. In addition to their side effect, these calls carry additional information in the form of metadata arguments. As noted in https://discourse.llvm.org/t/thought-on-strictfp-support/71453, this scheme is not efficient for intrinsic function calls, because it requires defining a separate intrinsic for the same operation when it is used in a non-default FP environment. The solution proposed in that discussion was "to move the complexity about the environment tracking from the intrinsics themselves to the call instruction".

This change implements that approach using operand bundles (https://llvm.org/docs/LangRef.html#operand-bundles). The approach was tried previously (https://reviews.llvm.org/D93455) but was not finished.

This change does not add any new functionality; it only adds a new way of keeping FP-related information in LLVM IR. The metadata arguments of constrained intrinsics are preserved, but they are not used by queries such as getRoundingMode or getExceptionBehavior.
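For illustration, here is a constrained call as it appears in the updated tests, adapted to generic operands; the operand bundles duplicate the information currently carried by the metadata arguments (i32 1 encodes llvm::RoundingMode::NearestTiesToEven and i32 2 encodes fpexcept.strict):

    %r = call float @llvm.experimental.constrained.fmuladd.f32(float %a, float %b, float %c, metadata !"round.tonearest", metadata !"fpexcept.strict") #0 [ "fpe.round"(i32 1), "fpe.except"(i32 2) ]
    ; where: attributes #0 = { strictfp memory(inaccessiblemem: readwrite) }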

@llvmbot (Member) commented Sep 24, 2024

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-adt

@llvm/pr-subscribers-backend-amdgpu

Author: Serge Pavlov (spavloff)

Changes


Patch is 47.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109798.diff

22 Files Affected:

  • (modified) clang/test/CodeGen/X86/strictfp_builtins.c (+3-3)
  • (modified) clang/test/CodeGen/strictfp_builtins.c (+17-17)
  • (modified) clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl (+2-2)
  • (modified) llvm/docs/LangRef.rst (+23)
  • (modified) llvm/include/llvm/ADT/FloatingPointMode.h (+9)
  • (modified) llvm/include/llvm/IR/AutoUpgrade.h (+2)
  • (modified) llvm/include/llvm/IR/FPEnv.h (+9)
  • (modified) llvm/include/llvm/IR/IRBuilder.h (+48)
  • (modified) llvm/include/llvm/IR/InstrTypes.h (+7)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (-2)
  • (modified) llvm/include/llvm/IR/LLVMContext.h (+2)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+46)
  • (modified) llvm/lib/IR/AutoUpgrade.cpp (+78)
  • (modified) llvm/lib/IR/IRBuilder.cpp (+44-12)
  • (modified) llvm/lib/IR/Instructions.cpp (+18)
  • (modified) llvm/lib/IR/IntrinsicInst.cpp (-23)
  • (modified) llvm/lib/IR/LLVMContext.cpp (+10)
  • (modified) llvm/lib/IR/Verifier.cpp (+69-1)
  • (modified) llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp (+6-4)
  • (modified) llvm/lib/Transforms/Utils/CloneFunction.cpp (+13-3)
  • (modified) llvm/test/Bitcode/operand-bundles-bc-analyzer.ll (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll (+5-5)
diff --git a/clang/test/CodeGen/X86/strictfp_builtins.c b/clang/test/CodeGen/X86/strictfp_builtins.c
index 43e4060bef259b..75ed3a2555b3d7 100644
--- a/clang/test/CodeGen/X86/strictfp_builtins.c
+++ b/clang/test/CodeGen/X86/strictfp_builtins.c
@@ -27,7 +27,7 @@ void p(char *str, int x) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516) #[[ATTR4:[0-9]+]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -43,7 +43,7 @@ void test_long_double_isinf(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504) #[[ATTR4]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -59,7 +59,7 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3) #[[ATTR4]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
diff --git a/clang/test/CodeGen/strictfp_builtins.c b/clang/test/CodeGen/strictfp_builtins.c
index 58815c7de4fa94..2e758115779711 100644
--- a/clang/test/CodeGen/strictfp_builtins.c
+++ b/clang/test/CodeGen/strictfp_builtins.c
@@ -31,21 +31,21 @@ void p(char *str, int x) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[ISZERO:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ISZERO:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR5:[0-9]+]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    br i1 [[ISZERO]], label [[FPCLASSIFY_END:%.*]], label [[FPCLASSIFY_NOT_ZERO:%.*]]
 // CHECK:       fpclassify_end:
 // CHECK-NEXT:    [[FPCLASSIFY_RESULT:%.*]] = phi i32 [ 4, [[ENTRY:%.*]] ], [ 0, [[FPCLASSIFY_NOT_ZERO]] ], [ 1, [[FPCLASSIFY_NOT_NAN:%.*]] ], [ [[TMP2:%.*]], [[FPCLASSIFY_NOT_INF:%.*]] ]
 // CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[FPCLASSIFY_RESULT]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
 // CHECK:       fpclassify_not_zero:
-// CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP0]], metadata !"uno", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP0]], metadata !"uno", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    br i1 [[CMP]], label [[FPCLASSIFY_END]], label [[FPCLASSIFY_NOT_NAN]]
 // CHECK:       fpclassify_not_nan:
-// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5:[0-9]+]]
-// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR6:[0-9]+]]
+// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    br i1 [[ISINF]], label [[FPCLASSIFY_END]], label [[FPCLASSIFY_NOT_INF]]
 // CHECK:       fpclassify_not_inf:
-// CHECK-NEXT:    [[ISNORMAL:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x10000000000000, metadata !"uge", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ISNORMAL:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x10000000000000, metadata !"uge", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    [[TMP2]] = select i1 [[ISNORMAL]], i32 2, i32 3
 // CHECK-NEXT:    br label [[FPCLASSIFY_END]]
 //
@@ -60,7 +60,7 @@ void test_fpclassify(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 516) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -76,7 +76,7 @@ void test_fp16_isinf(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 516) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -92,7 +92,7 @@ void test_float_isinf(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 516) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.4, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -108,7 +108,7 @@ void test_double_isinf(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 504) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.5, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -124,7 +124,7 @@ void test_fp16_isfinite(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 504) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.6, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -140,7 +140,7 @@ void test_float_isfinite(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 504) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.7, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -156,8 +156,8 @@ void test_double_isfinite(double d) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5]]
-// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR6]]
+// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    [[TMP2:%.*]] = bitcast double [[TMP0]] to i64
 // CHECK-NEXT:    [[TMP3:%.*]] = icmp slt i64 [[TMP2]], 0
 // CHECK-NEXT:    [[TMP4:%.*]] = select i1 [[TMP3]], i32 -1, i32 1
@@ -176,7 +176,7 @@ void test_isinf_sign(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 3) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.9, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -192,7 +192,7 @@ void test_fp16_isnan(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 3) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.10, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -208,7 +208,7 @@ void test_float_isnan(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 3) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.11, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -224,7 +224,7 @@ void test_double_isnan(double d) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 264) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 264) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.12, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
diff --git a/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl b/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl
index 1544770dfa4f30..42b53f8725e03a 100644
--- a/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl
+++ b/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl
@@ -144,7 +144,7 @@ kernel void device_side_enqueue(global float *a, global float *b, int i) {
 // STRICTFP-NEXT:    [[TMP1:%.*]] = load i32, ptr addrspace(4) [[BLOCK_CAPTURE_ADDR1]], align 4
 // STRICTFP-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr addrspace(1) [[TMP0]], i32 [[TMP1]]
 // STRICTFP-NEXT:    [[TMP2:%.*]] = load float, ptr addrspace(1) [[ARRAYIDX]], align 4
-// STRICTFP-NEXT:    [[TMP3:%.*]] = call float @llvm.experimental.constrained.fmuladd.f32(float 4.000000e+00, float [[TMP2]], float 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR5]]
+// STRICTFP-NEXT:    [[TMP3:%.*]] = call float @llvm.experimental.constrained.fmuladd.f32(float 4.000000e+00, float [[TMP2]], float 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.round"(i32 1), "fpe.except"(i32 2) ]
 // STRICTFP-NEXT:    [[BLOCK_CAPTURE_ADDR2:%.*]] = getelementptr inbounds nuw <{ i32, i32, ptr addrspace(4), ptr addrspace(1), i32, ptr addrspace(1) }>, ptr addrspace(4) [[DOTBLOCK_DESCRIPTOR]], i32 0, i32 3
 // STRICTFP-NEXT:    [[TMP4:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[BLOCK_CAPTURE_ADDR2]], align 4
 // STRICTFP-NEXT:    [[BLOCK_CAPTURE_ADDR3:%.*]] = getelementptr inbounds nuw <{ i32, i32, ptr addrspace(4), ptr addrspace(1), i32, ptr addrspace(1) }>, ptr addrspace(4) [[DOTBLOCK_DESCRIPTOR]], i32 0, i32 4
@@ -173,7 +173,7 @@ kernel void device_side_enqueue(global float *a, global float *b, int i) {
 // STRICTFP: attributes #[[ATTR2]] = { convergent noinline nounwind optnone strictfp "stack-protector-buffer-size"="8" }
 // STRICTFP: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind strictfp willreturn memory(inaccessiblemem: readwrite) }
 // STRICTFP: attributes #[[ATTR4]] = { convergent nounwind "stack-protector-buffer-size"="8" }
-// STRICTFP: attributes #[[ATTR5]] = { strictfp }
+// STRICTFP: attributes #[[ATTR5]] = { strictfp memory(inaccessiblemem: readwrite) }
 //.
 // SPIR32: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
 // SPIR32: [[META1:![0-9]+]] = !{i32 2, i32 0}
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 91c3e60bb0acb1..b1a546a2d4f808 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -2994,6 +2994,29 @@ A "convergencectrl" operand bundle is only valid on a ``convergent`` operation.
 When present, the operand bundle must contain exactly one value of token type.
 See the :doc:`ConvergentOperations` document for details.
 
+.. _ob_fpe:
+
+Floating-point Environment Operand Bundles
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These operand bundles provide details on how the operation interacts with the
+:ref:`floating-point environment <floatenv>`. There are two kinds of such
+operand bundles, which characterize interaction with floating-point control
+modes and status bits.
+
+An operand bundle tagged with "fpe.round" may be associated with the operations
+that may depend on rounding mode. It has an integer value, which represents
+the rounding mode with the same encoding as ``llvm::RoundingMode`` uses. If it
+is present and is not equal to ``llvm::RoundingMode::Dynamic``, it specifies
+the rounding mode that will be used for the operation's evaluation. The value
+``llvm::RoundingMode::Dynamic`` indicates that the rounding mode used by the
+operation is specified in a floating-point control register.
+
+An operand bundle tagged with "fpe.except" may be associated with the operations
+that may read or write floating-point exception flags. It has the same meaning
+and encoding as the corresponding argument in
+:ref:`constrained intrinsics <constrainedfp>`.
+
 .. _moduleasm:
 
 Module-Level Inline Assembly
diff --git a/llvm/include/llvm/ADT/FloatingPointMode.h b/llvm/include/llvm/ADT/FloatingPointMode.h
index 639d931ef88fec..970cc89093924b 100644
--- a/llvm/include/llvm/ADT/FloatingPointMode.h
+++ b/llvm/include/llvm/ADT/FloatingPointMode.h
@@ -47,6 +47,15 @@ enum class RoundingMode : int8_t {
   Invalid = -1    ///< Denotes invalid value.
 };
 
+inline bool isValidRoundingMode(int X) {
+  return X >= 0 && X <= static_cast<int>(RoundingMode::Dynamic);
+}
+
+inline RoundingMode castToRoundingMode(int X) {
+  assert(isValidRoundingMode(X));
+  return static_cast<RoundingMode>(X);
+}
+
 /// Returns text representation of the given rounding mode.
 inline StringRef spell(RoundingMode RM) {
   switch (RM) {
diff --git a/llvm/include/llvm/IR/AutoUpgrade.h b/llvm/include/llvm/IR/AutoUpgrade.h
index 97c3e4d7589d7b..8bd005d73fba36 100644
--- a/llvm/include/llvm/IR/AutoUpgrade.h
+++ b/llvm/include/llvm/IR/AutoUpgrade.h
@@ -107,6 +107,8 @@ namespace llvm {
   /// Upgrade operand bundles (without knowing about their user instruction).
   void UpgradeOperandBundles(std::vector<OperandBundleDef> &OperandBundles);
 
+  CallBase *upgradeConstrainedFunctionCall(CallBase *CB);
+
 } // End llvm namespace
 
 #endif
diff --git a/llvm/include/llvm/IR/FPEnv.h b/llvm/include/llvm/IR/FPEnv.h
index a0197377759daf..e4602bab6038e0 100644
--- a/llvm/include/llvm/IR/FPEnv.h
+++ b/llvm/include/llvm/IR/FPEnv.h
@@ -43,6 +43,15 @@ enum ExceptionBehavior : uint8_t {
 
 }
 
+inline bool isValidExceptionBehavior(unsigned X) {
+  return X <= fp::ExceptionBehavior::ebStrict;
+}
+
+inline fp::ExceptionBehavior castToExceptionBehavior(unsigned X) {
+  assert(isValidExceptionBehavior(X));
+  return static_cast<fp::ExceptionBehavior>(X);
+}
+
 /// Returns a valid RoundingMode enumerator when given a string
 /// that is valid as input in constrained intrinsic rounding mode
 /// metadata.
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index 23fd8350a29b3d..ca732f4903ce44 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -357,6 +357,9 @@ class IRBuilderBase {
 
   void setConstrainedFPCallAttr(CallBase *I) {
     I->addFnAttr(Attribute::StrictFP);
+    MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();
+    auto A = Attribute::getWithMemoryEffects(getContext(), ME);
+    I->addFnAttr(A);
   }
 
   void setDefaultOperandBundles(ArrayRef<OperandBundleDef> OpBundles) {
@@ -975,6 +978,16 @@ class IRBuilderBase {
                             Instruction *FMFSource = nullptr,
                             const Twine &Name = "");
 
+  /// Create a call to intrinsic \p ID with \p Args, mangled using \p Types and
+  /// with operand bundles.
+  /// If \p FMFSource is provided, copy fast-math-flags from that instruction to
+  /// the intrinsic.
+  CallInst *CreateIntrinsic(Intrinsic::ID ID, ArrayRef<Type *> Types,
+                            ArrayRef<Value *> Args,
+                            ArrayRef<OperandBundleDef> OpBundles,
+                            Instruction *FMFSource = nullptr,
+                            const Twine &Name = "");
+
   /// Create a call to intrinsic \p ID with \p RetTy and \p Args. If
   /// \p FMFSource is provided, copy fast-math-flags from that instruction to
   /// the intrinsic.
@@ -1311,6 +1324,15 @@ class IRBuilderBase {
     return I;
   }
 
+  RoundingMode
+  getEffectiveRounding(std::optional<RoundingMode> Rounding = std::nullopt) {
+    RoundingMode RM = DefaultConstrainedRounding;
+
+    if (Rounding)
+      RM = *Rounding;
+    return RM;
+  }
+
   Value *getConstrainedFPRounding(std::optional<RoundingMode> Rounding) {
     RoundingMode UseRounding = DefaultConstrainedRounding;
 
@@ -1325,6 +1347,14 @@ class IRBuilderBase {
     return MetadataAsValue::get(Context, RoundingMDS);
   }
 
+  fp::ExceptionBehavior getEffectiveExceptionBehavior(
+      std::optional<fp::ExceptionBehavior> Except = std::nullopt) {
+    fp::ExceptionBehavior EB = DefaultConstrainedExcept;
+    if (Except)
+      ...
[truncated]

@arsenm (Contributor) left a comment

I think the arguments should be dropped from the intrinsics if they're being moved to bundles

@@ -4282,6 +4282,64 @@ static void upgradeDbgIntrinsicToDbgRecord(StringRef Name, CallBase *CI) {
CI->getParent()->insertDbgRecordBefore(DR, CI->getIterator());
}

static CallBase *upgradeConstrainedIntrinsicCall(CallBase *CB, Function *F,
IRBuilder<> &Builder) {
Contributor:

Missing auto upgrade tests

CallInst *NewCB = nullptr;
if (!NewBundles.empty()) {
SmallVector<Value *, 4> Args(CB->args());
SmallVector<OperandBundleDef, 2> Bundles;
Contributor:

Should be removing the old arguments?

Collaborator (Author):

It requires changing the called function, as the metadata arguments are mandatory. Eventually this should be implemented, but in a subsequent patch.

Comment on lines 4312 to 4313
NewBundles.emplace_back("fpe.except",
ConstantInt::get(Type::getInt32Ty(C), *EB));
Contributor:

Probably should omit the bundle if it is the default / strictest setting

Collaborator (Author):

Probably. But if a function is used in a strictfp environment, it must have a side effect for proper ordering. Keeping explicit bundles could help ensure this.

} else {
int RMValue = static_cast<int>(*RM);
NewBundles.emplace_back("fpe.round",
ConstantInt::get(Type::getInt32Ty(C), RMValue));
Contributor:

Is it possible to continue encoding these as the same metadata values, instead of the raw integer?

Collaborator (Author):

This is in progress. Currently, metadata cannot be used in bundles.

Bundles.append(NewBundles);

Builder.SetInsertPoint(CB->getParent(), CB->getIterator());
MDNode *FPMath = CB->getMetadata(LLVMContext::MD_fpmath);
Contributor:

There's no point in querying fpmath if you just use copyMetadata below.

I would expect you could also just hack up the call site in place without creating a fresh call instead.

@andykaylor (Contributor) left a comment

I don't think attaching these operand bundles to the existing set of intrinsics is what we want to do. I'd prefer to deprecate the existing intrinsics and define some new scheme that uses these bundles on intrinsics that may be used in other ways.

One of the original reasons that we talked about the idea of using operand bundles this way is that we wanted a way to indicate strict FP constraints on vector predication intrinsics without requiring an entire second set of intrinsics. I'd like to see a set of generic floating point intrinsics (possibly the existing non-constrained intrinsics) that could be modified with operand bundles in the way you've implemented here, but I guess we'd need some mechanism to ensure that the operand bundles were respected.
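One concrete shape this could take, as a hypothetical sketch (combining the existing vector-predication intrinsic llvm.vp.fadd with the bundle tags from this patch; nothing in the patch enables this yet):

    %r = call <4 x double> @llvm.vp.fadd.v4f64(<4 x double> %a, <4 x double> %b, <4 x i1> %m, i32 %evl) [ "fpe.round"(i32 1), "fpe.except"(i32 2) ]
    ; i32 1 = NearestTiesToEven, i32 2 = fpexcept.strict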

modes and status bits.

An operand bundle tagged with "fpe.round" may be associated with the operations
that may depend on rounding mode. It has an integer value, which represents
Contributor:

I don't like having raw integers represent these values. It's entirely unreadable. Could you have the asm printer convert the value to a human-readable token?

Collaborator (Author):

I am working on using metadata for these bundles.

@@ -47,6 +47,15 @@ enum class RoundingMode : int8_t {
Invalid = -1 ///< Denotes invalid value.
};

inline bool isValidRoundingMode(int X) {
return X >= 0 && X <= static_cast<int>(RoundingMode::Dynamic);
Contributor:

This isn't a sufficient check. For instance, 5 and 6 are invalid but are less than RoundingMode::Dynamic. I don't know why RoundingMode::Dynamic is defined the way it is, but it's set to 7.

Collaborator (Author):

This is a bad function name. Actually, the set of supported rounding modes depends on the target; for example, not every target supports NearestTiesToAway rounding. But any rounding mode must fit in 3 bits, so the function performs only loose validation.

If metadata is used for "fpe.round", this function will be unnecessary.
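For reference, a sketch of the encoding under discussion, assuming the values of llvm::RoundingMode from FloatingPointMode.h (5 and 6 are unassigned, which is why a simple range check is loose):

    ; "fpe.round" integer encoding (llvm::RoundingMode):
    ;   0 = TowardZero          1 = NearestTiesToEven
    ;   2 = TowardPositive      3 = TowardNegative
    ;   4 = NearestTiesToAway   7 = Dynamic (5 and 6 are unused)
    %r = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.towardzero", metadata !"fpexcept.strict") #0 [ "fpe.round"(i32 0), "fpe.except"(i32 2) ]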

@@ -43,6 +43,15 @@ enum ExceptionBehavior : uint8_t {

}

inline bool isValidExceptionBehavior(unsigned X) {
Contributor:

Do you know why the exception behaviors aren't handled in FloatingPointMode.h?

Collaborator (Author):

I think this is historical: the functionality was implemented by different people. We have "FPEnv.h" and "FloatingPointMode.h", both describing the FP environment.

@@ -357,6 +357,9 @@ class IRBuilderBase {

void setConstrainedFPCallAttr(CallBase *I) {
I->addFnAttr(Attribute::StrictFP);
MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();
Contributor:

I think all of the constrained intrinsics currently use inaccessibleMemOnly, but it might be a good idea to get this attribute from the intrinsic itself. If we added an intrinsic for something like sincos, it would access argument memory too.

We should consider adding separate memory effects for fp control and status registers. Obviously that would be a separate change set.

Contributor:

It shouldn't be necessary to touch the attributes. The set of intrinsic attributes is fixed (callsite attributes are another thing, but they should generally be droppable here).

Contributor:

From the LangRef "Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target specifies a memory attribute), unless they’re overridden with callsite specific attributes." I assumed that's why the callsite attribute is being set here.

Collaborator (Author):

If a constrained intrinsic like experimental_constrained_sqrt is used, its memory effects are taken from the intrinsic's properties. If sqrt is used with operand bundles, the memory effects must be set explicitly.
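A sketch of the two cases being contrasted (the bundled form of llvm.sqrt is hypothetical at this point in the patch series):

    ; memory effects come from the intrinsic's own definition:
    %1 = call double @llvm.experimental.constrained.sqrt.f64(double %x, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
    ; hypothetical bundled form: the callsite must carry the effects explicitly:
    %2 = call double @llvm.sqrt.f64(double %x) #0 [ "fpe.round"(i32 7), "fpe.except"(i32 2) ]
    ; where: attributes #0 = { strictfp memory(inaccessiblemem: readwrite) }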

Contributor:

It should be possible to make CallBase::getMemoryEffects return the appropriate effects for the bundle operand, instead of needing to set it correctly.

(as a side note, since this is definitely follow-up patch territory, it probably makes sense to eventually add a FP environment mem effect separate from inaccessible mem only).

@@ -2994,6 +2994,29 @@ A "convergencectrl" operand bundle is only valid on a ``convergent`` operation.
When present, the operand bundle must contain exactly one value of token type.
See the :doc:`ConvergentOperations` document for details.

.. _ob_fpe:

Floating-point Environment Operand Bundles
Contributor:

Should we place restrictions on which intrinsics these operand bundles can be attached to? It seems that they would have no meaning for non-fp intrinsics, and it's unclear whether they should require the "strictfp" attribute.

Collaborator (Author):

I think we could introduce additional "memory" resources, in addition to inaccessible memory, to represent the FP environment. In that case intrinsics like sqrt would have effects like "reads control modes" and "sets status bits", which would characterize FP intrinsics. In the default FP environment, however, these properties should be erased.

Hopefully, more fine-grained modeling of FP environment access could avoid unneeded ordering and increase the performance of non-default execution modes.

@arsenm (Contributor) commented Sep 25, 2024

Also, it's silly that we need to do bitcode auto-upgrade of "experimental" intrinsics, but x86 started shipping with strictfp enabled in production before they graduated. We might as well drop the experimental bit, then.

@kpneal (Member) commented Sep 25, 2024

I don't think attaching these operand bundles to the existing set of intrinsics is what we want to do. I'd prefer to deprecate the existing intrinsics and define some new scheme that uses these bundles on intrinsics that may be used in other ways.

One of the original reasons that we talked about the idea of using operand bundles this way is that we wanted a way to indicate strict FP constraints on vector predication intrinsics without requiring an entire second set of intrinsics. I'd like to see a set of generic floating point intrinsics (possibly the existing non-constrained intrinsics) that could be modified with operand bundles in the way you've implemented here, but I guess we'd need some mechanism to ensure that the operand bundles were respected.

I thought bundles were a way to allow us to keep the FP-instruction-constrained intrinsics but dump all the other ones, and the bundles could be attached to any call that needed constrained semantics? I really, really hope we're keeping the safe-by-default semantics of the constrained intrinsics. I can't tell you how important the constrained semantics are to some compiler users like myself, and having to spend the next forever shooting down bugs because the constrained semantics aren't respected in some cases would be seriously damaging.

The IR Verifier changes aren't in place yet, in part because there are (I don't even remember how many) hundreds of places that need to be corrected. A violation of constrained semantics wouldn't be caught by the Verifier (say, if something was hoisted), so we'd never be able to be confident that we got it right and kept it right.

If we can't keep the constrained semantics and near-100% guarantee that no new exceptions will be introduced then operand bundles are not a replacement for the constrained intrinsics.

@arsenm (Contributor) commented Sep 25, 2024

If we can't keep the constrained semantics and near-100% guarantee that no new exceptions will be introduced then operand bundles are not a replacement for the constrained intrinsics.

We would still need a call / function attribute to indicate strictfp calls, and such calls would then be annotatable with bundles to relax the assumptions. The default would always have to be the most conservative assumption
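A hypothetical sketch of such relaxation (assuming the bundle encodings from this patch, where i32 0 would encode fpexcept.ignore):

    ; conservative by default: strictfp call, no bundles
    %1 = call double @llvm.sqrt.f64(double %x) #0
    ; relaxed: the bundle says exceptions may be ignored here
    %2 = call double @llvm.sqrt.f64(double %x) #0 [ "fpe.except"(i32 0) ]
    ; where: attributes #0 = { strictfp }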

@kpneal (Member) commented Sep 25, 2024

If we can't keep the constrained semantics and near-100% guarantee that no new exceptions will be introduced then operand bundles are not a replacement for the constrained intrinsics.

We would still need a call / function attribute to indicate strictfp calls, and such calls would then be annotatable with bundles to relax the assumptions. The default would always have to be the most conservative assumption

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it. If we instead rely on the strictfp attribute then we'll need possibly thousands of checks for this attribute, we'll need everyone going forward to remember to check for it, and we'll have no way to verify that this rule is being followed.

Comment on lines 3002 to 3005
These operand bundles provide details on how the operation interacts with the
:ref:`floating-point environment <floatenv>`. There are two kinds of such
operand bundles, which characterize interaction with floating-point control
modes and status bits.
Contributor:

It seems to me that we also need the ability to represent that the instructions may depend on more than just the dynamic rounding mode for effect, for example denormal flushing bits, exception masking bits, or more exotic target-specific stuff (e.g., x87's precision control bit).

Contributor:

more exotic target-specific stuff (e.g., x87's precision control bit)

For anyone who doesn't think FTZ complicates their life enough.....

We seem to be currently taking the position that FTZ/DAZ is outside the scope of the IR semantics, though I may have mentioned that I'm not entirely happy with that. An even stronger case could be made for pretending the x87 precision control doesn't exist. We have IR types for binary32, binary64, and x86_fp80. It wouldn't be entirely unreasonable to say that the behavior of the latter is unspecified if the precision control bit is changed. I'm not sure there's a realistic use case for dynamically changing the precision control, and clang already avoids using the x86_fp80 type on Windows, where the default setting isn't 80-bit. I guess someone could have legacy code that does it.

@andykaylor (Contributor)

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it. If we instead rely on the strictfp attribute then we'll need possibly thousands of checks for this attribute, we'll need everyone going forward to remember to check for it, and we'll have no way to verify that this rule is being followed.

I may have given a wrong impression here. I'm basically in agreement with you on this point. We definitely need the safe-by-default behavior. What I meant to say was that we don't necessarily need the current constrained intrinsics. We could replace the current constrained intrinsics with a new set of intrinsics that could be used for this and other purposes.

Right now, I can think of at least three types of math intrinsics needed:

  1. Correctly-rounded results
  2. Vector-predication
  3. Strict FP semantics

I think a lot of us would wish that the semantics of the libm-equivalent intrinsics required correctly rounded results, but as long as the common system math libraries aren't producing correctly-rounded results, that's a tough ask.

I'm a bit concerned that we've missed the boat on the vector-predicated intrinsics, because they are in widespread use already, so if we want to combine vector predication with strict FP semantics via operand bundles, that gets us right back into the problem of retrofitting correctness.

@spavloff (Collaborator, Author)

I don't think attaching these operand bundles to the existing set of intrinsics is what we want to do. I'd prefer to deprecate the existing intrinsics and define some new scheme that uses these bundles on intrinsics that may be used in other ways.

One of the original reasons that we talked about the idea of using operand bundles this way is that we wanted a way to indicate strict FP constraints on vector predication intrinsics without requiring an entire second set of intrinsics. I'd like to see a set of generic floating point intrinsics (possibly the existing non-constrained intrinsics) that could be modified with operand bundles in the way you've implemented here, but I guess we'd need some mechanism to ensure that the operand bundles were respected.

Coexistence of old-style intrinsics and operand bundles is a way to make the transition to operand bundles gradual. Constrained intrinsics have existed long enough, they have fairly extensive support in the compiler, and they are already used in production code. Changing the implementation requires vast code changes, and making them in several steps could be less painful. I see the steps as follows:

  • FP operand bundles are introduced; metadata arguments still exist.
  • Intrinsic functions like experimental_constrained_floor are replaced by their regular counterparts (like floor), and their constrained nature is represented by operand bundles only. Intrinsics like experimental_constrained_fadd still preserve their original form.
  • All remaining constrained functions use operand bundles. The transition is finished.

Eventually operand bundles should remain the only mechanism; it does not make sense to keep two mechanisms for the same task. A sketch of the second step is below.
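As an illustration of the second step (hypothetical, since that replacement is not part of this patch), a call such as:

    %a = call double @llvm.experimental.constrained.floor.f64(double %x, metadata !"fpexcept.strict") #0 [ "fpe.except"(i32 2) ]

would eventually become:

    %b = call double @llvm.floor.f64(double %x) #0 [ "fpe.except"(i32 2) ]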

I thought bundles were a way to allow us to keep the FP-instruction-constrained intrinsics but dump all the other ones, and the bundles could be attached to any call that needed constrained semantics? I really, really hope we're keeping the safe-by-default semantics of the constrained intrinsics. I can't tell you how important the constrained semantics are to some compiler users like myself, and having to spend the next forever shooting down bugs because the constrained semantics aren't respected in some cases would be seriously damaging.

The IR Verifier changes aren't in place yet, in part because there are (I don't even remember how many) hundreds of places that need to be corrected. A violation of constrained semantics wouldn't be caught by the Verifier (say, if something was hoisted), so we'd never be able to be confident that we got it right and kept it right.

The only mandatory property of any constrained intrinsic is its side effect; it should be enough to maintain constrained semantics. Rounding mode and exception behavior are now only hints. What kind of violations of constrained semantics have you observed? Did they relate to wrong ordering? Hundreds of violations sound threatening.

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it.

It does not look like a good basis for safety. Many users want code in non-default FP modes to be more efficient. Currently any deviation from the default mode requires the use of constrained intrinsics in the entire function, but such a solution exhibits poor performance, unsuitable for practical needs. Eventually optimizations will have to handle the constrained intrinsics too.

It seems to me that we also need the ability to represent that the instructions may depend on more than just the dynamic rounding mode for effect, for example denormal flushing bits, exception masking bits, or more exotic target-specific stuff (e.g., x87's precision control bit).

It can be realized using additional bundle types, for example "fpe.denorm". Or, if we change the bundle representation to metadata, the operand bundles could look like this:

    %1 = call double @llvm.func_01(double %0) [ "fpe.modes"(metadata !"rtz", metadata !"daz"), "fpe.except"(metadata !"strict") ]

@arsenm (Contributor) commented Sep 30, 2024

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it. If we instead rely on the strictfp attribute then we'll need possibly thousands of checks for this attribute, we'll need everyone going forward to remember to check for it, and we'll have no way to verify that this rule is being followed.

The current state already requires you to check this for any library calls. Not sure any wide audit of those ever happened. I don't see a better alternative to cover those, plus the full set of target intrinsics.

@kpneal (Member) commented Oct 4, 2024

It does not look like a good basis for safety. Many users want code in non-default FP modes to be more efficient. Currently any deviation from the default mode requires the use of constrained intrinsics in the entire function, but such a solution exhibits poor performance, unsuitable for practical needs. Eventually optimizations will have to handle the constrained intrinsics too.

This sounds like a rejection of safe-by-default. Keep in mind that "unsuitable for practical needs" doesn't mean that everyone agrees with you about what level of performance is unsuitable for one's practical needs. And a compiler that generates good performing code with random FP exceptions inserted is, in my experience, unusable if one runs with traps on. Trading performance for exception safety is a valid choice for some. Trading away all of the performance lost because few optimization passes know how to deal with the constrained intrinsics is still a valid choice for some.

WRT eliminating the constrained intrinsics completely, I thought that operand bundles could only be attached to function calls and not regular instructions? If I'm wrong, we still have a problem because there are so many uses of the regular FP instructions that we can't be safe-by-default and still use those instructions. We'd need to keep some kind of the constrained intrinsics (or new intrinsics) that give us replacements for the regular FP instructions.

I do like the idea of finding ways to get the behavior we need without having to have most developers care about strictfp. Reality is that most LLVM developers don't have any interest in strictfp. If we're piggybacking off of other parts of the LLVM design that are already widely supported then that makes our jobs much easier. If we can rely on marking calls as "inaccessiblememonly" or other side effects then that sounds quite helpful.

The problem I've run into with the IR Verifier is that the strictfp mode of the IR Builder needs to be turned on in all cases where it is needed, but currently isn't, because the IR Builder's constructor doesn't know to construct itself in strictfp mode. Many instantiations of the IR Builder occur where the constructor is able to chase down the function definition and find out whether strictfp mode is required. Many others, hundreds, don't have that luxury, and thus each has to be touched and corrected.

Having the IR Builder keep this strictfp mode and automatically add side-effect notes and/or operand bundles when required allows most developers to continue to not care about strictfp but preserve strictfp functionality at the same time. We'd still need a way to find out that some cases were missed because, for example, a strictfp attribute was forgotten. So the IR Builder instantiations still need to be corrected, but that's work in progress. Eventually the IR Builder could be changed to add bundles to, I don't know, all function calls that have a FP type passed to them perhaps?

No IR Builder changes are needed for this patch, of course, but I do want us to make sure we're keeping the whole design in mind as we go.

@spavloff (Collaborator, Author) commented Oct 7, 2024

This sounds like a rejection of safe-by-default.

What do you mean by "safe-by-default"? The implementation must be correct and produce code that executes in accordance with standards and documentation. Any deviation from them is an error and should be fixed. Does that mean such an implementation would be "safe"?

Keep in mind that "unsuitable for practical needs" doesn't mean that everyone agrees with you about what level of performance is unsuitable for one's practical needs. And a compiler that generates good performing code with random FP exceptions inserted is, in my experience, unusable if one runs with traps on.

Users have different needs and can tune compilation using appropriate options. Those who need strict exception behavior choose a suitable option set and should get code that behaves as expected but probably has lower performance. And there is a large number of users who need maximum performance but can sacrifice exception strictness; they should use another set of options. Doesn't using different compilation options solve this problem?

Trading performance for exception safety is a valid choice for some. Trading away all of the performance lost because few optimization passes know how to deal with the constrained intrinsics is still a valid choice for some.

There is no hope of producing code that is as performant as in the default mode but has strict exception semantics. So the only way to make the compiler suitable for all users is to provide means to tune it.

WRT eliminating the constrained intrinsics completely, I thought that operand bundles could only be attached to function calls and not regular instructions? If I'm wrong, we still have a problem because there are so many uses of the regular FP instructions that we can't be safe-by-default and still use those instructions. We'd need to keep some kind of the constrained intrinsics (or new intrinsics) that give us replacements for the regular FP instructions.

There is no intention to remove constrained intrinsics (like experimental.constrained.fadd), because operand bundles can only be attached to function calls. The constrained intrinsics, however, will need to change their shape in the future to use operand bundles as well.

I do like the idea of finding ways to get the behavior we need without having to have most developers care about strictfp. Reality is that most LLVM developers don't have any interest in strictfp. If we're piggybacking off of other parts of the LLVM design that are already widely supported then that makes our jobs much easier. If we can rely on marking calls as "inaccessiblememonly" or other side effects then that sounds quite helpful.

The new representation does not differ from the existing one in this respect. FP operations are represented by function calls; they have a side effect, and proper ordering must be provided by the existing LLVM mechanisms; a developer does not need to know about these properties. There is hope that using side effects more fine-grained than "inaccessiblememonly" could make FP code faster in non-default modes, but that is for the future.

The problem I've run into with the IR Verifier is that the strictfp mode of the IR Builder needs to be turned on in all cases where it is needed, but currently isn't, because the IR Builder's constructor doesn't know to construct itself in strictfp mode. Many instantiations of the IR Builder occur where the constructor is able to chase down the function definition and find out whether strictfp mode is required. Many others, hundreds, don't have that luxury, and thus each has to be touched and corrected.

If IRBuilder does not set strictfp when it should, it usually ends up with assertion violations or verification errors. I guess in this case the IR is created by some tool, probably MLIR-based, not by clang? Maybe this is a use case that we do not support well enough now, and it would be helpful to investigate it.

@arsenm (Contributor) commented Oct 7, 2024

WRT eliminating the constrained intrinsics completely, I thought that operand bundles could only be attached to function calls and not regular instructions? If I'm wrong, we still have a problem because there are so many uses of the regular FP instructions that we can't be safe-by-default and still use those instructions. We'd need to keep some kind of the constrained intrinsics (or new intrinsics) that give us replacements for the regular FP instructions.

Right, we would need to introduce new llvm.fadd etc. to carry bundles. If there are no bundles, these could fold back to the regular instruction.
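A hypothetical sketch of what that could look like (llvm.fadd does not exist today; i32 7 would encode dynamic rounding and i32 2 strict exception handling):

    ; carries FP environment information via bundles:
    %s = call double @llvm.fadd.f64(double %a, double %b) [ "fpe.round"(i32 7), "fpe.except"(i32 2) ]
    ; with no bundles, it could fold back to the plain instruction:
    %t = fadd double %a, %b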

@spavloff (Collaborator, Author) commented Dec 2, 2024

PR #118253 demonstrates the changes required to upgrade an intrinsic (trunc in this case) to use FP operand bundles.

@spavloff (Collaborator, Author) commented Dec 2, 2024

Merge activity

  • Dec 2, 1:35 AM EST: The merge label 'FP Bundles' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Dec 2, 1:35 AM EST: A user added this pull request to the Graphite merge queue.
  • Dec 2, 1:36 AM EST: The Graphite merge queue couldn't merge this PR because it failed for an unknown reason (Stack merges are not currently supported for forked repositories. Please create a branch in the target repository in order to merge).

@graphite-app bot removed the FP Bundles label Dec 2, 2024
return std::nullopt;
return convertStrToExceptionBehavior(cast<MDString>(MD)->getString());
}

Member:

I thought the plan was to have concurrent implementations until we can throw the switch. That way we won't have releases that have half-broken constrained intrinsic support and half-implemented operand bundles.

Collaborator (Author):

You are right. The methods getRoundingMode and getExceptionBehavior are now implemented in CallBase and use the information in the bundles. The methods from ConstrainedFPIntrinsic have been turned into the functions getRoundingModeArg and getExceptionBehaviorArg, which are needed in auto-upgrade.

- Fix Doxygen error,
- Fix clang-format error,
- Remove unused function declaration,
- Remove setting MD_fpmath, it is done by copyMetadata.
- Add method `hasFloatingPointBundles` to simplify checks,
- Update comments in `TailRecursionElimination.cpp`,
- Update messages in Verifier,
- Add Verifier tests.
- Rename remaining cases of 'fpe.round' to 'fpe.control'
This change was discussed in llvm#122735. As decided there, the current usage of strictfp on call sites remains intact.
spavloff added a commit to spavloff/llvm-project that referenced this pull request Apr 14, 2025
This is a lite version of llvm#109798, where code changes are minimized to facilitate discussion about the implementation. The motivations and ideas behind the new floating-point operation support are described in that PR and in the discussion https://discourse.llvm.org/t/rfc-change-of-strict-fp-operation-representation-in-ir/85021. There are concerns that the proposed changes are too invasive and a new approach is required to make the transition smoother.

This implementation is essentially a subset of PR109798, where everything beyond the minimum is removed. It tries to eventually build the same implementation as that PR, but in different steps.

The patch does not attempt to modify the existing implementation based on the constrained intrinsics. Instead it introduces a new one using operand bundles. This new implementation initially has very limited functionality, which later will be extended and finally can replace the existing one.

This PR introduces the notion of a floating-point operation: an intrinsic that is listed in the file "FloatingPointOps.def". These have two additional properties:

1. In the strict environment (a function with the strictfp attribute), calls to these operations acquire a side effect; currently it is read/write access to inaccessible memory, just as for constrained intrinsics.

2. Calls to these operations may have floating-point operand bundles. There are two kinds of such bundles, tagged with "fp.control" and "fp.except", which are used to carry additional information about control modes and exception handling. Initially the set of control modes consists of the rounding mode only.

The sets of operations listed in "FloatingPointOps.def" and in "ConstrainedOps.def" are completely independent; an intrinsic may be in one list or in both. The set of floating-point operations is expected to grow, and finally all FP intrinsics will be available in the new implementation. In this patch, the set of intrinsics in "FloatingPointOps.def" is the minimum necessary for tests.
spavloff added a commit to spavloff/llvm-project that referenced this pull request Apr 14, 2025 (same commit message as above)
spavloff added a commit to spavloff/llvm-project that referenced this pull request Apr 15, 2025 (same commit message as above)