Implement operand bundles for floating-point operations #109798

Open: spavloff wants to merge 11 commits into main

Conversation

spavloff (Collaborator)

Currently, floating-point operations in their general form (beyond the default mode) are always represented by calls to constrained intrinsics. In addition to their side effect, these calls carry additional information in the form of metadata arguments. As noted in https://discourse.llvm.org/t/thought-on-strictfp-support/71453, this scheme is not efficient for intrinsic function calls, because it requires defining a separate intrinsic for the same operation when it is used in a non-default FP environment. The solution proposed in that discussion was "to move the complexity about the environment tracking from the intrinsics themselves to the call instruction".

This change implements that approach using operand bundles (https://llvm.org/docs/LangRef.html#operand-bundles). The approach was tried previously (https://reviews.llvm.org/D93455) but was not finished.

This change does not add any new functionality; it only adds a new way of keeping FP-related information in LLVM IR. The metadata arguments of constrained intrinsics are preserved, but they are not used by queries such as getRoundingMode or getExceptionBehavior.
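For illustration, here is a constrained call as it appears in the updated tests, adapted to generic operands; the operand bundles duplicate the information currently carried by the metadata arguments (i32 1 encodes llvm::RoundingMode::NearestTiesToEven and i32 2 encodes fpexcept.strict):

    %r = call float @llvm.experimental.constrained.fmuladd.f32(float %a, float %b, float %c, metadata !"round.tonearest", metadata !"fpexcept.strict") #0 [ "fpe.round"(i32 1), "fpe.except"(i32 2) ]
    ; where: attributes #0 = { strictfp memory(inaccessiblemem: readwrite) }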

@llvmbot (Member) commented Sep 24, 2024

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-adt

@llvm/pr-subscribers-backend-amdgpu

Author: Serge Pavlov (spavloff)

Changes


Patch is 47.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109798.diff

22 Files Affected:

  • (modified) clang/test/CodeGen/X86/strictfp_builtins.c (+3-3)
  • (modified) clang/test/CodeGen/strictfp_builtins.c (+17-17)
  • (modified) clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl (+2-2)
  • (modified) llvm/docs/LangRef.rst (+23)
  • (modified) llvm/include/llvm/ADT/FloatingPointMode.h (+9)
  • (modified) llvm/include/llvm/IR/AutoUpgrade.h (+2)
  • (modified) llvm/include/llvm/IR/FPEnv.h (+9)
  • (modified) llvm/include/llvm/IR/IRBuilder.h (+48)
  • (modified) llvm/include/llvm/IR/InstrTypes.h (+7)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (-2)
  • (modified) llvm/include/llvm/IR/LLVMContext.h (+2)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+46)
  • (modified) llvm/lib/IR/AutoUpgrade.cpp (+78)
  • (modified) llvm/lib/IR/IRBuilder.cpp (+44-12)
  • (modified) llvm/lib/IR/Instructions.cpp (+18)
  • (modified) llvm/lib/IR/IntrinsicInst.cpp (-23)
  • (modified) llvm/lib/IR/LLVMContext.cpp (+10)
  • (modified) llvm/lib/IR/Verifier.cpp (+69-1)
  • (modified) llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp (+6-4)
  • (modified) llvm/lib/Transforms/Utils/CloneFunction.cpp (+13-3)
  • (modified) llvm/test/Bitcode/operand-bundles-bc-analyzer.ll (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pown.ll (+5-5)
diff --git a/clang/test/CodeGen/X86/strictfp_builtins.c b/clang/test/CodeGen/X86/strictfp_builtins.c
index 43e4060bef259b..75ed3a2555b3d7 100644
--- a/clang/test/CodeGen/X86/strictfp_builtins.c
+++ b/clang/test/CodeGen/X86/strictfp_builtins.c
@@ -27,7 +27,7 @@ void p(char *str, int x) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 516) #[[ATTR4:[0-9]+]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -43,7 +43,7 @@ void test_long_double_isinf(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 504) #[[ATTR4]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
@@ -59,7 +59,7 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
 // CHECK-NEXT:    store x86_fp80 [[LD:%.*]], ptr [[LD_ADDR]], align 16
 // CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, ptr [[LD_ADDR]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3) #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f80(x86_fp80 [[TMP0]], i32 3) #[[ATTR4]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
diff --git a/clang/test/CodeGen/strictfp_builtins.c b/clang/test/CodeGen/strictfp_builtins.c
index 58815c7de4fa94..2e758115779711 100644
--- a/clang/test/CodeGen/strictfp_builtins.c
+++ b/clang/test/CodeGen/strictfp_builtins.c
@@ -31,21 +31,21 @@ void p(char *str, int x) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[ISZERO:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ISZERO:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double 0.000000e+00, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR5:[0-9]+]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    br i1 [[ISZERO]], label [[FPCLASSIFY_END:%.*]], label [[FPCLASSIFY_NOT_ZERO:%.*]]
 // CHECK:       fpclassify_end:
 // CHECK-NEXT:    [[FPCLASSIFY_RESULT:%.*]] = phi i32 [ 4, [[ENTRY:%.*]] ], [ 0, [[FPCLASSIFY_NOT_ZERO]] ], [ 1, [[FPCLASSIFY_NOT_NAN:%.*]] ], [ [[TMP2:%.*]], [[FPCLASSIFY_NOT_INF:%.*]] ]
 // CHECK-NEXT:    call void @p(ptr noundef @.str.1, i32 noundef [[FPCLASSIFY_RESULT]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
 // CHECK:       fpclassify_not_zero:
-// CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP0]], metadata !"uno", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[CMP:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP0]], double [[TMP0]], metadata !"uno", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    br i1 [[CMP]], label [[FPCLASSIFY_END]], label [[FPCLASSIFY_NOT_NAN]]
 // CHECK:       fpclassify_not_nan:
-// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5:[0-9]+]]
-// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR6:[0-9]+]]
+// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    br i1 [[ISINF]], label [[FPCLASSIFY_END]], label [[FPCLASSIFY_NOT_INF]]
 // CHECK:       fpclassify_not_inf:
-// CHECK-NEXT:    [[ISNORMAL:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x10000000000000, metadata !"uge", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[ISNORMAL:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x10000000000000, metadata !"uge", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    [[TMP2]] = select i1 [[ISNORMAL]], i32 2, i32 3
 // CHECK-NEXT:    br label [[FPCLASSIFY_END]]
 //
@@ -60,7 +60,7 @@ void test_fpclassify(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 516) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.2, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -76,7 +76,7 @@ void test_fp16_isinf(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 516) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.3, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -92,7 +92,7 @@ void test_float_isinf(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 516) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 516) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.4, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -108,7 +108,7 @@ void test_double_isinf(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 504) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.5, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -124,7 +124,7 @@ void test_fp16_isfinite(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 504) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.6, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -140,7 +140,7 @@ void test_float_isfinite(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 504) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 504) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.7, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -156,8 +156,8 @@ void test_double_isfinite(double d) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR5]]
-// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call double @llvm.fabs.f64(double [[TMP0]]) #[[ATTR6]]
+// CHECK-NEXT:    [[ISINF:%.*]] = call i1 @llvm.experimental.constrained.fcmp.f64(double [[TMP1]], double 0x7FF0000000000000, metadata !"oeq", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.except"(i32 2) ]
 // CHECK-NEXT:    [[TMP2:%.*]] = bitcast double [[TMP0]] to i64
 // CHECK-NEXT:    [[TMP3:%.*]] = icmp slt i64 [[TMP2]], 0
 // CHECK-NEXT:    [[TMP4:%.*]] = select i1 [[TMP3]], i32 -1, i32 1
@@ -176,7 +176,7 @@ void test_isinf_sign(double d) {
 // CHECK-NEXT:    [[H_ADDR:%.*]] = alloca half, align 2
 // CHECK-NEXT:    store half [[H:%.*]], ptr [[H_ADDR]], align 2
 // CHECK-NEXT:    [[TMP0:%.*]] = load half, ptr [[H_ADDR]], align 2
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f16(half [[TMP0]], i32 3) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.9, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -192,7 +192,7 @@ void test_fp16_isnan(_Float16 h) {
 // CHECK-NEXT:    [[F_ADDR:%.*]] = alloca float, align 4
 // CHECK-NEXT:    store float [[F:%.*]], ptr [[F_ADDR]], align 4
 // CHECK-NEXT:    [[TMP0:%.*]] = load float, ptr [[F_ADDR]], align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f32(float [[TMP0]], i32 3) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.10, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -208,7 +208,7 @@ void test_float_isnan(float f) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 3) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 3) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.11, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
@@ -224,7 +224,7 @@ void test_double_isnan(double d) {
 // CHECK-NEXT:    [[D_ADDR:%.*]] = alloca double, align 8
 // CHECK-NEXT:    store double [[D:%.*]], ptr [[D_ADDR]], align 8
 // CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[D_ADDR]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 264) #[[ATTR4]]
+// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.is.fpclass.f64(double [[TMP0]], i32 264) #[[ATTR5]]
 // CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
 // CHECK-NEXT:    call void @p(ptr noundef @.str.12, i32 noundef [[TMP2]]) #[[ATTR4]]
 // CHECK-NEXT:    ret void
diff --git a/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl b/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl
index 1544770dfa4f30..42b53f8725e03a 100644
--- a/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl
+++ b/clang/test/CodeGenOpenCL/cl20-device-side-enqueue-attributes.cl
@@ -144,7 +144,7 @@ kernel void device_side_enqueue(global float *a, global float *b, int i) {
 // STRICTFP-NEXT:    [[TMP1:%.*]] = load i32, ptr addrspace(4) [[BLOCK_CAPTURE_ADDR1]], align 4
 // STRICTFP-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr addrspace(1) [[TMP0]], i32 [[TMP1]]
 // STRICTFP-NEXT:    [[TMP2:%.*]] = load float, ptr addrspace(1) [[ARRAYIDX]], align 4
-// STRICTFP-NEXT:    [[TMP3:%.*]] = call float @llvm.experimental.constrained.fmuladd.f32(float 4.000000e+00, float [[TMP2]], float 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR5]]
+// STRICTFP-NEXT:    [[TMP3:%.*]] = call float @llvm.experimental.constrained.fmuladd.f32(float 4.000000e+00, float [[TMP2]], float 1.000000e+00, metadata !"round.tonearest", metadata !"fpexcept.strict") #[[ATTR5]] [ "fpe.round"(i32 1), "fpe.except"(i32 2) ]
 // STRICTFP-NEXT:    [[BLOCK_CAPTURE_ADDR2:%.*]] = getelementptr inbounds nuw <{ i32, i32, ptr addrspace(4), ptr addrspace(1), i32, ptr addrspace(1) }>, ptr addrspace(4) [[DOTBLOCK_DESCRIPTOR]], i32 0, i32 3
 // STRICTFP-NEXT:    [[TMP4:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[BLOCK_CAPTURE_ADDR2]], align 4
 // STRICTFP-NEXT:    [[BLOCK_CAPTURE_ADDR3:%.*]] = getelementptr inbounds nuw <{ i32, i32, ptr addrspace(4), ptr addrspace(1), i32, ptr addrspace(1) }>, ptr addrspace(4) [[DOTBLOCK_DESCRIPTOR]], i32 0, i32 4
@@ -173,7 +173,7 @@ kernel void device_side_enqueue(global float *a, global float *b, int i) {
 // STRICTFP: attributes #[[ATTR2]] = { convergent noinline nounwind optnone strictfp "stack-protector-buffer-size"="8" }
 // STRICTFP: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind strictfp willreturn memory(inaccessiblemem: readwrite) }
 // STRICTFP: attributes #[[ATTR4]] = { convergent nounwind "stack-protector-buffer-size"="8" }
-// STRICTFP: attributes #[[ATTR5]] = { strictfp }
+// STRICTFP: attributes #[[ATTR5]] = { strictfp memory(inaccessiblemem: readwrite) }
 //.
 // SPIR32: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
 // SPIR32: [[META1:![0-9]+]] = !{i32 2, i32 0}
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 91c3e60bb0acb1..b1a546a2d4f808 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -2994,6 +2994,29 @@ A "convergencectrl" operand bundle is only valid on a ``convergent`` operation.
 When present, the operand bundle must contain exactly one value of token type.
 See the :doc:`ConvergentOperations` document for details.
 
+.. _ob_fpe:
+
+Floating-point Environment Operand Bundles
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These operand bundles provide details on how the operation interacts with the
+:ref:`floating-point environment <floatenv>`. There are two kinds of such
+operand bundles, which characterize interaction with floating-point control
+modes and status bits.
+
+An operand bundle tagged with "fpe.round" may be associated with the operations
+that may depend on rounding mode. It has an integer value, which represents
+the rounding mode with the same encoding as ``llvm::RoundingMode`` uses. If it
+is present and is not equal to ``llvm::RoundingMode::Dynamic``, it specifies
+the rounding mode that will be used for the operation's evaluation. The value
+``llvm::RoundingMode::Dynamic`` indicates that the rounding mode used by the
+operation is specified in a floating-point control register.
+
+An operand bundle tagged with "fpe.except" may be associated with the operations
+that may read or write floating-point exception flags. It has the same meaning
+and encoding as the corresponding argument in
+:ref:`constrained intrinsics <constrainedfp>`.
+
 .. _moduleasm:
 
 Module-Level Inline Assembly
diff --git a/llvm/include/llvm/ADT/FloatingPointMode.h b/llvm/include/llvm/ADT/FloatingPointMode.h
index 639d931ef88fec..970cc89093924b 100644
--- a/llvm/include/llvm/ADT/FloatingPointMode.h
+++ b/llvm/include/llvm/ADT/FloatingPointMode.h
@@ -47,6 +47,15 @@ enum class RoundingMode : int8_t {
   Invalid = -1    ///< Denotes invalid value.
 };
 
+inline bool isValidRoundingMode(int X) {
+  return X >= 0 && X <= static_cast<int>(RoundingMode::Dynamic);
+}
+
+inline RoundingMode castToRoundingMode(int X) {
+  assert(isValidRoundingMode(X));
+  return static_cast<RoundingMode>(X);
+}
+
 /// Returns text representation of the given rounding mode.
 inline StringRef spell(RoundingMode RM) {
   switch (RM) {
diff --git a/llvm/include/llvm/IR/AutoUpgrade.h b/llvm/include/llvm/IR/AutoUpgrade.h
index 97c3e4d7589d7b..8bd005d73fba36 100644
--- a/llvm/include/llvm/IR/AutoUpgrade.h
+++ b/llvm/include/llvm/IR/AutoUpgrade.h
@@ -107,6 +107,8 @@ namespace llvm {
   /// Upgrade operand bundles (without knowing about their user instruction).
   void UpgradeOperandBundles(std::vector<OperandBundleDef> &OperandBundles);
 
+  CallBase *upgradeConstrainedFunctionCall(CallBase *CB);
+
 } // End llvm namespace
 
 #endif
diff --git a/llvm/include/llvm/IR/FPEnv.h b/llvm/include/llvm/IR/FPEnv.h
index a0197377759daf..e4602bab6038e0 100644
--- a/llvm/include/llvm/IR/FPEnv.h
+++ b/llvm/include/llvm/IR/FPEnv.h
@@ -43,6 +43,15 @@ enum ExceptionBehavior : uint8_t {
 
 }
 
+inline bool isValidExceptionBehavior(unsigned X) {
+  return X <= fp::ExceptionBehavior::ebStrict;
+}
+
+inline fp::ExceptionBehavior castToExceptionBehavior(unsigned X) {
+  assert(isValidExceptionBehavior(X));
+  return static_cast<fp::ExceptionBehavior>(X);
+}
+
 /// Returns a valid RoundingMode enumerator when given a string
 /// that is valid as input in constrained intrinsic rounding mode
 /// metadata.
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index 23fd8350a29b3d..ca732f4903ce44 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -357,6 +357,9 @@ class IRBuilderBase {
 
   void setConstrainedFPCallAttr(CallBase *I) {
     I->addFnAttr(Attribute::StrictFP);
+    MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();
+    auto A = Attribute::getWithMemoryEffects(getContext(), ME);
+    I->addFnAttr(A);
   }
 
   void setDefaultOperandBundles(ArrayRef<OperandBundleDef> OpBundles) {
@@ -975,6 +978,16 @@ class IRBuilderBase {
                             Instruction *FMFSource = nullptr,
                             const Twine &Name = "");
 
+  /// Create a call to intrinsic \p ID with \p Args, mangled using \p Types and
+  /// with operand bundles.
+  /// If \p FMFSource is provided, copy fast-math-flags from that instruction to
+  /// the intrinsic.
+  CallInst *CreateIntrinsic(Intrinsic::ID ID, ArrayRef<Type *> Types,
+                            ArrayRef<Value *> Args,
+                            ArrayRef<OperandBundleDef> OpBundles,
+                            Instruction *FMFSource = nullptr,
+                            const Twine &Name = "");
+
   /// Create a call to intrinsic \p ID with \p RetTy and \p Args. If
   /// \p FMFSource is provided, copy fast-math-flags from that instruction to
   /// the intrinsic.
@@ -1311,6 +1324,15 @@ class IRBuilderBase {
     return I;
   }
 
+  RoundingMode
+  getEffectiveRounding(std::optional<RoundingMode> Rounding = std::nullopt) {
+    RoundingMode RM = DefaultConstrainedRounding;
+
+    if (Rounding)
+      RM = *Rounding;
+    return RM;
+  }
+
   Value *getConstrainedFPRounding(std::optional<RoundingMode> Rounding) {
     RoundingMode UseRounding = DefaultConstrainedRounding;
 
@@ -1325,6 +1347,14 @@ class IRBuilderBase {
     return MetadataAsValue::get(Context, RoundingMDS);
   }
 
+  fp::ExceptionBehavior getEffectiveExceptionBehavior(
+      std::optional<fp::ExceptionBehavior> Except = std::nullopt) {
+    fp::ExceptionBehavior EB = DefaultConstrainedExcept;
+    if (Except)
+      ...
[truncated]

@arsenm (Contributor) left a comment

I think the arguments should be dropped from the intrinsics if they're being moved to bundles

@@ -4282,6 +4282,64 @@ static void upgradeDbgIntrinsicToDbgRecord(StringRef Name, CallBase *CI) {
CI->getParent()->insertDbgRecordBefore(DR, CI->getIterator());
}

static CallBase *upgradeConstrainedIntrinsicCall(CallBase *CB, Function *F,
IRBuilder<> &Builder) {
Contributor:

Missing auto upgrade tests

CallInst *NewCB = nullptr;
if (!NewBundles.empty()) {
SmallVector<Value *, 4> Args(CB->args());
SmallVector<OperandBundleDef, 2> Bundles;
Contributor:

Should be removing the old arguments?

Collaborator (Author):

It requires changing the called function, as the metadata arguments are mandatory. Eventually this should be implemented, but in a subsequent patch.

Comment on lines 4312 to 4313
NewBundles.emplace_back("fpe.except",
ConstantInt::get(Type::getInt32Ty(C), *EB));
Contributor:

Probably should omit the bundle if it is the default / strictest setting

Collaborator (Author):

Probably. But if a function is used in a strictfp environment, it must have a side effect for proper ordering. Keeping explicit bundles could help ensure this.

} else {
int RMValue = static_cast<int>(*RM);
NewBundles.emplace_back("fpe.round",
ConstantInt::get(Type::getInt32Ty(C), RMValue));
Contributor:

Is it possible to continue encoding these as the same metadata values, instead of the raw integer?

Collaborator (Author):

This is in progress. Currently, metadata cannot be used in bundles.

Bundles.append(NewBundles);

Builder.SetInsertPoint(CB->getParent(), CB->getIterator());
MDNode *FPMath = CB->getMetadata(LLVMContext::MD_fpmath);
Contributor:

There's no point in querying fpmath if you just use copyMetadata below.

I would expect you could also just hack up the call site in place without creating a fresh call instead.

@andykaylor (Contributor) left a comment

I don't think attaching these operand bundles to the existing set of intrinsics is what we want to do. I'd prefer to deprecate the existing intrinsics and define some new scheme that uses these bundles on intrinsics that may be used in other ways.

One of the original reasons that we talked about the idea of using operand bundles this way is that we wanted a way to indicate strict FP constraints on vector predication intrinsics without requiring an entire second set of intrinsics. I'd like to see a set of generic floating point intrinsics (possibly the existing non-constrained intrinsics) that could be modified with operand bundles in the way you've implemented here, but I guess we'd need some mechanism to ensure that the operand bundles were respected.
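One concrete shape this could take, as a hypothetical sketch (combining the existing vector-predication intrinsic llvm.vp.fadd with the bundle tags from this patch; nothing in the patch enables this yet):

    %r = call <4 x double> @llvm.vp.fadd.v4f64(<4 x double> %a, <4 x double> %b, <4 x i1> %m, i32 %evl) [ "fpe.round"(i32 1), "fpe.except"(i32 2) ]
    ; i32 1 = NearestTiesToEven, i32 2 = fpexcept.strict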

modes and status bits.

An operand bundle tagged with "fpe.round" may be associated with the operations
that may depend on rounding mode. It has an integer value, which represents
Contributor:

I don't like having raw integers represent these values. It's entirely unreadable. Could you have the asm printer convert the value to a human-readable token?

Collaborator (Author):

I am working on using metadata for these bundles.

@@ -47,6 +47,15 @@ enum class RoundingMode : int8_t {
Invalid = -1 ///< Denotes invalid value.
};

inline bool isValidRoundingMode(int X) {
return X >= 0 && X <= static_cast<int>(RoundingMode::Dynamic);
Contributor:

This isn't a sufficient check. For instance, 5 and 6 are invalid but are less than RoundingMode::Dynamic. I don't know why RoundingMode::Dynamic is defined the way it is, but it's set to 7.

Collaborator (Author):

This is a bad function name. Actually, the set of supported rounding modes depends on the target; for example, not every target supports NearestTiesToAway rounding. But any rounding mode must fit in 3 bits, so the function performs only loose validation.

If metadata is used for "fpe.round", this function will be unnecessary.
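For reference, a sketch of the encoding under discussion, assuming the values of llvm::RoundingMode from FloatingPointMode.h (5 and 6 are unassigned, which is why a simple range check is loose):

    ; "fpe.round" integer encoding (llvm::RoundingMode):
    ;   0 = TowardZero          1 = NearestTiesToEven
    ;   2 = TowardPositive      3 = TowardNegative
    ;   4 = NearestTiesToAway   7 = Dynamic (5 and 6 are unused)
    %r = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.towardzero", metadata !"fpexcept.strict") #0 [ "fpe.round"(i32 0), "fpe.except"(i32 2) ]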

@@ -43,6 +43,15 @@ enum ExceptionBehavior : uint8_t {

}

inline bool isValidExceptionBehavior(unsigned X) {
Contributor:

Do you know why the exception behaviors aren't handled in FloatingPointMode.h?

Collaborator (Author):

I think this is historical: the functionality was implemented by different people. We have "FPEnv.h" and "FloatingPointMode.h", both describing the FP environment.

@@ -357,6 +357,9 @@ class IRBuilderBase {

void setConstrainedFPCallAttr(CallBase *I) {
I->addFnAttr(Attribute::StrictFP);
MemoryEffects ME = MemoryEffects::inaccessibleMemOnly();
Contributor:

I think all of the constrained intrinsics currently use inaccessibleMemOnly, but it might be a good idea to get this attribute from the intrinsic itself. If we added an intrinsic for something like sincos, it would access argument memory too.

We should consider adding separate memory effects for fp control and status registers. Obviously that would be a separate change set.

Contributor:

It shouldn't be necessary to touch the attributes. The set of intrinsic attributes is fixed (callsite attributes are another thing, but they should generally be droppable here).

Contributor:

From the LangRef "Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target specifies a memory attribute), unless they’re overridden with callsite specific attributes." I assumed that's why the callsite attribute is being set here.

Collaborator (Author):

If a constrained intrinsic like experimental_constrained_sqrt is used, its memory effects are taken from the intrinsic's properties. If sqrt is used with operand bundles, the memory effects must be set explicitly.
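A sketch of the two cases being contrasted (the bundled form of llvm.sqrt is hypothetical at this point in the patch series):

    ; memory effects come from the intrinsic's own definition:
    %1 = call double @llvm.experimental.constrained.sqrt.f64(double %x, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
    ; hypothetical bundled form: the callsite must carry the effects explicitly:
    %2 = call double @llvm.sqrt.f64(double %x) #0 [ "fpe.round"(i32 7), "fpe.except"(i32 2) ]
    ; where: attributes #0 = { strictfp memory(inaccessiblemem: readwrite) }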

Contributor:

It should be possible to make CallBase::getMemoryEffects return the appropriate effects for the bundle operand, instead of needing to set it correctly.

(as a side note, since this is definitely follow-up patch territory, it probably makes sense to eventually add a FP environment mem effect separate from inaccessible mem only).

@@ -2994,6 +2994,29 @@ A "convergencectrl" operand bundle is only valid on a ``convergent`` operation.
When present, the operand bundle must contain exactly one value of token type.
See the :doc:`ConvergentOperations` document for details.

.. _ob_fpe:

Floating-point Environment Operand Bundles
Contributor:

Should we place restrictions on which intrinsics these operand bundles can be attached to? It seems that they would have no meaning for non-fp intrinsics, and it's unclear whether they should require the "strictfp" attribute.

Collaborator (Author):

I think we could introduce additional "memory" resources, in addition to inaccessible memory, to represent the FP environment. In that case intrinsics like sqrt would have effects like "reads control modes" and "sets status bits", which would characterize FP intrinsics. In the default FP environment, however, these properties should be erased.

Hopefully, more fine-grained modeling of FP environment access could avoid unneeded ordering and increase the performance of non-default execution modes.

@arsenm (Contributor) commented Sep 25, 2024

Also, it's silly that we need to do bitcode auto-upgrade of "experimental" intrinsics, but x86 started shipping with strictfp enabled in production before they graduated. We might as well drop the experimental bit, then.

@kpneal (Member) commented Sep 25, 2024

I don't think attaching these operand bundles to the existing set of intrinsics is what we want to do. I'd prefer to deprecate the existing intrinsics and define some new scheme that uses these bundles on intrinsics that may be used in other ways.

One of the original reasons that we talked about the idea of using operand bundles this way is that we wanted a way to indicate strict FP constraints on vector predication intrinsics without requiring an entire second set of intrinsics. I'd like to see a set of generic floating point intrinsics (possibly the existing non-constrained intrinsics) that could be modified with operand bundles in the way you've implemented here, but I guess we'd need some mechanism to ensure that the operand bundles were respected.

I thought bundles were a way to allow us to keep the FP-instruction-constrained intrinsics but dump all the other ones, and the bundles could be attached to any call that needed constrained semantics? I really, really hope we're keeping the safe-by-default semantics of the constrained intrinsics. I can't tell you how important the constrained semantics are to some compiler users like myself, and having to spend the next forever shooting down bugs because the constrained semantics aren't respected in some cases would be seriously damaging.

The IR Verifier changes aren't in place yet, in part because there are (I don't even remember how many) hundreds of places that need to be corrected. A violation of constrained semantics wouldn't be caught by the Verifier (say, if something was hoisted), so we'd never be able to be confident that we got it right and kept it right.

If we can't keep the constrained semantics and near-100% guarantee that no new exceptions will be introduced then operand bundles are not a replacement for the constrained intrinsics.

@arsenm (Contributor) commented Sep 25, 2024

If we can't keep the constrained semantics and near-100% guarantee that no new exceptions will be introduced then operand bundles are not a replacement for the constrained intrinsics.

We would still need a call / function attribute to indicate strictfp calls, and such calls would then be annotatable with bundles to relax the assumptions. The default would always have to be the most conservative assumption
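A hypothetical sketch of such relaxation (assuming the bundle encodings from this patch, where i32 0 would encode fpexcept.ignore):

    ; conservative by default: strictfp call, no bundles
    %1 = call double @llvm.sqrt.f64(double %x) #0
    ; relaxed: the bundle says exceptions may be ignored here
    %2 = call double @llvm.sqrt.f64(double %x) #0 [ "fpe.except"(i32 0) ]
    ; where: attributes #0 = { strictfp }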

@kpneal (Member) commented Sep 25, 2024

If we can't keep the constrained semantics and near-100% guarantee that no new exceptions will be introduced then operand bundles are not a replacement for the constrained intrinsics.

We would still need a call / function attribute to indicate strictfp calls, and such calls would then be annotatable with bundles to relax the assumptions. The default would always have to be the most conservative assumption

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it. If we instead rely on the strictfp attribute then we'll need possibly thousands of checks for this attribute, we'll need everyone going forward to remember to check for it, and we'll have no way to verify that this rule is being followed.

Comment on lines 3002 to 3005
These operand bundles provide details on how the operation interacts with the
:ref:`floating-point environment <floatenv>`. There are two kinds of such
operand bundles, which characterize interaction with floating-point control
modes and status bits.
Contributor:

It seems to me that we also need the ability to represent that the instructions may depend on more than just the dynamic rounding mode for effect, for example denormal flushing bits, exception masking bits, or more exotic target-specific stuff (e.g., x87's precision control bit).

Contributor:

more exotic target-specific stuff (e.g., x87's precision control bit)

For anyone who doesn't think FTZ complicates their life enough.....

We seem to be currently taking the position that FTZ/DAZ is outside the scope of the IR semantics, though I may have mentioned that I'm not entirely happy with that. An even stronger case could be made for pretending the x87 precision control doesn't exist. We have IR types for binary32, binary64, and x86_fp80. It wouldn't be entirely unreasonable to say that the behavior of the latter is unspecified if the precision control bit is changed. I'm not sure there's a realistic use case for dynamically changing the precision control, and clang already avoids using the x86_fp80 type on Windows, where the default setting isn't 80-bit. I guess someone could have legacy code that does it.

@andykaylor (Contributor)

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it. If we instead rely on the strictfp attribute then we'll need possibly thousands of checks for this attribute, we'll need everyone going forward to remember to check for it, and we'll have no way to verify that this rule is being followed.

I may have given a wrong impression here. I'm basically in agreement with you on this point. We definitely need the safe-by-default behavior. What I meant to say was that we don't necessarily need the current constrained intrinsics. We could replace the current constrained intrinsics with a new set of intrinsics that could be used for this and other purposes.

Right now, I can think of at least three types of math intrinsics needed:

  1. Correctly-rounded results
  2. Vector-predication
  3. Strict FP semantics

I think a lot of us would wish that the semantics of the libm-equivalent intrinsics required correctly rounded results, but as long as the common system math libraries aren't producing correctly-rounded results, that's a tough ask.

I'm a bit concerned that we've missed the boat on the vector-predicated intrinsics, because they are in widespread use already, so if we want to combine vector predication with strict FP semantics via operand bundles, that gets us right back into the problem of retrofitting correctness.

@spavloff (Collaborator, Author)

I don't think attaching these operand bundles to the existing set of intrinsics is what we want to do. I'd prefer to deprecate the existing intrinsics and define some new scheme that uses these bundles on intrinsics that may be used in other ways.

One of the original reasons that we talked about the idea of using operand bundles this way is that we wanted a way to indicate strict FP constraints on vector predication intrinsics without requiring an entire second set of intrinsics. I'd like to see a set of generic floating point intrinsics (possibly the existing non-constrained intrinsics) that could be modified with operand bundles in the way you've implemented here, but I guess we'd need some mechanism to ensure that the operand bundles were respected.

Coexistence of old-style intrinsics and operand bundles is a way to make the transition to operand bundles gradual. Constrained intrinsics have existed long enough, they have fairly extensive support in the compiler, and they are already used in production code. Changing the implementation requires vast code changes, and making them in several steps could be less painful. I see the steps as follows:

  • FP operand bundles are introduced; metadata arguments still exist.
  • Intrinsic functions like experimental_constrained_floor are replaced by their regular counterparts (like floor), and their constrained nature is represented by operand bundles only. Intrinsics like experimental_constrained_fadd still preserve their original form.
  • All remaining constrained functions use operand bundles. The transition is finished.

Eventually operand bundles should remain the only mechanism; it does not make sense to keep two mechanisms for the same task. A sketch of the second step is below.
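As an illustration of the second step (hypothetical, since that replacement is not part of this patch), a call such as:

    %a = call double @llvm.experimental.constrained.floor.f64(double %x, metadata !"fpexcept.strict") #0 [ "fpe.except"(i32 2) ]

would eventually become:

    %b = call double @llvm.floor.f64(double %x) #0 [ "fpe.except"(i32 2) ]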

I thought bundles were a way to allow us to keep the FP-instruction-constrained intrinsics but dump all the other ones, and the bundles could be attached to any call that needed constrained semantics? I really, really hope we're keeping the safe-by-default semantics of the constrained intrinsics. I can't tell you how important the constrained semantics are to some compiler users like myself, and having to spend the next forever shooting down bugs because the constrained semantics aren't respected in some cases would be seriously damaging.

The IR Verifier changes aren't in place yet, in part because there are (I don't even remember how many) hundreds of places that need to be corrected. A violation of constrained semantics wouldn't be caught by the Verifier (say, if something was hoisted), so we'd never be able to be confident that we got it right and kept it right.

The only mandatory property of any constrained intrinsic is its side effect; it should be enough to maintain constrained semantics. Rounding mode and exception behavior are now only hints. What kind of violations of constrained semantics have you observed? Did they relate to wrong ordering? Hundreds of violations sound threatening.

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it.

It does not look like a good basis for safety. Many users want code in non-default FP modes to be more efficient. Currently any deviation from the default mode requires the use of constrained intrinsics in the entire function, but such a solution exhibits poor performance, unsuitable for practical needs. Eventually optimizations will have to handle the constrained intrinsics too.

It seems to me that we also need the ability to represent that the instructions may depend on more than just the dynamic rounding mode for effect, for example denormal flushing bits, exception masking bits, or more exotic target-specific stuff (e.g., x87's precision control bit).

It can be realized using additional bundle types, for example "fpe.denorm". Or, if we change the bundle representation to metadata, the operand bundles could look like this:

    %1 = call double @llvm.func_01(double %0) [ "fpe.modes"(metadata !"rtz", metadata !"daz"), "fpe.except"(metadata !"strict") ]

@arsenm (Contributor) commented Sep 30, 2024

With the constrained intrinsics the default is safe because optimizations don't recognize the constrained intrinsic and thus don't know how to optimize it. If we instead rely on the strictfp attribute then we'll need possibly thousands of checks for this attribute, we'll need everyone going forward to remember to check for it, and we'll have no way to verify that this rule is being followed.

The current state already requires you to check this for any library calls. Not sure any wide audit of those ever happened. I don't see a better alternative to cover those, plus the full set of target intrinsics.

@kpneal (Member) commented Oct 4, 2024

It does not look like a good basis for safety. Many users want code in non-default FP modes to be more efficient. Currently any deviation from the default mode requires the use of constrained intrinsics in the entire function, but such a solution exhibits poor performance, unsuitable for practical needs. Eventually optimizations will have to handle the constrained intrinsics too.

This sounds like a rejection of safe-by-default. Keep in mind that "unsuitable for practical needs" doesn't mean that everyone agrees with you about what level of performance is unsuitable for one's practical needs. And a compiler that generates good performing code with random FP exceptions inserted is, in my experience, unusable if one runs with traps on. Trading performance for exception safety is a valid choice for some. Trading away all of the performance lost because few optimization passes know how to deal with the constrained intrinsics is still a valid choice for some.

WRT eliminating the constrained intrinsics completely, I thought that operand bundles could only be attached to function calls and not regular instructions? If I'm wrong, we still have a problem because there are so many uses of the regular FP instructions that we can't be safe-by-default and still use those instructions. We'd need to keep some kind of the constrained intrinsics (or new intrinsics) that give us replacements for the regular FP instructions.

I do like the idea of finding ways to get the behavior we need without having to have most developers care about strictfp. Reality is that most LLVM developers don't have any interest in strictfp. If we're piggybacking off of other parts of the LLVM design that are already widely supported then that makes our jobs much easier. If we can rely on marking calls as "inaccessiblememonly" or other side effects then that sounds quite helpful.

The problem I've run into with the IR Verifier is that the strictfp mode of the IR Builder needs to be turned on in all cases where it is needed, but currently isn't, because the IR Builder's constructor doesn't know to construct itself in strictfp mode. Many instantiations of the IR Builder occur where the constructor is able to chase down the function definition and find out whether strictfp mode is required. Many others, hundreds, don't have that luxury, and thus each has to be touched and corrected.

Having the IR Builder keep this strictfp mode and automatically add side-effect notes and/or operand bundles when required allows most developers to continue to not care about strictfp but preserve strictfp functionality at the same time. We'd still need a way to find out that some cases were missed because, for example, a strictfp attribute was forgotten. So the IR Builder instantiations still need to be corrected, but that's work in progress. Eventually the IR Builder could be changed to add bundles to, I don't know, all function calls that have a FP type passed to them perhaps?

No IR Builder changes are needed for this patch, of course, but I do want us to make sure we're keeping the whole design in mind as we go.

@spavloff (Collaborator, Author) commented Oct 7, 2024

This sounds like a rejection of safe-by-default.

What do you mean by "safe-by-default"? The implementation must be correct and produce code that executes in accordance with standards and documentation. Any deviation from them is an error and should be fixed. Does that mean such an implementation would be "safe"?

Keep in mind that "unsuitable for practical needs" doesn't mean that everyone agrees with you about what level of performance is unsuitable for one's practical needs. And a compiler that generates good performing code with random FP exceptions inserted is, in my experience, unusable if one runs with traps on.

Users have different needs and can tune compilation using appropriate options. Those who need strict exception behavior choose a suitable option set and should get code that behaves as expected but probably has lower performance. And there is a large number of users who need maximum performance but can sacrifice exception strictness; they should use another set of options. Doesn't using different compilation options solve this problem?

Trading performance for exception safety is a valid choice for some. Trading away all of the performance lost because few optimization passes know how to deal with the constrained intrinsics is still a valid choice for some.

There is no hope of producing code that is as performant as in the default mode but has strict exception semantics. So the only way to make the compiler suitable for all users is to provide means to tune it.

WRT eliminating the constrained intrinsics completely, I thought that operand bundles could only be attached to function calls and not regular instructions? If I'm wrong, we still have a problem because there are so many uses of the regular FP instructions that we can't be safe-by-default and still use those instructions. We'd need to keep some kind of the constrained intrinsics (or new intrinsics) that give us replacements for the regular FP instructions.

There is no intention to remove constrained intrinsics (like experimental.constrained.fadd), because operand bundles can only be attached to function calls. The constrained intrinsics, however, will need to change their shape in the future to use operand bundles as well.

I do like the idea of finding ways to get the behavior we need without having to have most developers care about strictfp. Reality is that most LLVM developers don't have any interest in strictfp. If we're piggybacking off of other parts of the LLVM design that are already widely supported then that makes our jobs much easier. If we can rely on marking calls as "inaccessiblememonly" or other side effects then that sounds quite helpful.

The new representation does not differ from the existing one in this respect. FP operations are represented by function calls; they have a side effect, and proper ordering must be provided by the existing LLVM mechanisms; a developer does not need to know about these properties. There is hope that using side effects more fine-grained than "inaccessiblememonly" could make FP code faster in non-default modes, but that is for the future.

The problem I've run into with the IR Verifier is that the strictfp mode of the IR Builder needs to be turned on in all cases where it is needed, but currently isn't, because the IR Builder's constructor doesn't know to construct itself in strictfp mode. Many instantiations of the IR Builder occur where the constructor is able to chase down the function definition and find out whether strictfp mode is required. Many others, hundreds, don't have that luxury, and thus each has to be touched and corrected.

If IRBuilder does not set strictfp when it should, it usually ends up with assertion violations or verification errors. I guess in this case the IR is created by some tool, probably MLIR-based, not by clang? Maybe this is a use case that we do not support well enough now, and it would be helpful to investigate it.

@arsenm (Contributor) commented Oct 7, 2024

WRT eliminating the constrained intrinsics completely, I thought that operand bundles could only be attached to function calls and not regular instructions? If I'm wrong, we still have a problem because there are so many uses of the regular FP instructions that we can't be safe-by-default and still use those instructions. We'd need to keep some kind of the constrained intrinsics (or new intrinsics) that give us replacements for the regular FP instructions.

Right, we would need to introduce new llvm.fadd etc. to carry bundles. If there are no bundles, these could fold back to the regular instruction.
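A hypothetical sketch of what that could look like (llvm.fadd does not exist today; i32 7 would encode dynamic rounding and i32 2 strict exception handling):

    ; carries FP environment information via bundles:
    %s = call double @llvm.fadd.f64(double %a, double %b) [ "fpe.round"(i32 7), "fpe.except"(i32 2) ]
    ; with no bundles, it could fold back to the plain instruction:
    %t = fadd double %a, %b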

@spavloff (Collaborator, Author) commented Dec 2, 2024

PR #118253 demonstrates the changes required to upgrade an intrinsic (trunc in this case) to use FP operand bundles.

@spavloff (Collaborator, Author) commented Dec 2, 2024

Merge activity

  • Dec 2, 1:35 AM EST: The merge label 'FP Bundles' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Dec 2, 1:35 AM EST: A user added this pull request to the Graphite merge queue.
  • Dec 2, 1:36 AM EST: The Graphite merge queue couldn't merge this PR because it failed for an unknown reason (Stack merges are not currently supported for forked repositories. Please create a branch in the target repository in order to merge).

@graphite-app bot removed the FP Bundles label Dec 2, 2024
return std::nullopt;
return convertStrToExceptionBehavior(cast<MDString>(MD)->getString());
}

Member:

I thought the plan was to have concurrent implementations until we can throw the switch. That way we won't have releases that have half-broken constrained intrinsic support and half-implemented operand bundles.

Collaborator (Author):

You are right. The methods getRoundingMode and getExceptionBehavior are now implemented in CallBase and use the information in the bundles. The methods from ConstrainedFPIntrinsic have been turned into the functions getRoundingModeArg and getExceptionBehaviorArg, which are needed in auto-upgrade.

- Fix Doxygen error,
- Fix clang-format error,
- Remove unused function declaration,
- Remove setting MD_fpmath, it is done by copyMetadata.
- Add method `hasFloatingPointBundles` to simplify checks,
- Update comments in `TailRecursionElimination.cpp`,
- Update messages in Verifier,
- Add Verifier tests.
- Rename remaining cases of 'fpe.round' to 'fpe.control'
This change was discussed in llvm#122735. As decided there, the current usage of strictfp on call sites remains intact.
spavloff added a commit to spavloff/llvm-project that referenced this pull request Apr 14, 2025
This is a lite version of llvm#109798, where code changes are minimized to facilitate discussion about the implementation. The motivations and ideas behind the new floating-point operation support are described in that PR and in the discussion https://discourse.llvm.org/t/rfc-change-of-strict-fp-operation-representation-in-ir/85021. There are concerns that the proposed changes are too invasive and a new approach is required to make the transition smoother.

This implementation is essentially a subset of PR109798, where everything beyond the minimum is removed. It tries to eventually build the same implementation as that PR, but in different steps.

The patch does not attempt to modify the existing implementation based on the constrained intrinsics. Instead it introduces a new one using operand bundles. This new implementation initially has very limited functionality, which later will be extended and finally can replace the existing one.

This PR introduces the notion of a floating-point operation: an intrinsic that is listed in the file "FloatingPointOps.def". These have two additional properties:

1. In the strict environment (a function with the strictfp attribute), calls to these operations acquire a side effect; currently it is read/write access to inaccessible memory, just as for constrained intrinsics.

2. Calls to these operations may have floating-point operand bundles. There are two kinds of such bundles, tagged with "fp.control" and "fp.except", which are used to carry additional information about control modes and exception handling. Initially the set of control modes consists of the rounding mode only.

The sets of operations listed in "FloatingPointOps.def" and in "ConstrainedOps.def" are completely independent; an intrinsic may be in one list or in both. The set of floating-point operations is expected to grow, and finally all FP intrinsics will be available in the new implementation. In this patch, the set of intrinsics in "FloatingPointOps.def" is the minimum necessary for tests.
spavloff added a commit to spavloff/llvm-project that referenced this pull request Apr 14, 2025 (same commit message as above)
spavloff added a commit to spavloff/llvm-project that referenced this pull request Apr 15, 2025 (same commit message as above)