Skip to content

[InstCombine] Instcombine doesn't fold conditional addition where it used to #128557

Open
@gbaraldi

Description

@gbaraldi

While debugging a reggression in julia we found that some code that used to vectorize no longer does. Digging deeper it looks like it's a reggression in instcombine from LLVM 18 to LLVM 19 (still there on trunk)

define swiftcc float @julia_perf_sum_1108(ptr nonnull swiftself %0, ptr noundef nonnull align 8 dereferenceable(16) %1) local_unnamed_addr #0 {
  %8 = load i64, ptr %1, align 8, !tbaa !9, !alias.scope !12, !noalias !15
  %9 = icmp slt i64 %8, 1
  br i1 %9, label %26, label %10

10:                                               ; preds = %2
  %11 = getelementptr inbounds i8, ptr %1, i64 8
  %12 = load ptr, ptr %11, align 8, !tbaa !20, !alias.scope !12, !noalias !15, !nonnull !8
  %13 = getelementptr inbounds [1 x i32], ptr %12, i64 %8
  br label %14

14:                                               ; preds = %14, %10
  %15 = phi i64 [ 0, %10 ], [ %17, %14 ]
  %16 = phi float [ 0.000000e+00, %10 ], [ %24, %14 ]
  %17 = add nuw nsw i64 %15, 1
  %18 = getelementptr inbounds [1 x i32], ptr %12, i64 %15
  %19 = getelementptr inbounds i8, ptr %13, i64 %15
  %20 = load i8, ptr %19, align 1, !tbaa !22, !range !24, !alias.scope !12, !noalias !15
  %21 = load float, ptr %18, align 4, !tbaa !25, !alias.scope !28, !noalias !29
  %22 = icmp eq i8 %20, 0
  %23 = fadd float %16, %21
  %24 = select i1 %22, float %16, float %23
  ;  %23 = select i1 %22, float -0.000000e+00, float %21
  ;  %24 = fadd float %23, %16
  %25 = icmp eq i64 %17, %8
  br i1 %25, label %26, label %14, !llvm.loop !30

26:                                               ; preds = %14, %2
  %27 = phi float [ 0.000000e+00, %2 ], [ %24, %14 ]
  ret float %27
}

The select highlighted used to be able to be folded into %24 = select i1 %23, float -0.000000e+00, float %22 which allowed a sequence of loop-simplify + LCSSA + vectorize to vectorize the loop. The sequence still works, but the instcombine change makes it not vectorize

https://godbolt.org/z/GE369hEcr (Just the instcombine)

https://godbolt.org/z/5YWo5WPY8 (Whole sequence)

Activity

gbaraldi

gbaraldi commented on Feb 24, 2025

@gbaraldi
ContributorAuthor

This is a fallout from #113686, which had extensive discussions on the same topic. Not sure if there's anything actionable to do here. Apparently this was an illegal optimization without fastmath flags (nnan specifically)
I'm not sure why the fold seen here isn't legal though, if any of the involved values becomes NaN we will get the same results before the fold and after the fold

changed the title [-]Regression in instcombine[/-] [+][InstCombine] Instcombine doesn't fold conditional addition where it used to[/+] on Feb 24, 2025
gbaraldi

gbaraldi commented on Feb 24, 2025

@gbaraldi
ContributorAuthor

Following @arsenm comment here I guess this is illegal, https://alive2.llvm.org/ce/z/Ssotj8. Since we may turn a sNaN into a qNaN

llvmbot

llvmbot commented on Feb 24, 2025

@llvmbot
Member

@llvm/issue-subscribers-julialang

Author: Gabriel Baraldi (gbaraldi)

While debugging a reggression in julia we found that some code that used to vectorize no longer does. Digging deeper it looks like it's a reggression in instcombine from LLVM 18 to LLVM 19 (still there on trunk)
define swiftcc float @<!-- -->julia_perf_sum_1108(ptr nonnull swiftself %0, ptr noundef nonnull align 8 dereferenceable(16) %1) local_unnamed_addr #<!-- -->0 {
  %8 = load i64, ptr %1, align 8, !tbaa !9, !alias.scope !12, !noalias !15
  %9 = icmp slt i64 %8, 1
  br i1 %9, label %26, label %10

10:                                               ; preds = %2
  %11 = getelementptr inbounds i8, ptr %1, i64 8
  %12 = load ptr, ptr %11, align 8, !tbaa !20, !alias.scope !12, !noalias !15, !nonnull !8
  %13 = getelementptr inbounds [1 x i32], ptr %12, i64 %8
  br label %14

14:                                               ; preds = %14, %10
  %15 = phi i64 [ 0, %10 ], [ %17, %14 ]
  %16 = phi float [ 0.000000e+00, %10 ], [ %24, %14 ]
  %17 = add nuw nsw i64 %15, 1
  %18 = getelementptr inbounds [1 x i32], ptr %12, i64 %15
  %19 = getelementptr inbounds i8, ptr %13, i64 %15
  %20 = load i8, ptr %19, align 1, !tbaa !22, !range !24, !alias.scope !12, !noalias !15
  %21 = load float, ptr %18, align 4, !tbaa !25, !alias.scope !28, !noalias !29
  %22 = icmp eq i8 %20, 0
  %23 = fadd float %16, %21
  %24 = select i1 %22, float %16, float %23
  ;  %23 = select i1 %22, float -0.000000e+00, float %21
  ;  %24 = fadd float %23, %16
  %25 = icmp eq i64 %17, %8
  br i1 %25, label %26, label %14, !llvm.loop !30

26:                                               ; preds = %14, %2
  %27 = phi float [ 0.000000e+00, %2 ], [ %24, %14 ]
  ret float %27
}

The select highlighted used to be able to be folded into %24 = select i1 %23, float -0.000000e+00, float %22 which allowed a sequence of loop-simplify + LCSSA + vectorize to vectorize the loop. The sequence still works, but the instcombine change makes it not vectorize

https://godbolt.org/z/GE369hEcr (Just the instcombine)

https://godbolt.org/z/5YWo5WPY8 (Whole sequence)

arsenm

arsenm commented on Feb 27, 2025

@arsenm
Contributor

https://alive2.llvm.org/ce/z/Ssotj8

Correct, you may not introduce a canonicalize on a value that did not go through a possibly canonicalizing operation. You need to know the source value can't be snan or flushed, or you need to write the instruction so that the value would have gone through an FP operation. The raw select can only be dataflow

gbaraldi

gbaraldi commented on Feb 27, 2025

@gbaraldi
ContributorAuthor

Is there a way to ask LLVM to allow this transformation always as a frontend? I guess adding a llvm.canonicalize to or adding -0.0 (which probably gets deleted too early for it to matter)?

arsenm

arsenm commented on Feb 27, 2025

@arsenm
Contributor

I would try to avoid inducing canonicalizes. You really don't want it in general purpose code. nnan on the select should work (or some other way to detect the value contextually can't be nan, like nofpclass or assume)

gbaraldi

gbaraldi commented on Feb 27, 2025

@gbaraldi
ContributorAuthor

But won't nnan be UB if I get a NaN? Julia treats all NaNs as equals but NaN is still NaN

arsenm

arsenm commented on Feb 27, 2025

@arsenm
Contributor

But won't nnan be UB if I get a NaN? Julia treats all NaNs as equals but NaN is still NaN

The select will be poison, not instant UB

gbaraldi

gbaraldi commented on Feb 27, 2025

@gbaraldi
ContributorAuthor

But that would still be the wrong semantic I think (I might be wrong). Since NaN doesn't mean poison for us. I just don't care if LLVM replaces a SNaN with a QNaN, so ask it to have fp semantics instead of bit equal semantics (this is very confusing).
Also the select is made by an optimization, the original code has branches but is basically doing a conditional add in a for loop

define swiftcc float @julia_perf_sum_1063(ptr nonnull swiftself %pgcstack_arg, ptr addrspace(10) noundef nonnull align 8 dereferenceable(16) %"X::GenericMemory") #0 !dbg !4 {
top:
  %immutable_union = alloca i32, align 4
    #dbg_declare(ptr addrspace(10) %"X::GenericMemory", !18, !DIExpression(), !19)
  %current_task = getelementptr inbounds i8, ptr %pgcstack_arg, i32 -152
  %ptls_field = getelementptr inbounds i8, ptr %current_task, i32 168
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !20
  %0 = getelementptr inbounds i8, ptr %ptls_load, i32 16
  %safepoint = load ptr, ptr %0, align 8, !tbaa !24, !invariant.load !10
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(ptr %safepoint), !dbg !19
  fence syncscope("singlethread") seq_cst
  %1 = addrspacecast ptr addrspace(10) %"X::GenericMemory" to ptr addrspace(11), !dbg !26
  %.unbox = load i64, ptr addrspace(11) %1, align 8, !dbg !36, !tbaa !40, !alias.scope !43, !noalias !46
  %2 = icmp slt i64 0, %.unbox, !dbg !36
  %3 = xor i1 %2, true, !dbg !39
  br i1 %3, label %top.L24_crit_edge, label %L5, !dbg !39

top.L24_crit_edge:                                ; preds = %top
  br label %L24, !dbg !39

L5:                                               ; preds = %top
  br label %L6, !dbg !39

L6:                                               ; preds = %L20, %L5
  %value_phi = phi float [ 0.000000e+00, %L5 ], [ %value_phi4, %L20 ]
  %value_phi1 = phi i64 [ 0, %L5 ], [ %21, %L20 ]
  %.unbox2 = load i64, ptr addrspace(11) %1, align 8, !dbg !51, !tbaa !40, !alias.scope !43, !noalias !46
  %4 = icmp slt i64 %value_phi1, %.unbox2, !dbg !51
  %5 = xor i1 %4, true, !dbg !52
  br i1 %5, label %L6.L24_crit_edge, label %L10, !dbg !52

L6.L24_crit_edge:                                 ; preds = %L6
  br label %L24, !dbg !52

L10:                                              ; preds = %L6
  %6 = add i64 %value_phi1, 1, !dbg !53
  %memory_ref = insertvalue { i64, ptr addrspace(10) } zeroinitializer, ptr addrspace(10) %"X::GenericMemory", 1, !dbg !58
  %memoryref_data = extractvalue { i64, ptr addrspace(10) } %memory_ref, 0, !dbg !58
  %memoryref_offset = sub i64 %6, 1, !dbg !58
  %"memoryref_data+offset" = add i64 %memoryref_data, %memoryref_offset, !dbg !58
  %7 = insertvalue { i64, ptr addrspace(10) } zeroinitializer, i64 %"memoryref_data+offset", 0, !dbg !58
  %memory_ref3 = insertvalue { i64, ptr addrspace(10) } %7, ptr addrspace(10) %"X::GenericMemory", 1, !dbg !58
  %8 = addrspacecast ptr addrspace(10) %"X::GenericMemory" to ptr addrspace(11), !dbg !58
  %9 = getelementptr inbounds { i64, ptr }, ptr addrspace(11) %8, i32 0, i32 0, !dbg !58
  %memory_len = load i64, ptr addrspace(11) %9, align 8, !dbg !58, !tbaa !40, !range !60, !alias.scope !43, !noalias !46
  %10 = addrspacecast ptr addrspace(10) %"X::GenericMemory" to ptr addrspace(11), !dbg !58
  %memory_data_ptr = getelementptr inbounds { i64, ptr }, ptr addrspace(11) %10, i32 0, i32 1, !dbg !58
  %11 = load ptr, ptr addrspace(11) %memory_data_ptr, align 8, !dbg !58, !tbaa !61, !alias.scope !43, !noalias !46, !nonnull !10
  %memory_data = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) %"X::GenericMemory", ptr %11), !dbg !58
  %12 = getelementptr inbounds [1 x i32], ptr addrspace(13) %memory_data, i64 %memory_len, !dbg !58
  %13 = getelementptr inbounds [1 x i32], ptr addrspace(13) %memory_data, i64 %"memoryref_data+offset", !dbg !58
  %14 = getelementptr inbounds i8, ptr addrspace(13) %12, i64 %"memoryref_data+offset", !dbg !58
  %15 = load i8, ptr addrspace(13) %14, align 1, !dbg !58, !tbaa !63, !range !65, !alias.scope !43, !noalias !46
  %16 = add nuw i8 1, %15, !dbg !58
  call void @llvm.memcpy.p0.p13.i64(ptr align 4 %immutable_union, ptr addrspace(13) align 4 %13, i64 4, i1 false), !dbg !58, !tbaa !66, !alias.scope !69, !noalias !70
  %17 = and i8 %16, 127, !dbg !71
  %exactly_isa = icmp eq i8 %17, 1, !dbg !71
  %18 = xor i1 %exactly_isa, true, !dbg !71
  %19 = xor i1 %18, true, !dbg !71
  br i1 %19, label %L10.L20_crit_edge, label %L18, !dbg !71

L10.L20_crit_edge:                                ; preds = %L10
  br label %L20, !dbg !71

L18:                                              ; preds = %L10
  %immutable_union.unbox = load float, ptr %immutable_union, align 4, !dbg !74, !tbaa !66, !alias.scope !69, !noalias !70
  %20 = fadd float %value_phi, %immutable_union.unbox, !dbg !74
  br label %L20, !dbg !74

L20:                                              ; preds = %L10.L20_crit_edge, %L18
  %value_phi4 = phi float [ %20, %L18 ], [ %value_phi, %L10.L20_crit_edge ]
  %21 = add i64 %value_phi1, 1, !dbg !78
  br label %L6, !dbg !80, !llvm.loop !81

L24:                                              ; preds = %top.L24_crit_edge, %L6.L24_crit_edge
  %value_phi5 = phi float [ %value_phi, %L6.L24_crit_edge ], [ 0.000000e+00, %top.L24_crit_edge ]
  ret float %value_phi5, !dbg !82
}
arsenm

arsenm commented on Feb 27, 2025

@arsenm
Contributor

You can spam other no-op FP operations, I suppose. https://alive2.llvm.org/ce/z/h2sXMx There are many ways you could rewrite this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @arsenm@vchuravy@EugeneZelenko@gbaraldi@llvmbot

        Issue actions

          [InstCombine] Instcombine doesn't fold conditional addition where it used to · Issue #128557 · llvm/llvm-project