Open
Description
While debugging a reggression in julia we found that some code that used to vectorize no longer does. Digging deeper it looks like it's a reggression in instcombine from LLVM 18 to LLVM 19 (still there on trunk)
define swiftcc float @julia_perf_sum_1108(ptr nonnull swiftself %0, ptr noundef nonnull align 8 dereferenceable(16) %1) local_unnamed_addr #0 {
%8 = load i64, ptr %1, align 8, !tbaa !9, !alias.scope !12, !noalias !15
%9 = icmp slt i64 %8, 1
br i1 %9, label %26, label %10
10: ; preds = %2
%11 = getelementptr inbounds i8, ptr %1, i64 8
%12 = load ptr, ptr %11, align 8, !tbaa !20, !alias.scope !12, !noalias !15, !nonnull !8
%13 = getelementptr inbounds [1 x i32], ptr %12, i64 %8
br label %14
14: ; preds = %14, %10
%15 = phi i64 [ 0, %10 ], [ %17, %14 ]
%16 = phi float [ 0.000000e+00, %10 ], [ %24, %14 ]
%17 = add nuw nsw i64 %15, 1
%18 = getelementptr inbounds [1 x i32], ptr %12, i64 %15
%19 = getelementptr inbounds i8, ptr %13, i64 %15
%20 = load i8, ptr %19, align 1, !tbaa !22, !range !24, !alias.scope !12, !noalias !15
%21 = load float, ptr %18, align 4, !tbaa !25, !alias.scope !28, !noalias !29
%22 = icmp eq i8 %20, 0
%23 = fadd float %16, %21
%24 = select i1 %22, float %16, float %23
; %23 = select i1 %22, float -0.000000e+00, float %21
; %24 = fadd float %23, %16
%25 = icmp eq i64 %17, %8
br i1 %25, label %26, label %14, !llvm.loop !30
26: ; preds = %14, %2
%27 = phi float [ 0.000000e+00, %2 ], [ %24, %14 ]
ret float %27
}
The select highlighted used to be able to be folded into %24 = select i1 %23, float -0.000000e+00, float %22
which allowed a sequence of loop-simplify + LCSSA + vectorize to vectorize the loop. The sequence still works, but the instcombine change makes it not vectorize
https://godbolt.org/z/GE369hEcr (Just the instcombine)
https://godbolt.org/z/5YWo5WPY8 (Whole sequence)
Metadata
Metadata
Assignees
Labels
Type
Projects
Milestone
Relationships
Development
No branches or pull requests
Activity
gbaraldi commentedon Feb 24, 2025
This is a fallout from #113686, which had extensive discussions on the same topic. Not sure if there's anything actionable to do here. Apparently this was an illegal optimization without fastmath flags (nnan specifically)
I'm not sure why the fold seen here isn't legal though, if any of the involved values becomes NaN we will get the same results before the fold and after the fold
[-]Regression in instcombine[/-][+][InstCombine] Instcombine doesn't fold conditional addition where it used to[/+]gbaraldi commentedon Feb 24, 2025
Following @arsenm comment here I guess this is illegal, https://alive2.llvm.org/ce/z/Ssotj8. Since we may turn a sNaN into a qNaN
llvmbot commentedon Feb 24, 2025
@llvm/issue-subscribers-julialang
Author: Gabriel Baraldi (gbaraldi)
The select highlighted used to be able to be folded into
%24 = select i1 %23, float -0.000000e+00, float %22
which allowed a sequence of loop-simplify + LCSSA + vectorize to vectorize the loop. The sequence still works, but the instcombine change makes it not vectorizehttps://godbolt.org/z/GE369hEcr (Just the instcombine)
https://godbolt.org/z/5YWo5WPY8 (Whole sequence)
arsenm commentedon Feb 27, 2025
Correct, you may not introduce a canonicalize on a value that did not go through a possibly canonicalizing operation. You need to know the source value can't be snan or flushed, or you need to write the instruction so that the value would have gone through an FP operation. The raw select can only be dataflow
gbaraldi commentedon Feb 27, 2025
Is there a way to ask LLVM to allow this transformation always as a frontend? I guess adding a
llvm.canonicalize
to or adding -0.0 (which probably gets deleted too early for it to matter)?arsenm commentedon Feb 27, 2025
I would try to avoid inducing canonicalizes. You really don't want it in general purpose code. nnan on the select should work (or some other way to detect the value contextually can't be nan, like nofpclass or assume)
gbaraldi commentedon Feb 27, 2025
But won't nnan be UB if I get a NaN? Julia treats all NaNs as equals but NaN is still NaN
arsenm commentedon Feb 27, 2025
The select will be poison, not instant UB
gbaraldi commentedon Feb 27, 2025
But that would still be the wrong semantic I think (I might be wrong). Since NaN doesn't mean poison for us. I just don't care if LLVM replaces a SNaN with a QNaN, so ask it to have fp semantics instead of bit equal semantics (this is very confusing).
Also the select is made by an optimization, the original code has branches but is basically doing a conditional add in a for loop
arsenm commentedon Feb 27, 2025
You can spam other no-op FP operations, I suppose. https://alive2.llvm.org/ce/z/h2sXMx There are many ways you could rewrite this