[InstCombine] Do not combine shuffle+bitcast if the bitcast is eliminable. #135769

rj-jesus · 2025-04-15T09:58:47Z

If we are attempting to combine shuffle+bitcast but the bitcast is
pairable with a subsequent bitcast, we should not fold the shuffle as
doing so can block further simplifications.

The motivation for this is a long-standing regression affecting SIMDe on
AArch64 introduced indirectly by the AlwaysInliner (1a2e77c). Some
reproducers:

* https://godbolt.org/z/53qx18s6M
* https://godbolt.org/z/o5e43h5M7

llvmbot · 2025-04-15T09:59:24Z

@llvm/pr-subscribers-llvm-transforms

Author: Ricardo Jesus (rj-jesus)

Changes

If we are attempting to combine shuffle+bitcast but the bitcast is
pairable with a subsequent bitcast, we should not fold the shuffle as
doing so can block further simplifications.

The motivation for this is a long standing regression affecting SIMDe on
AArch64 introduced indirectly by the AlwaysInliner (1a2e77c). Some
reproducers:

Full diff: https://github.com/llvm/llvm-project/pull/135769.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp (+12-4)
(modified) llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll (+15)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index f897cc7855d2d..f6423cb40492e 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -3029,10 +3029,18 @@ Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
     SmallVector<BitCastInst *, 8> BCs;
     DenseMap<Type *, Value *> NewBCs;
     for (User *U : SVI.users())
-      if (BitCastInst *BC = dyn_cast<BitCastInst>(U))
-        if (!BC->use_empty())
-          // Only visit bitcasts that weren't previously handled.
-          BCs.push_back(BC);
+      if (BitCastInst *BC = dyn_cast<BitCastInst>(U)) {
+        // Only visit bitcasts that weren't previously handled.
+        if (BC->use_empty())
+          continue;
+        // Prefer to combine bitcasts of bitcasts before attempting this fold.
+        if (BC->hasOneUse()) {
+          auto *BC2 = dyn_cast<BitCastInst>(BC->user_back());
+          if (BC2 && isEliminableCastPair(BC, BC2))
+            continue;
+        }
+        BCs.push_back(BC);
+      }
     for (BitCastInst *BC : BCs) {
       unsigned BegIdx = Mask.front();
       Type *TgtTy = BC->getDestTy();
diff --git a/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll b/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
index f20077243273c..c6152368f06fd 100644
--- a/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
+++ b/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
@@ -235,3 +235,18 @@ define <3 x i4> @shuf_bitcast_wrong_size(<2 x i8> %v, i8 %x) {
   %r = shufflevector <4 x i4> %b, <4 x i4> undef, <3 x i32> <i32 0, i32 1, i32 2>
   ret <3 x i4> %r
 }
+
+; Negative test - chain of bitcasts.
+
+define <16 x i8> @shuf_bitcast_chain(<8 x i32> %v) {
+; CHECK-LABEL: @shuf_bitcast_chain(
+; CHECK-NEXT:    [[S:%.*]] = shufflevector <8 x i32> [[V:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:    [[C:%.*]] = bitcast <4 x i32> [[S]] to <16 x i8>
+; CHECK-NEXT:    ret <16 x i8> [[C]]
+;
+  %s = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %a = bitcast <4 x i32> %s to <2 x i64>
+  %b = bitcast <2 x i64> %a to i128
+  %c = bitcast i128 %b to <16 x i8>
+  ret <16 x i8> %c
+}
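
For readers skimming the hunk above, the new filter can be modelled outside LLVM. This is a minimal sketch, not the InstCombine code: `Node`, `Kind`, `isEliminablePair`, and `shouldVisitBitcast` are hypothetical names, the plain user list stands in for LLVM's use list, and the sketch assumes that any bitcast-of-bitcast pair counts as eliminable.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an IR instruction. This is NOT the LLVM API.
enum class Kind { Shuffle, BitCast, Other };

struct Node {
  Kind kind;
  std::vector<const Node *> users; // instructions that consume this value
};

// Hypothetical helper. Assumption: in this toy model every
// bitcast-of-bitcast pair is treated as foldable into one bitcast.
bool isEliminablePair(const Node &first, const Node &second) {
  return first.kind == Kind::BitCast && second.kind == Kind::BitCast;
}

// Mirrors the shape of the filter in the patch: returns true if the
// shuffle+bitcast fold should still consider `bc`.
bool shouldVisitBitcast(const Node &bc) {
  assert(bc.kind == Kind::BitCast && "expected a bitcast node");
  // A bitcast without users was already handled (or is dead).
  if (bc.users.empty())
    return false;
  // A single bitcast user means the pair can be combined first,
  // so leave the shuffle alone for now.
  if (bc.users.size() == 1 && isEliminablePair(bc, *bc.users.front()))
    return false;
  return true;
}
```

Applied to the `shuf_bitcast_chain` test, `%a` has a single bitcast user (`%b`), so it is skipped and the shuffle is left untouched. The bitcast chain is then free to collapse into the single `bitcast <4 x i32> ... to <16 x i8>` that the CHECK lines expect.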

nikic

The alternative here would be to fold the

  %s.bc = bitcast <8 x i32> %v to <2 x i128>
  %s.extract = extractelement <2 x i128> %s.bc, i64 0
  %c = bitcast i128 %s.extract to <16 x i8>

pattern to shufflevector + bitcast instead. That would seem like the more robust solution. Do you think that would be viable, or does that run up against some other problem (I guess introducing shufflevectors without cost model may be problematic?)

rj-jesus · 2025-04-17T12:42:09Z

> The alternative here would be to fold the
>
>   %s.bc = bitcast <8 x i32> %v to <2 x i128>
>   %s.extract = extractelement <2 x i128> %s.bc, i64 0
>   %c = bitcast i128 %s.extract to <16 x i8>
>
> pattern to shufflevector + bitcast instead. That would seem like the more robust solution. Do you think that would be viable, or does that run up against some other problem (I guess introducing shufflevectors without cost model may be problematic?)

Going that route seemed more complex, and I wasn't sure if it could have unintended consequences elsewhere (as you say, because of the shufflevectors). Would you like me to give it a try, though?

rj-jesus · 2025-04-22T08:56:55Z

Hi @nikic, shall I open a PR with the alternative fold (bitcast+extractelement+bitcast to shufflevector+bitcast) so that we can compare both approaches, or are you happy for us to block the initial fold (shufflevector+bitcast to bitcast+extractelement) from happening in the first place?

nikic · 2025-04-22T09:00:52Z

@rj-jesus I think it would be good to at least try the alternative and see whether it runs into any problems.

nikic

LGTM

rj-jesus added 2 commits April 14, 2025 07:58

Precommit test.

23c096c

rj-jesus requested a review from davemgreen April 15, 2025 09:58

rj-jesus requested a review from nikic as a code owner April 15, 2025 09:58

llvmbot added llvm:instcombine llvm:transforms labels Apr 15, 2025

This was referenced Apr 15, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

pre-commit: PR135769 dtcxzyw/llvm-opt-benchmark#2273

Closed

nikic reviewed Apr 17, 2025


rj-jesus mentioned this pull request Apr 23, 2025

[InstCombine] Fold bitcast (extelt (bitcast X), Idx) into bitcast+shufflevector. #136998

Open

rj-jesus added 2 commits April 29, 2025 08:03

Add new test (transformation off).

cbdcf1f

Update tests (transformation on).

06ecbae

nikic approved these changes Apr 29, 2025
