[InstCombine] Do not combine shuffle+bitcast if the bitcast is eliminable. #135769
…able.

If we are attempting to combine shuffle+bitcast but the bitcast is pairable with a subsequent bitcast, we should not fold the shuffle, as doing so can block further simplifications.

The motivation for this is a long-standing regression affecting SIMDe on AArch64, introduced indirectly by the AlwaysInliner (1a2e77c). Examples of reproducers:
* https://godbolt.org/z/53qx18s6M
* https://godbolt.org/z/o5e43h5M7
@llvm/pr-subscribers-llvm-transforms

Author: Ricardo Jesus (rj-jesus)

Full diff: https://github.com/llvm/llvm-project/pull/135769.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index f897cc7855d2d..f6423cb40492e 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -3029,10 +3029,18 @@ Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
SmallVector<BitCastInst *, 8> BCs;
DenseMap<Type *, Value *> NewBCs;
for (User *U : SVI.users())
- if (BitCastInst *BC = dyn_cast<BitCastInst>(U))
- if (!BC->use_empty())
- // Only visit bitcasts that weren't previously handled.
- BCs.push_back(BC);
+ if (BitCastInst *BC = dyn_cast<BitCastInst>(U)) {
+ // Only visit bitcasts that weren't previously handled.
+ if (BC->use_empty())
+ continue;
+ // Prefer to combine bitcasts of bitcasts before attempting this fold.
+ if (BC->hasOneUse()) {
+ auto *BC2 = dyn_cast<BitCastInst>(BC->user_back());
+ if (BC2 && isEliminableCastPair(BC, BC2))
+ continue;
+ }
+ BCs.push_back(BC);
+ }
for (BitCastInst *BC : BCs) {
unsigned BegIdx = Mask.front();
Type *TgtTy = BC->getDestTy();
diff --git a/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll b/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
index f20077243273c..c6152368f06fd 100644
--- a/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
+++ b/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
@@ -235,3 +235,18 @@ define <3 x i4> @shuf_bitcast_wrong_size(<2 x i8> %v, i8 %x) {
%r = shufflevector <4 x i4> %b, <4 x i4> undef, <3 x i32> <i32 0, i32 1, i32 2>
ret <3 x i4> %r
}
+
+; Negative test - chain of bitcasts.
+
+define <16 x i8> @shuf_bitcast_chain(<8 x i32> %v) {
+; CHECK-LABEL: @shuf_bitcast_chain(
+; CHECK-NEXT: [[S:%.*]] = shufflevector <8 x i32> [[V:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[C:%.*]] = bitcast <4 x i32> [[S]] to <16 x i8>
+; CHECK-NEXT: ret <16 x i8> [[C]]
+;
+ %s = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+ %a = bitcast <4 x i32> %s to <2 x i64>
+ %b = bitcast <2 x i64> %a to i128
+ %c = bitcast i128 %b to <16 x i8>
+ ret <16 x i8> %c
+}
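
To make the new filtering easier to follow in isolation, here is a minimal sketch that models the user loop from the first hunk outside of LLVM. Everything in it is hypothetical scaffolding: `ToyInst`, `isEliminablePairToy`, `collectFoldCandidates` and `countCandidatesInExample` are illustration-only names, not LLVM APIs, and the sketch assumes that any bitcast-of-bitcast pair is eliminable, whereas the real `isEliminableCastPair` performs further checks. It also treats "one use" and "one user" as the same thing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an IR instruction. This is NOT llvm::Instruction; it only
// records whether the node is a bitcast and which nodes use it.
struct ToyInst {
  bool IsBitCast;
  std::vector<const ToyInst *> Users;
};

// Assumption: in this model every bitcast-of-bitcast pair is eliminable.
// The real isEliminableCastPair does more than this.
static bool isEliminablePairToy(const ToyInst &First, const ToyInst &Second) {
  return First.IsBitCast && Second.IsBitCast;
}

// Mirrors the shape of the patched loop: collect the bitcast users of a
// shuffle that are still candidates for the shuffle+bitcast fold.
std::vector<const ToyInst *> collectFoldCandidates(const ToyInst &Shuffle) {
  std::vector<const ToyInst *> Candidates;
  for (const ToyInst *U : Shuffle.Users) {
    if (!U->IsBitCast)
      continue;
    // Skip bitcasts with no users (previously handled in the original code).
    if (U->Users.empty())
      continue;
    // Skip a bitcast whose only user is another bitcast it can pair with,
    // so the bitcast-of-bitcast combine gets to run first.
    if (U->Users.size() == 1 && isEliminablePairToy(*U, *U->Users.back()))
      continue;
    Candidates.push_back(U);
  }
  return Candidates;
}

// Builds a small example graph and returns how many candidates survive.
std::size_t countCandidatesInExample() {
  ToyInst Ret{false, {}};                  // some non-bitcast user
  ToyInst Inner{true, {&Ret}};             // second bitcast of a chain
  ToyInst Chained{true, {&Inner}};         // sole user is a bitcast: skipped
  ToyInst Plain{true, {&Ret}};             // sole user is not a bitcast: kept
  ToyInst Shared{true, {&Inner, &Ret}};    // more than one user: kept
  ToyInst Dead{true, {}};                  // no users: skipped
  ToyInst Shuffle{false, {&Chained, &Plain, &Shared, &Dead}};
  return collectFoldCandidates(Shuffle).size();
}
```

In this model only `Plain` and `Shared` remain candidates. `Chained` corresponds to `%a` in the `shuf_bitcast_chain` test, whose single user `%b` is itself a bitcast.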
The alternative here would be to fold the
%s.bc = bitcast <8 x i32> %v to <2 x i128>
%s.extract = extractelement <2 x i128> %s.bc, i64 0
%c = bitcast i128 %s.extract to <16 x i8>
pattern to shufflevector + bitcast instead. That would seem like the more robust solution. Do you think that would be viable, or does that run up against some other problem (I guess introducing shufflevectors without cost model may be problematic?)
Going that route seemed more complex, and I wasn't sure if it could have unintended consequences elsewhere (as you say, because of the shufflevectors). Would you like me to give it a try, though?
Hi @nikic, shall I open a PR with the alternative fold (bitcast+extractelement+bitcast to shufflevector+bitcast) so that we can compare both approaches, or are you happy for us to block the initial fold (shufflevector+bitcast to bitcast+extractelement) from happening in the first place?
@rj-jesus I think it would be good to at least try the alternative and see whether it runs into any problems.
LGTM