Skip to content

[AMDGPU] remove move instruction if there is no user of it #136735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

BaoshanPang
Copy link
Contributor

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Apr 22, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Baoshan (BaoshanPang)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/136735.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/v_swap_b32.mir (+1-8)
diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
index 73343e1c80f33..be07e88e87851 100644
--- a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+++ b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -797,7 +797,7 @@ MachineInstr *SIShrinkInstructions::matchSwap(MachineInstr &MovT) const {
     dropInstructionKeepingImpDefs(*MovY);
     MachineInstr *Next = &*std::next(MovT.getIterator());
 
-    if (T.isVirtual() && MRI->use_nodbg_empty(T)) {
+    if (MRI->use_nodbg_empty(T)) {
       dropInstructionKeepingImpDefs(MovT);
     } else {
       Xop.setIsKill(false);
diff --git a/llvm/test/CodeGen/AMDGPU/v_swap_b32.mir b/llvm/test/CodeGen/AMDGPU/v_swap_b32.mir
index 95aaea6ea8091..5cd395fb18074 100644
--- a/llvm/test/CodeGen/AMDGPU/v_swap_b32.mir
+++ b/llvm/test/CodeGen/AMDGPU/v_swap_b32.mir
@@ -1,11 +1,9 @@
 # RUN: llc -simplify-mir -mtriple=amdgcn -mcpu=gfx900 -run-pass=si-shrink-instructions -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s
-# RUN: llc -simplify-mir -mtriple=amdgcn -mcpu=gfx900 -passes=si-shrink-instructions -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s
 
 # GCN-LABEL: name: swap_phys_condensed
 # GCN: bb.0:
 # GCN-NEXT: liveins:
 # GCN-NEXT: {{^[ ]*$}}
-# GCN-NEXT: $vgpr2 = V_MOV_B32_e32 $vgpr0, implicit $exec
 # GCN-NEXT: $vgpr0, $vgpr1 = V_SWAP_B32 $vgpr1, $vgpr0, implicit $exec
 # GCN-NEXT: S_SETPC_B64_return
 ---
@@ -24,7 +22,6 @@ body:             |
 # GCN: bb.0:
 # GCN-NEXT: liveins:
 # GCN-NEXT: {{^[ ]*$}}
-# GCN-NEXT: $vgpr2 = V_MOV_B32_e32 $vgpr0, implicit $exec
 # GCN-NEXT: $vgpr3 = V_MOV_B32_e32 killed $vgpr4, implicit $exec
 # GCN-NEXT: $vgpr0, $vgpr1 = V_SWAP_B32 $vgpr1, $vgpr0, implicit $exec
 # GCN-NEXT: $vgpr5 = V_MOV_B32_e32 killed $vgpr6, implicit $exec
@@ -47,7 +44,6 @@ body:             |
 # GCN: bb.0:
 # GCN-NEXT: liveins:
 # GCN-NEXT: {{^[ ]*$}}
-# GCN-NEXT: $vgpr2 = V_MOV_B32_e32 $vgpr0, implicit $exec
 # GCN-NEXT: $vgpr0, $vgpr1 = V_SWAP_B32 $vgpr1, $vgpr0, implicit $exec
 # GCN-NEXT: S_SETPC_B64_return
 ---
@@ -66,7 +62,6 @@ body:             |
 # GCN: bb.0:
 # GCN-NEXT: liveins:
 # GCN-NEXT: {{^[ ]*$}}
-# GCN-NEXT: $vgpr2 = V_MOV_B32_e32 $vgpr0, implicit $exec
 # GCN-NEXT: $vgpr0, $vgpr1 = V_SWAP_B32 $vgpr1, $vgpr0, implicit $exec
 # GCN-NEXT: S_SETPC_B64_return
 ---
@@ -85,7 +80,6 @@ body:             |
 # GCN: bb.0:
 # GCN-NEXT: liveins:
 # GCN-NEXT: {{^[ ]*$}}
-# GCN-NEXT: $vgpr4_vgpr5 = COPY $vgpr0_vgpr1
 # GCN-NEXT: $vgpr0, $vgpr2 = V_SWAP_B32 $vgpr2, $vgpr0, implicit $exec
 # GCN-NEXT: $vgpr1, $vgpr3 = V_SWAP_B32 $vgpr3, $vgpr1, implicit $exec
 ---
@@ -936,8 +930,7 @@ body:             |
 ...
 
 # GCN-LABEL: implicit_ops_mov_t_swap_b32
-# GCN:      $vgpr3 = V_MOV_B32_e32 $vgpr0, implicit $exec, implicit $vgpr2, implicit killed $vgpr1_vgpr2, implicit-def $vgpr1
-# GCN-NEXT: $vgpr0, $vgpr1 = V_SWAP_B32 $vgpr1, $vgpr0, implicit $exec
+# GCN: $vgpr0, $vgpr1 = V_SWAP_B32 $vgpr1, $vgpr0, implicit $exec
 
 ---
 name:            implicit_ops_mov_t_swap_b32

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an end to end change where this makes an observable difference? That would be a useful testcase.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this should avoid touching a reserved physical register. I also have a long term desire to remove physical register use lists

@arsenm
Copy link
Contributor

arsenm commented Apr 22, 2025

Also should fix the description, it's not obvious what movt is

@BaoshanPang
Copy link
Contributor Author

BaoshanPang commented Apr 22, 2025

I don't know how to create a end to end test case yet . Is there an example I can check with?

This one would be difficult. You could try to craft some inline assembly where a swap could form in the middle

@BaoshanPang
Copy link
Contributor Author

I don't know how to create a end to end test case yet . Is there an example I can check with?

This one would be difficult. You could try to craft some inline assembly where a swap could form in the middle

For inline assembly, it seems this pass can't do any optimization to them. also I can't find a way that can generate such swap sequence from C code.

@arsenm
Copy link
Contributor

arsenm commented Apr 23, 2025

For inline assembly, it seems this pass can't do any optimization to them.

Yes, that's a plus. The point isn't to optimize the asm but the copies to satisfy the constraints. I tried something like

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 -o - < %s

define void @swap() {
  %v0 = call i32 asm "v_mov_b32 $0, 0", "={v0}"()
  %v1 = call i32 asm "v_mov_b32 $0, 1", "={v1}"()
  call void asm "; use $0","{v0},{v1}"(i32 %v1, i32 %v0)
  ret void
}

But we don't end up with a swap and use a temporary register:

swap:                                   ; @swap
; %bb.0:
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	;;#ASMSTART
	v_mov_b32 v0, 0
	;;#ASMEND
	;;#ASMSTART
	v_mov_b32 v1, 1
	;;#ASMEND
	s_nop 0
	v_mov_b32_e32 v2, v0
	v_mov_b32_e32 v0, v1
	v_mov_b32_e32 v1, v2
	;;#ASMSTART
	; use v0
	;;#ASMEND
	s_setpc_b64 s[30:31]

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the description to tag amdgpu and be more specific about what this is doing

For instruction sequence:
 move t, x
 move x, y
 mov  y, t

Enhance matchSwap so that 'move t, x' can be removed.
@BaoshanPang BaoshanPang changed the title remove 'movt' if there is no user of it [AMDGPU] remove move instruction if there is no user of it Apr 25, 2025
@BaoshanPang
Copy link
Contributor Author

Can you fix the description to tag amdgpu and be more specific about what this is doing

Changed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants