[AMDGPU] Consider FLAT instructions for VMEM hazard detection #137170

ro-i · 2025-04-24T12:41:07Z

In general, "Flat instructions look at the per-workitem address and determine for each work item if the target memory address is in global, private or scratch memory." (RDNA2 ISA) FLAT instructions therefore need to be considered for VMEM hazards even when they are not segment-specific. This should not be needed for LDS DMA VMEM/FLAT instructions, though; the patch counts those as LDS accesses instead.

See also #137148
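
The classification change described above can be modelled in isolation. The following is a minimal sketch, not code from the patch: `ToyInst` and its hand-set flags are hypothetical stand-ins for the `SIInstrInfo` predicates, and modelling an LDS DMA buffer load as "VMEM plus LDS DMA" is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for a MachineInstr. The flags are set by hand and
// are NOT derived from real opcodes or from SIInstrInfo.
struct ToyInst {
  bool IsDS = false;
  bool IsVMEM = false;
  bool IsFLAT = false;                // any FLAT encoding
  bool IsSegmentSpecificFLAT = false; // segment-specific FLAT only
  bool IsLDSDMA = false;
};

// Same numbering as the IsHazardInst lambda in the diff: 0, 1 (LDS), 2 (VMEM).
enum HazardKind { HK_None = 0, HK_Lds = 1, HK_Vmem = 2 };

// Classification as written on the removed ("-") lines of the diff.
HazardKind classifyBefore(const ToyInst &I) {
  if (I.IsDS)
    return HK_Lds;
  if (I.IsVMEM || I.IsSegmentSpecificFLAT)
    return HK_Vmem;
  return HK_None;
}

// Classification as written on the added ("+") lines of the diff.
HazardKind classifyAfter(const ToyInst &I) {
  if (I.IsDS || I.IsLDSDMA)
    return HK_Lds;
  if ((I.IsVMEM || I.IsFLAT) && !I.IsLDSDMA)
    return HK_Vmem;
  return HK_None;
}

// A FLAT load with no specific segment (hypothetical flag assignment).
ToyInst plainFlatLoad() {
  ToyInst I;
  I.IsFLAT = true;
  return I;
}

// An LDS DMA buffer load, assumed here to carry both the VMEM and LDS DMA flags.
ToyInst ldsDmaBufferLoad() {
  ToyInst I;
  I.IsVMEM = true;
  I.IsLDSDMA = true;
  return I;
}

// Usage example: the two cases whose classification changes.
void checkExamples() {
  assert(classifyBefore(plainFlatLoad()) == HK_None);
  assert(classifyAfter(plainFlatLoad()) == HK_Vmem);
  assert(classifyBefore(ldsDmaBufferLoad()) == HK_Vmem);
  assert(classifyAfter(ldsDmaBufferLoad()) == HK_Lds);
}
```

In this model a plain FLAT load moves from "no hazard" to the VMEM side, and an LDS DMA instruction moves from the VMEM side to the LDS side.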

Full diff: https://github.com/llvm/llvm-project/pull/137170.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+6-5)
(modified) llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir (+6-2)

diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index aaefe27b1324f..fcafd6a978a4b 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -1424,9 +1424,9 @@ static bool shouldRunLdsBranchVmemWARHazardFixup(const MachineFunction &MF,
   bool HasVmem = false;
   for (auto &MBB : MF) {
     for (auto &MI : MBB) {
-      HasLds |= SIInstrInfo::isDS(MI);
-      HasVmem |=
-          SIInstrInfo::isVMEM(MI) || SIInstrInfo::isSegmentSpecificFLAT(MI);
+      HasLds |= SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI);
+      HasVmem |= (SIInstrInfo::isVMEM(MI) || SIInstrInfo::isFLAT(MI)) &&
+                 !SIInstrInfo::isLDSDMA(MI);
       if (HasLds && HasVmem)
         return true;
     }
@@ -1448,9 +1448,10 @@ bool GCNHazardRecognizer::fixLdsBranchVmemWARHazard(MachineInstr *MI) {
   assert(!ST.hasExtendedWaitCounts());
 
   auto IsHazardInst = [](const MachineInstr &MI) {
-    if (SIInstrInfo::isDS(MI))
+    if (SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI))
       return 1;
-    if (SIInstrInfo::isVMEM(MI) || SIInstrInfo::isSegmentSpecificFLAT(MI))
+    if ((SIInstrInfo::isVMEM(MI) || SIInstrInfo::isFLAT(MI)) &&
+        !SIInstrInfo::isLDSDMA(MI))
       return 2;
     return 0;
   };
diff --git a/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir b/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir
index 86e657093b5b2..d3ee1c3c128b3 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir
+++ b/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir
@@ -269,11 +269,15 @@ body:            |
     S_ENDPGM 0
 ...
 
-# GCN-LABEL: name: no_hazard_lds_branch_flat
+# FLAT_* instructions "look at the per-workitem address and determine for each
+# work item if the target memory address is in global, private or scratch
+# memory" (RDNA2 ISA)
+# GCN-LABEL: name: hazard_lds_branch_flat
 # GCN:      bb.1:
+# GFX10-NEXT: S_WAITCNT_VSCNT undef $sgpr_null, 0
 # GCN-NEXT: FLAT_LOAD_DWORD
 ---
-name:            no_hazard_lds_branch_flat
+name:            hazard_lds_branch_flat
 body:            |
   bb.0:
     successors: %bb.1

arsenm · 2025-04-24T13:05:22Z

llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp

+      HasLds |= SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI);
+      HasVmem |= (SIInstrInfo::isVMEM(MI) || SIInstrInfo::isFLAT(MI)) &&
+                 !SIInstrInfo::isLDSDMA(MI);


Why do we have this pre-scan over the entire function in the first place? Why doesn't it just look at the current instruction like every other hazard recognizer? I also find this logic suspect: what happens with external calls or asm? Can we delete this whole thing?

The pre-scan was added to reduce compile time: https://reviews.llvm.org/D104219

[...]
This patch significantly improves compilation time in the cases the hazard
cannot happen. In one pathological case I looked at IsHazardInst is needlessly
called 88.6 million times.
[...]

But this isn't unique; there's a maximum lookahead to catch degenerate cases. Is this one not respecting it for some reason?

What's unique about this one is that the HazardFn passed into getWaitStatesSince contains a recursive call to getWaitStatesSince, which can make it ridiculously expensive.
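
To illustrate why a search nested inside another search's predicate gets expensive, here is a toy model under strong assumptions: a straight-line instruction sequence with no control-flow graph, a hypothetical `ToyKind` enum instead of real opcodes, and simplified expiry rules. It does not reproduce `getWaitStatesSince`; it only counts how many instructions a nested backward scan inspects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for this sketch only.
enum class ToyKind { Other, Lds, Vmem, Branch, Wait };

// Inner search: walk backward from index From (exclusive) looking for an
// instruction of kind Want. It gives up at an instruction of kind Same or
// at a Wait. Every inspected instruction increments Visits.
static bool innerSearch(const std::vector<ToyKind> &Seq, std::size_t From,
                        ToyKind Want, ToyKind Same, std::size_t &Visits) {
  for (std::size_t I = From; I-- > 0;) {
    ++Visits;
    if (Seq[I] == Want)
      return true;
    if (Seq[I] == Same || Seq[I] == ToyKind::Wait)
      return false;
  }
  return false;
}

// Outer search: walk backward from Cur (exclusive). Each branch found
// triggers a fresh inner search, which is where the quadratic cost comes from.
bool nestedScan(const std::vector<ToyKind> &Seq, std::size_t Cur,
                std::size_t &Visits) {
  assert(Cur < Seq.size());
  const ToyKind Same = Seq[Cur];
  assert(Same == ToyKind::Lds || Same == ToyKind::Vmem);
  const ToyKind Opposite =
      Same == ToyKind::Lds ? ToyKind::Vmem : ToyKind::Lds;
  for (std::size_t I = Cur; I-- > 0;) {
    ++Visits;
    if (Seq[I] == ToyKind::Branch &&
        innerSearch(Seq, I, Opposite, Same, Visits))
      return true;
    if (Seq[I] == Same || Seq[I] == Opposite || Seq[I] == ToyKind::Wait)
      return false;
  }
  return false;
}
```

For a sequence of N branches followed by one VMEM instruction, this model inspects N + N(N-1)/2 instructions and finds no hazard; with N = 100 that is 5050 inspections for a 101-instruction sequence.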

It could probably be rewritten using the new hasHazard<StateT> approach to avoid the recursion.
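
As a rough idea of what a state-carrying, non-recursive search could look like, the sketch below walks backward once and remembers in a small state object whether a branch has been seen. It is a toy on a straight-line sequence: `ToyOp`, `WalkState` and `singlePassScan` are hypothetical names, the expiry rules are simplified, and the code does not reproduce the signature or behaviour of LLVM's `hasHazard<StateT>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for this sketch only.
enum class ToyOp { Other, Lds, Vmem, Branch, Wait };

// State carried along the backward walk instead of starting a nested search.
struct WalkState {
  bool SeenBranch = false;
};

// Single backward pass from Cur (exclusive). Reports a hazard when an
// instruction of the opposite kind is reached with at least one branch in
// between; gives up at an instruction of the same kind or at a Wait.
bool singlePassScan(const std::vector<ToyOp> &Seq, std::size_t Cur,
                    std::size_t &Visits) {
  assert(Cur < Seq.size());
  const ToyOp Same = Seq[Cur];
  assert(Same == ToyOp::Lds || Same == ToyOp::Vmem);
  const ToyOp Opposite = Same == ToyOp::Lds ? ToyOp::Vmem : ToyOp::Lds;
  WalkState State;
  for (std::size_t I = Cur; I-- > 0;) {
    ++Visits;
    if (Seq[I] == Opposite)
      return State.SeenBranch;
    if (Seq[I] == Same || Seq[I] == ToyOp::Wait)
      return false;
    if (Seq[I] == ToyOp::Branch)
      State.SeenBranch = true;
  }
  return false;
}
```

Each instruction is inspected at most once, so a sequence of 100 branches followed by one VMEM instruction costs 100 inspections in this model. Whether the real hazard can be expressed this way across basic blocks is a separate question that the sketch does not answer.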

ro-i requested review from jayfoad and arsenm April 24, 2025 12:41

llvmbot added the backend:AMDGPU label Apr 24, 2025

arsenm reviewed Apr 24, 2025

jayfoad requested a review from piotrAMD April 24, 2025 14:06

This was referenced Apr 24, 2025

[AMDGPU] Classify FLAT instructions as VMEM #137148

Open

[AMDGPU] IGLP: Fixes for VMEM load detection and unsigned int handling #135090

Open
