-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[AMDGPU] Consider FLAT instructions for VMEM hazard detection #137170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
In general, "Flat instructions look at the per-workitem address and determine for each work item if the target memory address is in global, private or scratch memory." (RDNA2 ISA) That means that FLAT instructions need to be considered for VMEM hazards even without "specific segment". It should not be needed for DMA VMEM/FLAT instructions, though.
@llvm/pr-subscribers-backend-amdgpu Author: Robert Imschweiler (ro-i) ChangesIn general, "Flat instructions look at the per-workitem address and determine for each work item if the target memory address is in global, private or scratch memory." (RDNA2 ISA) That means that FLAT instructions need to be considered for VMEM hazards even without "specific segment". It should not be needed for DMA VMEM/FLAT instructions, though. See also #137148 Full diff: https://github.com/llvm/llvm-project/pull/137170.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index aaefe27b1324f..fcafd6a978a4b 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -1424,9 +1424,9 @@ static bool shouldRunLdsBranchVmemWARHazardFixup(const MachineFunction &MF,
bool HasVmem = false;
for (auto &MBB : MF) {
for (auto &MI : MBB) {
- HasLds |= SIInstrInfo::isDS(MI);
- HasVmem |=
- SIInstrInfo::isVMEM(MI) || SIInstrInfo::isSegmentSpecificFLAT(MI);
+ HasLds |= SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI);
+ HasVmem |= (SIInstrInfo::isVMEM(MI) || SIInstrInfo::isFLAT(MI)) &&
+ !SIInstrInfo::isLDSDMA(MI);
if (HasLds && HasVmem)
return true;
}
@@ -1448,9 +1448,10 @@ bool GCNHazardRecognizer::fixLdsBranchVmemWARHazard(MachineInstr *MI) {
assert(!ST.hasExtendedWaitCounts());
auto IsHazardInst = [](const MachineInstr &MI) {
- if (SIInstrInfo::isDS(MI))
+ if (SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI))
return 1;
- if (SIInstrInfo::isVMEM(MI) || SIInstrInfo::isSegmentSpecificFLAT(MI))
+ if ((SIInstrInfo::isVMEM(MI) || SIInstrInfo::isFLAT(MI)) &&
+ !SIInstrInfo::isLDSDMA(MI))
return 2;
return 0;
};
diff --git a/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir b/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir
index 86e657093b5b2..d3ee1c3c128b3 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir
+++ b/llvm/test/CodeGen/AMDGPU/lds-branch-vmem-hazard.mir
@@ -269,11 +269,15 @@ body: |
S_ENDPGM 0
...
-# GCN-LABEL: name: no_hazard_lds_branch_flat
+# FLAT_* instructions "look at the per-workitem address and determine for each
+# work item if the target memory address is in global, private or scratch
+# memory" (RDNA2 ISA)
+# GCN-LABEL: name: hazard_lds_branch_flat
# GCN: bb.1:
+# GFX10-NEXT: S_WAITCNT_VSCNT undef $sgpr_null, 0
# GCN-NEXT: FLAT_LOAD_DWORD
---
-name: no_hazard_lds_branch_flat
+name: hazard_lds_branch_flat
body: |
bb.0:
successors: %bb.1
|
HasLds |= SIInstrInfo::isDS(MI) || SIInstrInfo::isLDSDMA(MI); | ||
HasVmem |= (SIInstrInfo::isVMEM(MI) || SIInstrInfo::isFLAT(MI)) && | ||
!SIInstrInfo::isLDSDMA(MI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have this pre-scan over the entire function in the first place? Why doesn't it just look at the current instruction like every other hazard recognizer? I also find this logic suspect, what happens with external calls or asm? Can we delete this whole thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pre-scan had been added to reduce compile time: https://reviews.llvm.org/D104219
[...]
This patch significantly improves compilation time in the cases the hazard
cannot happen. In one pathological case I looked at IsHazardInst is needlesly
called 88.6 milion times.
[...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this isn't unique, there's a maximum lookahead to catch degenerate cases. Is this one not respecting it for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's unique about this one is that the HazardFn passed into getWaitStatesSince contains a recursive call to getWaitStatesSince, which can make it ridiculously expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could probably be rewritten using the new hasHazard<StateT>
approach to avoid the recursion.
In general, "Flat instructions look at the per-workitem address and determine for each work item if the target memory address is in global, private or scratch memory." (RDNA2 ISA) That means that FLAT instructions need to be considered for VMEM hazards even without "specific segment". It should not be needed for DMA VMEM/FLAT instructions, though.
See also #137148