Open
Description
Reduced from JuliaGPU/AMDGPU.jl#672 (comment)
The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.
I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.
https://godbolt.org/z/sscdTK7d7
https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399
llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S
broken.ll
is the original file that is exhibiting the miscompiation and broken.reorder.ll
is the file that emits the same code as the working MWE.
Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.
Activity
llvmbot commentedon Sep 20, 2024
@llvm/issue-subscribers-backend-amdgpu
Author: Valentin Churavy (vchuravy)
The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.
I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.
https://godbolt.org/z/sscdTK7d7
https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399
broken.ll
is the original file that is exhibiting the miscompiation andbroken.reorder.ll
is the file that emits the same code as the working MWE.Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.
llvmbot commentedon Sep 20, 2024
@llvm/issue-subscribers-julialang
Author: Valentin Churavy (vchuravy)
The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.
I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.
https://godbolt.org/z/sscdTK7d7
https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399
broken.ll
is the original file that is exhibiting the miscompiation andbroken.reorder.ll
is the file that emits the same code as the working MWE.Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.
arsenm commentedon Sep 20, 2024
Does this reproduce with main? I see no difference in 17.0 or later: https://godbolt.org/z/Es9jdoYsd
I'm guessing this is a gfx11 specific issue. Newer versions don't have the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) at the end, so this might just be one of the early deallocation bugs that was fixed.
cc @jayfoad
vchuravy commentedon Sep 22, 2024
The original reporter had a gfx1102 and I reproduced it on a gfx1103.
As far as I can tell this doesn't reproduce on main. I hadn't had the time to bisect. This is on an LTS release for us, and I would like to Backports the fix if possible.
So if you have a hint which commit might have fixed this that would be great!
arsenm commentedon Sep 24, 2024
eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)
jayfoad commentedon Sep 24, 2024
The reason for reimplementing it was as a basis for implementing this fairly important bug fix: 4b6d41c
But from reading the description above ("the bug manifests as if we were skipping a loop") it's not clear to me how this is related to dealloc vgprs.
arsenm commentedon Sep 24, 2024
I just noticed in the newer versions, there is no more dealloc. The only difference between the pass/fail in the 16 diff was a few register assignments and one s_delay_alu