Skip to content

[AMDGPU] Identical LLVM IR file with different basic block ordering cause miscompilation #109391

Open
@vchuravy

Description

@vchuravy

Reduced from JuliaGPU/AMDGPU.jl#672 (comment)

The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that is exhibiting the miscompiation and broken.reorder.ll is the file that emits the same code as the working MWE.

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

Activity

llvmbot

llvmbot commented on Sep 20, 2024

@llvmbot
Member

@llvm/issue-subscribers-backend-amdgpu

Author: Valentin Churavy (vchuravy)

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487

The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that is exhibiting the miscompiation and broken.reorder.ll is the file that emits the same code as the working MWE.

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

llvmbot

llvmbot commented on Sep 20, 2024

@llvmbot
Member

@llvm/issue-subscribers-julialang

Author: Valentin Churavy (vchuravy)

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487

The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that is exhibiting the miscompiation and broken.reorder.ll is the file that emits the same code as the working MWE.

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

arsenm

arsenm commented on Sep 20, 2024

@arsenm
Contributor

Does this reproduce with main? I see no difference in 17.0 or later: https://godbolt.org/z/Es9jdoYsd

I'm guessing this is a gfx11 specific issue. Newer versions don't have the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) at the end, so this might just be one of the early deallocation bugs that was fixed.

cc @jayfoad

vchuravy

vchuravy commented on Sep 22, 2024

@vchuravy
ContributorAuthor

The original reporter had a gfx1102 and I reproduced it on a gfx1103.

As far as I can tell this doesn't reproduce on main. I hadn't had the time to bisect. This is on an LTS release for us, and I would like to Backports the fix if possible.
So if you have a hint which commit might have fixed this that would be great!

arsenm

arsenm commented on Sep 24, 2024

@arsenm
Contributor

eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

jayfoad

jayfoad commented on Sep 24, 2024

@jayfoad
Contributor

eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

The reason for reimplementing it was as a basis for implementing this fairly important bug fix: 4b6d41c

But from reading the description above ("the bug manifests as if we were skipping a loop") it's not clear to me how this is related to dealloc vgprs.

arsenm

arsenm commented on Sep 24, 2024

@arsenm
Contributor

it's not clear to me how this is related to dealloc vgprs.

I just noticed in the newer versions, there is no more dealloc. The only difference between the pass/fail in the 16 diff was a few register assignments and one s_delay_alu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @jayfoad@arsenm@vchuravy@llvmbot

        Issue actions

          [AMDGPU] Identical LLVM IR file with different basic block ordering cause miscompilation · Issue #109391 · llvm/llvm-project