[LangRef] No target-specific size limit for atomics #136864
Conversation
@llvm/pr-subscribers-llvm-ir Author: Michael Kruse (Meinersbur)

Changes

According to the current LangRef, atomics of sizes larger than a target-dependent limit are non-conformant IR. Presumably, that size limit is `TargetLoweringBase::getMaxAtomicSizeInBitsSupported()`. As a consequence, one would not even know whether IR is valid without instantiating the target backend. To get around this, Clang's CGAtomic uses a constant "16 bytes" for the maximally supported atomic. The verifier only checks the power-of-two requirement.

In a discussion with @jyknight, the intention is rather that AtomicExpandPass will just lower everything larger than the target-supported atomic sizes to a libcall (such as `__atomic_load`). Under this interpretation, the size limit only needs to be known by the lowering and does not affect the IR specification. The original "target-specific size limit" was added in 59b6688. The LangRef change is needed for #134455 because otherwise frontends would need to pass a TargetLowering object to the helper functions just to know what the target-specific limit is.

This also changes the LangRef for atomicrmw. Are there libatomic fallbacks for these? If not, LLVM-IR validity still depends on instantiating the actual backend. There are also some intrinsics, such as `llvm.memcpy.element.unordered.atomic`, that have this constraint but do not change in this PR.

Full diff: https://github.com/llvm/llvm-project/pull/136864.diff — 1 file affected
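As a concrete illustration of the revised wording (a hypothetical example, not part of the patch): a 256-bit atomic load is well-formed IR on every target because its bit width is a power of two and at least eight, and on targets whose `getMaxAtomicSizeInBitsSupported()` is smaller, AtomicExpandPass is expected to lower it to a `__atomic_load` libcall rather than the verifier rejecting it:

```llvm
; Valid under the revised LangRef on any target: the bit width (256)
; is a power of two >= 8, and no target-specific cap applies to the IR.
define i256 @load_huge(ptr %p) {
  ; Where native 256-bit atomics are unavailable, AtomicExpandPass
  ; lowers this instruction to a call to the __atomic_load libcall.
  %v = load atomic i256, ptr %p seq_cst, align 32
  ret i256 %v
}
```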
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9e78f317bd9ea..7a0b7f3254f41 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11206,7 +11206,7 @@ If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering
Atomic loads produce :ref:`defined <memmodel>` results when they may see
multiple atomic stores. The type of the pointee must be an integer, pointer, or
floating-point type whose bit width is a power of two greater than or equal to
-eight and less than or equal to a target-specific size limit. ``align`` must be
+eight. ``align`` must be
explicitly specified on atomic loads. Note: if the alignment is not greater or
equal to the size of the `<value>` type, the atomic operation is likely to
require a lock and have poor performance. ``!nontemporal`` does not have any
@@ -11347,7 +11347,7 @@ If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
Atomic loads produce :ref:`defined <memmodel>` results when they may see
multiple atomic stores. The type of the pointee must be an integer, pointer, or
floating-point type whose bit width is a power of two greater than or equal to
-eight and less than or equal to a target-specific size limit. ``align`` must be
+eight. ``align`` must be
explicitly specified on atomic stores. Note: if the alignment is not greater or
equal to the size of the `<value>` type, the atomic operation is likely to
require a lock and have poor performance. ``!nontemporal`` does not have any
@@ -11492,8 +11492,8 @@ There are three arguments to the '``cmpxchg``' instruction: an address
to operate on, a value to compare to the value currently be at that
address, and a new value to place at that address if the compared values
are equal. The type of '<cmp>' must be an integer or pointer type whose
-bit width is a power of two greater than or equal to eight and less
-than or equal to a target-specific size limit. '<cmp>' and '<new>' must
+bit width is a power of two greater than or equal to eight.
+'<cmp>' and '<new>' must
have the same type, and the type of '<pointer>' must be a pointer to
that type. If the ``cmpxchg`` is marked as ``volatile``, then the
optimizer is not allowed to modify the number or order of execution of
@@ -11604,8 +11604,8 @@ operation. The operation must be one of the following keywords:
- usub_sat
For most of these operations, the type of '<value>' must be an integer
-type whose bit width is a power of two greater than or equal to eight
-and less than or equal to a target-specific size limit. For xchg, this
+type whose bit width is a power of two greater than or equal to eight.
+For xchg, this
may also be a floating point or a pointer type with the same size constraints
as integers. For fadd/fsub/fmax/fmin, this must be a floating-point
or fixed vector of floating-point type. The type of the '``<pointer>``'
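For reference, illustrative examples of the instruction forms whose width wording the hunks above change (sketches assuming the revised rules, not taken from the patch):

```llvm
; All widths below are powers of two >= 8 bits; under this patch no
; further target-specific upper bound applies at the IR level.
define void @examples(ptr %p, i128 %cmp, i128 %new, i64 %val) {
  %l = load atomic i128, ptr %p acquire, align 16
  store atomic i128 %l, ptr %p release, align 16
  %pair = cmpxchg ptr %p, i128 %cmp, i128 %new acq_rel monotonic
  %old = atomicrmw add ptr %p, i64 %val seq_cst
  ret void
}
```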
When this text was originally written, the backend didn't have support for expanding atomics to libcalls, I think. Not sure if all targets support such lowering today... does __atomic_load exist on GPU targets?
AFAIK no GPU provides it. For the purpose of #134455 it doesn't matter where the crash occurs (AtomicExpandPass or emitAtomicLoad).
So it seems like there is still a limit, at least on some targets? Maybe it's sufficient to just note that most targets support arbitrary widths using a lock-based fallback. |
The question is what a frontend should do. Let's say it needs an atomic of size 16:
The frontend still needs to implement libcall fallbacks for cases such as non-constant sizes (this is what #134455 is doing). If due to compiler optimizations the argument becomes constant (e.g. because it is wrapped in a utility function that should work for any size), there currently is no way back to an atomic load/store/cmpxchg instruction. So ideally AtomicExpandPass would handle non-constant sizes/ordering flags as well.
We have a mechanism for backend diagnostics, so they can report errors on IR they can't support. If a frontend wants to generate different code sequences depending on whether the target has atomics of a given width, and we have some targets that don't support arbitrary atomics, then yes, we're stuck with some kind of target-specific check. |
I think as far as IR and the backend is concerned, it's fine to say that atomics of arbitrary width are supported. If this results in a libatomic call and libatomic is not linked, that's a linker error and no longer LLVM's problem. The frontend will not be able to improve on this situation by producing some kind of better lowering than LLVM. Worth pointing out that amdgpu and nvptx disable all libcalls apart from the atomic libcalls. They'll still emit the usual atomic_* libcalls and not crash / produce a backend error. |
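To make the libcall fallback concrete (a sketch of what AtomicExpandPass emits, following the generic libatomic ABI of size, source pointer, destination pointer, and memory-order argument; the `@lowered` function and the literal size are illustrative):

```llvm
; Generic libatomic entry point targeted by AtomicExpandPass for
; atomics wider than the target supports natively.
declare void @__atomic_load(i64, ptr, ptr, i32)

define void @lowered(ptr %src, ptr %dst) {
  ; A 256-bit (32-byte) atomic load lowered to the generic libcall;
  ; 5 is the __ATOMIC_SEQ_CST memory-order encoding.
  call void @__atomic_load(i64 32, ptr %src, ptr %dst, i32 5)
  ret void
}
```

If libatomic is not linked, this becomes an ordinary unresolved-symbol linker error, as noted above.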
@efriedma-quic @nikic So what wording do you suggest for the LangRef? |
I think your current proposed wording is fine. |
If by proposed wording you mean the state of the current PR, which just removes that part (in contrast to item 5 in #136864 (comment)), can you approve it?
LGTM, but please wait for a second approval.