Skip to content

[LangRef] No target-specific size limit for atomics #136864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Meinersbur
Copy link
Member

According to the current LangRef, atomics of sizes larger than a target-dependent limit are non-conformant IR. Presumably, that size limit is TargetLoweringBase::getMaxAtomicSizeInBitsSupported(). As a consequence, one would not even know whether IR is valid without instantiating the Target backend.

To get around this, Clang's CGAtomic uses a constant "16 bytes" for the maximally supported atomic. The verifier only checks the power-of-two requirement.

In a discussion with @jyknight, the intention is rather that the AtomicExpandPass will just lower everything larger than the target-supported atomic sizes to libcall (such as __atomic_load). According to this interpretation, the size limit needs only be known by the lowering and does not affect the IR specification.

The original "target-specific size limit" had been added in 59b6688. The LangRef change is needed for #134455 because otherwise frontends need to pass a TargetLowering object to the helper functions just to know what the target-specific limit is.

This also changes the LangRef for atomicrmw. Are there libatomic fallbacks for these? If not, LLVM-IR validity still depends on instantiating the actual backend. There are also some intrinsics such as llvm.memcpy.element.unordered.atomic that have this constraint but do not change in this PR.

@Meinersbur Meinersbur marked this pull request as ready for review April 23, 2025 13:37
@llvmbot
Copy link
Member

llvmbot commented Apr 23, 2025

@llvm/pr-subscribers-llvm-ir

Author: Michael Kruse (Meinersbur)

Changes

According to the current LangRef, atomics of sizes larger than a target-dependent limit are non-conformant IR. Presumably, that size limit is TargetLoweringBase::getMaxAtomicSizeInBitsSupported(). As a consequence, one would not even know whether IR is valid without instantiating the Target backend.

To get around this, Clang's CGAtomic uses a constant "16 bytes" for the maximally supported atomic. The verifier only checks the power-of-two requirement.

In a discussion with @jyknight, the intention is rather that the AtomicExpandPass will just lower everything larger than the target-supported atomic sizes to libcall (such as __atomic_load). According to this interpretation, the size limit needs only be known by the lowering and does not affect the IR specification.

The original "target-specific size limit" had been added in 59b6688. The LangRef change is needed for #134455 because otherwise frontends need to pass a TargetLowering object to the helper functions just to know what the target-specific limit is.

This also changes the LangRef for atomicrmw. Are there libatomic fallbacks for these? If not, LLVM-IR validity still depends on instantiating the actual backend. There are also some intrinsics such as llvm.memcpy.element.unordered.atomic that have this constraint but do not change in this PR.


Full diff: https://github.com/llvm/llvm-project/pull/136864.diff

1 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+6-6)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9e78f317bd9ea..7a0b7f3254f41 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11206,7 +11206,7 @@ If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering
 Atomic loads produce :ref:`defined <memmodel>` results when they may see
 multiple atomic stores. The type of the pointee must be an integer, pointer, or
 floating-point type whose bit width is a power of two greater than or equal to
-eight and less than or equal to a target-specific size limit.  ``align`` must be
+eight. ``align`` must be
 explicitly specified on atomic loads. Note: if the alignment is not greater or
 equal to the size of the `<value>` type, the atomic operation is likely to
 require a lock and have poor performance. ``!nontemporal`` does not have any
@@ -11347,7 +11347,7 @@ If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
 Atomic loads produce :ref:`defined <memmodel>` results when they may see
 multiple atomic stores. The type of the pointee must be an integer, pointer, or
 floating-point type whose bit width is a power of two greater than or equal to
-eight and less than or equal to a target-specific size limit.  ``align`` must be
+eight. ``align`` must be
 explicitly specified on atomic stores. Note: if the alignment is not greater or
 equal to the size of the `<value>` type, the atomic operation is likely to
 require a lock and have poor performance. ``!nontemporal`` does not have any
@@ -11492,8 +11492,8 @@ There are three arguments to the '``cmpxchg``' instruction: an address
 to operate on, a value to compare to the value currently be at that
 address, and a new value to place at that address if the compared values
 are equal. The type of '<cmp>' must be an integer or pointer type whose
-bit width is a power of two greater than or equal to eight and less
-than or equal to a target-specific size limit. '<cmp>' and '<new>' must
+bit width is a power of two greater than or equal to eight.
+'<cmp>' and '<new>' must
 have the same type, and the type of '<pointer>' must be a pointer to
 that type. If the ``cmpxchg`` is marked as ``volatile``, then the
 optimizer is not allowed to modify the number or order of execution of
@@ -11604,8 +11604,8 @@ operation. The operation must be one of the following keywords:
 -  usub_sat
 
 For most of these operations, the type of '<value>' must be an integer
-type whose bit width is a power of two greater than or equal to eight
-and less than or equal to a target-specific size limit. For xchg, this
+type whose bit width is a power of two greater than or equal to eight.
+For xchg, this
 may also be a floating point or a pointer type with the same size constraints
 as integers.  For fadd/fsub/fmax/fmin, this must be a floating-point
 or fixed vector of floating-point type.  The type of the '``<pointer>``'

@efriedma-quic
Copy link
Collaborator

When this text was originally written, the backend didn't have support for expanding atomics to libcalls, I think. Not sure if all targets support such lowering today... does __atomic_load exist on GPU targets?

@Meinersbur
Copy link
Member Author

AFAIK no GPU provides __atomic_load with arbitrary size directly. Possible could be implemented for CUDA after the Pascal-generation using cuda::atomic. Currently, if you would try to use the __atomic_load with an unsupported size, you may either get a link error or the compiler would crash.

For the purpose of #134455 it doesn't matter where the crash occurs (AtomicExpandPass or emitAtomicLoad).

@efriedma-quic
Copy link
Collaborator

So it seems like there is still a limit, at least on some targets?

Maybe it's sufficient to just note that most targets support arbitrary widths using a lock-based fallback.

@Meinersbur
Copy link
Member Author

The question is, what a Frontend should do. Let's say it needs a atomic of size 16:

  1. Unconditionally emit an LLVM-IR cmpxchg instruction. With my undertanding of the current LangRef, this is illegal IR even if AtomicExpandPass would convert the illegal IR to legal IR.
  2. Instantiate the TargetMachine and ask it for the supported atomic size. For Clang the discussion was that this would require LLVM_TARGETS_TO_BUILD to emit LLVM-IR for a specific architecture which was an unwelcome change.
  3. Implement its own target-triple dependent lookup of for each target1.
  4. If it is just "most targets" then the frontend still needs to know what those targets are. If it has to know itself, this is not that much different from the previous. If it is just a (non-normative) note then it would also still be illegal IR.
  5. A middle ground would be "If target can support atomics of arbitrary size then atomic load/store/cmpxchg can be of any size. Otherwise, the size must be at most TargetLoweringBase::getMaxAtomicSizeInBitsSupported()". If it is not supported, usually frontends cannot just simply not emit an atomic. We would prefer it it emitted an error instead of crashing or emitting illegal IR, so we still would need a list of supported target triples in the frontend...

The frontend still needs to implement libcall fallbacks for cases such as non-constant sizes (this is what #134455 is doing). If due to compiler optimizations the argument becomes constant (e.g. because it is wrapped in a utility function that should work for any size), there currently is no way back to an atomic load/store/cmpxchg instruction. So ideally AtomicExpandPass would handle non-constant sizes/ordering flags as well.

Footnotes

  1. Clang has its own TargetInfo that contains a lot where IMHO the backend should be the source-source-of-truth. For Flang this means we may need to duplicate that info once more or also instantiate TargetInfo in addition to its own TargetCharacteristics. Maybe the RFC will help.

@efriedma-quic
Copy link
Collaborator

We have a mechanism for backend diagnostics, so they can report errors on IR they can't support.

If a frontend wants to generate different code sequences depending on whether the target has atomics of a given width, and we have some targets that don't support arbitrary atomics, then yes, we're stuck with some kind of target-specific check.

@nikic
Copy link
Contributor

nikic commented Apr 28, 2025

I think as far as IR and the backend is concerned, it's fine to say that atomics of arbitrary width are supported. If this results in a libatomic call and libatomic is not linked, that's a linker error and no longer LLVM's problem. The frontend will not be able to improve on this situation by producing some kind of better lowering than LLVM.

Worth pointing out that amdgpu and nvptx disable all libcalls apart from the atomic libcalls. They'll still emit the usual atomic_* libcalls and not crash / produce a backend error.

@Meinersbur
Copy link
Member Author

@efriedma-quic @nikic So what wording do you suggest for the LangRef?

@nikic
Copy link
Contributor

nikic commented Apr 29, 2025

I think your current proposed wording is fine.

@Meinersbur Meinersbur requested a review from nikic April 29, 2025 12:56
@Meinersbur
Copy link
Member Author

I think your current proposed wording is fine.

If by prosposed wording you mean the state of the current PR which just removes the part (in contrast to item 5 in #136864 (comment)), can you approve it?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please wait for a second approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants