Skip to content

[DataLayout] Introduce sentinel pointer value #131557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

shiltian
Copy link
Contributor

@shiltian shiltian commented Mar 17, 2025

The value of a null pointer is not always 0. For example, on AMDGPU, the null
pointer in address spaces 3 and 5 is 0xffffffff. Currently, there is no
target-independent way to get this information, making it difficult and
error-prone to handle null pointers in target-agnostic code.

We do have ConstantPointerNull, but it might be a little confusing and
misleading. It represents a pointer with an all-zero value rather than
necessarily a real nullptr.

This PR introduces the concept of a sentinel pointer value to DataLayout,
representing the actual nullptr value for a given address space. The changes
include:

  • A new interface function:

    APInt getSentinelPointerValue(unsigned AS)
    

    This function returns an APInt representing the sentinel pointer value for
    the given address space AS. An APInt is used instead of a literal integer
    to support cases where pointers are wider than 64 bits (e.g., AMDGPU’s address
    space 8).

  • An extension to the data layout string format:

    p[n]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]
    

    The new <sentinel> component specifies the sentinel value for the
    corresponding pointer. It currently supports two values:

    • z for an all-zero value
    • f for a full-bit set value

    These two values are the most common representations of nullptr. It is
    unlikely that any target would define nullptr as a random value.

A follow-up patch series will introduce an equivalent of ConstantPointerNull
that represents the actual nullptr, built on top of this PR.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm:ir labels Mar 17, 2025
Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@llvmbot
Copy link
Member

llvmbot commented Mar 17, 2025

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-ir

Author: Shilei Tian (shiltian)

Changes

The value of a null pointer is not always 0. For example, on AMDGPU, the null
pointer in address spaces 3 and 5 is 0xffffffff. Currently, there is no
target-independent way to get this information, making it difficult and
error-prone to handle null pointers in target-agnostic code.

We do have ConstantPointerNull, but it might be a little confusing and
misleading. It represents a pointer with an all-zero value rather than
necessarily a real nullptr.

This PR introduces the concept of a sentinel pointer value to DataLayout,
representing the actual nullptr value for a given address space. The changes
include:

  • A new interface function:

    APInt getSentinelPointerValue(unsigned AS)
    

    This function returns an APInt representing the sentinel pointer value for
    the given address space AS. An APInt is used instead of a literal integer
    to support cases where pointers are wider than 64 bits (e.g., AMDGPU’s address
    space 8).

  • An extension to the data layout string format:

    p[n]:&lt;size&gt;:&lt;abi&gt;[:&lt;pref&gt;[:&lt;idx&gt;[:&lt;sentinel&gt;]]]
    

    The new &lt;sentinel&gt; component specifies the sentinel value for the
    corresponding pointer. It currently supports two values:

    • 0 for an all-zero value
    • f for a full-bit set value

    These two values are the most common representations of nullptr. It is
    unlikely that any target would define nullptr as a random value.

A follow-up patch series will introduce an equivalent of ConstantPointerNull
that represents the actual nullptr, built on top of this PR.


Full diff: https://github.com/llvm/llvm-project/pull/131557.diff

5 Files Affected:

  • (modified) clang/lib/Basic/Targets/AMDGPU.cpp (+3-3)
  • (modified) llvm/docs/LangRef.rst (+7-4)
  • (modified) llvm/include/llvm/IR/DataLayout.h (+7-1)
  • (modified) llvm/lib/IR/DataLayout.cpp (+44-10)
  • (modified) llvm/unittests/IR/DataLayoutTest.cpp (+17-3)
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index a42b4589fb5ac..1506387a41d96 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -32,9 +32,9 @@ static const char *const DataLayoutStringR600 =
     "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1";
 
 static const char *const DataLayoutStringAMDGCN =
-    "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32"
-    "-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:"
-    "32-v48:64-v96:128"
+    "e-p:64:64-p1:64:64-p2:32:32-p3:32:32:32:32:f-p4:64:64-p5:32:32:32:32:f"
+    "-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16"
+    "-v24:32-v32:32-v48:64-v96:128"
     "-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1"
     "-ni:7:8:9";
 
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index c0567090fdd2a..a22020da38f83 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3134,7 +3134,7 @@ as follows:
 ``A<address space>``
     Specifies the address space of objects created by '``alloca``'.
     Defaults to the default address space of 0.
-``p[n]:<size>:<abi>[:<pref>][:<idx>]``
+``p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]``
     This specifies the *size* of a pointer and its ``<abi>`` and
     ``<pref>``\erred alignments for address space ``n``. ``<pref>`` is optional
     and defaults to ``<abi>``. The fourth parameter ``<idx>`` is the size of the
@@ -3143,7 +3143,10 @@ as follows:
     specified, the default index size is equal to the pointer size. All sizes
     are in bits. The address space, ``n``, is optional, and if not specified,
     denotes the default address space 0. The value of ``n`` must be
-    in the range [1,2^24).
+    in the range [1,2^24). The fifth parameter ``<sentinel>`` specifies the
+    sentinel value of the pointer for the corresponding address space. It
+    currently accepts two values: ``0`` for an all-zero value and ``f`` for a
+    full-bit set value. The default sentinel pointer value is all-zero.
 ``i<size>:<abi>[:<pref>]``
     This specifies the alignment for an integer type of a given bit
     ``<size>``. The value of ``<size>`` must be in the range [1,2^24).
@@ -13045,9 +13048,9 @@ This instruction requires several arguments:
    -  Caller and callee both have the calling convention ``fastcc`` or ``tailcc``.
    -  The call is in tail position (ret immediately follows call and ret
       uses value of call or is void).
-   -  Option ``-tailcallopt`` is enabled, ``llvm::GuaranteedTailCallOpt`` is 
+   -  Option ``-tailcallopt`` is enabled, ``llvm::GuaranteedTailCallOpt`` is
       ``true``, or the calling convention is ``tailcc``.
-   -  `Platform-specific constraints are met. 
+   -  `Platform-specific constraints are met.
       <CodeGenerator.html#tail-call-optimization>`_
 
 #. The optional ``notail`` marker indicates that the optimizers should not add
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 2ad080e6d0cd2..1257ffb32a4df 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -78,6 +78,7 @@ class DataLayout {
     Align ABIAlign;
     Align PrefAlign;
     uint32_t IndexBitWidth;
+    APInt SentinelValue;
     /// Pointers in this address space don't have a well-defined bitwise
     /// representation (e.g. may be relocated by a copying garbage collector).
     /// Additionally, they may also be non-integral (i.e. containing additional
@@ -148,7 +149,7 @@ class DataLayout {
   /// Sets or updates the specification for pointer in the given address space.
   void setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth, Align ABIAlign,
                       Align PrefAlign, uint32_t IndexBitWidth,
-                      bool IsNonIntegral);
+                      APInt SentinelValue, bool IsNonIntegral);
 
   /// Internal helper to get alignment for integer of given bitwidth.
   Align getIntegerAlignment(uint32_t BitWidth, bool abi_or_pref) const;
@@ -552,6 +553,11 @@ class DataLayout {
   ///
   /// This includes an explicitly requested alignment (if the global has one).
   Align getPreferredAlign(const GlobalVariable *GV) const;
+
+  /// Returns the sentinel pointer value for a given address space. If the
+  /// address space is invalid, it defaults to the sentinel pointer value of
+  /// address space 0, aligning with the behavior of \p getPointerSpec.
+  APInt getSentinelPointerValue(unsigned AS) const;
 };
 
 inline DataLayout *unwrap(LLVMTargetDataRef P) {
diff --git a/llvm/lib/IR/DataLayout.cpp b/llvm/lib/IR/DataLayout.cpp
index 0cf0bfc9702d3..058775d68465b 100644
--- a/llvm/lib/IR/DataLayout.cpp
+++ b/llvm/lib/IR/DataLayout.cpp
@@ -206,9 +206,10 @@ constexpr DataLayout::PrimitiveSpec DefaultVectorSpecs[] = {
 };
 
 // Default pointer type specifications.
-constexpr DataLayout::PointerSpec DefaultPointerSpecs[] = {
-    // p0:64:64:64:64
-    {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, false},
+const DataLayout::PointerSpec DefaultPointerSpecs[] = {
+    // p0:64:64:64:64:0
+    {0, 64, Align::Constant<8>(), Align::Constant<8>(), 64, APInt(64, 0),
+     false},
 };
 
 DataLayout::DataLayout()
@@ -296,6 +297,22 @@ static Error parseSize(StringRef Str, unsigned &BitWidth,
   return Error::success();
 }
 
+static Error parseSentinelValue(StringRef Str, APInt &V) {
+  if (Str.empty())
+    return createStringError("sentinel value component cannot be empty");
+  if (Str.size() != 1)
+    return createStringError("sentinel value component must be a '0' or 'f'");
+  if (Str[0] == '0') {
+    V.clearAllBits();
+    return Error::success();
+  }
+  if (Str[0] == 'f') {
+    V.setAllBits();
+    return Error::success();
+  }
+  return createStringError("sentinel value component must be a '0' or 'f'");
+}
+
 /// Attempts to parse an alignment component of a specification.
 ///
 /// On success, returns the value converted to byte amount in \p Alignment.
@@ -409,13 +426,14 @@ Error DataLayout::parseAggregateSpec(StringRef Spec) {
 }
 
 Error DataLayout::parsePointerSpec(StringRef Spec) {
-  // p[<n>]:<size>:<abi>[:<pref>[:<idx>]]
+  // p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]
   SmallVector<StringRef, 5> Components;
   assert(Spec.front() == 'p');
   Spec.drop_front().split(Components, ':');
 
-  if (Components.size() < 3 || Components.size() > 5)
-    return createSpecFormatError("p[<n>]:<size>:<abi>[:<pref>[:<idx>]]");
+  if (Components.size() < 3 || Components.size() > 6)
+    return createSpecFormatError(
+        "p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]");
 
   // Address space. Optional, defaults to 0.
   unsigned AddrSpace = 0;
@@ -454,8 +472,14 @@ Error DataLayout::parsePointerSpec(StringRef Spec) {
     return createStringError(
         "index size cannot be larger than the pointer size");
 
+  APInt SentinelValue(BitWidth, 0);
+  if (Components.size() > 5) {
+    if (Error Err = parseSentinelValue(Components[5], SentinelValue))
+      return Err;
+  }
+
   setPointerSpec(AddrSpace, BitWidth, ABIAlign, PrefAlign, IndexBitWidth,
-                 false);
+                 SentinelValue, /*IsNonIntegral=*/false);
   return Error::success();
 }
 
@@ -631,7 +655,7 @@ Error DataLayout::parseLayoutString(StringRef LayoutString) {
     // the spec for AS0, and we then update that to mark it non-integral.
     const PointerSpec &PS = getPointerSpec(AS);
     setPointerSpec(AS, PS.BitWidth, PS.ABIAlign, PS.PrefAlign, PS.IndexBitWidth,
-                   true);
+                   PS.SentinelValue, /*IsNonIntegral=*/true);
   }
 
   return Error::success();
@@ -679,16 +703,19 @@ DataLayout::getPointerSpec(uint32_t AddrSpace) const {
 
 void DataLayout::setPointerSpec(uint32_t AddrSpace, uint32_t BitWidth,
                                 Align ABIAlign, Align PrefAlign,
-                                uint32_t IndexBitWidth, bool IsNonIntegral) {
+                                uint32_t IndexBitWidth, APInt SentinelValue,
+                                bool IsNonIntegral) {
   auto I = lower_bound(PointerSpecs, AddrSpace, LessPointerAddrSpace());
   if (I == PointerSpecs.end() || I->AddrSpace != AddrSpace) {
     PointerSpecs.insert(I, PointerSpec{AddrSpace, BitWidth, ABIAlign, PrefAlign,
-                                       IndexBitWidth, IsNonIntegral});
+                                       IndexBitWidth, SentinelValue,
+                                       IsNonIntegral});
   } else {
     I->BitWidth = BitWidth;
     I->ABIAlign = ABIAlign;
     I->PrefAlign = PrefAlign;
     I->IndexBitWidth = IndexBitWidth;
+    I->SentinelValue = SentinelValue;
     I->IsNonIntegral = IsNonIntegral;
   }
 }
@@ -1020,3 +1047,10 @@ Align DataLayout::getPreferredAlign(const GlobalVariable *GV) const {
   }
   return Alignment;
 }
+
+APInt DataLayout::getSentinelPointerValue(unsigned AS) const {
+  auto I = lower_bound(PointerSpecs, AS, LessPointerAddrSpace());
+  if (I != PointerSpecs.end() || I->AddrSpace == AS)
+    return I->SentinelValue;
+  return PointerSpecs[0].SentinelValue;
+}
diff --git a/llvm/unittests/IR/DataLayoutTest.cpp b/llvm/unittests/IR/DataLayoutTest.cpp
index 16a603ff6416f..79d46ebfd7d8e 100644
--- a/llvm/unittests/IR/DataLayoutTest.cpp
+++ b/llvm/unittests/IR/DataLayoutTest.cpp
@@ -313,11 +313,12 @@ TEST(DataLayout, ParsePointerSpec) {
     EXPECT_THAT_EXPECTED(DataLayout::parse(Str), Succeeded());
 
   for (StringRef Str :
-       {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32", "p0:32:32:32:32:32"})
+       {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32:0", "p0:32:32:32:32:32:0"})
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
-        FailedWithMessage("malformed specification, must be of the form "
-                          "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>]]\""));
+        FailedWithMessage(
+            "malformed specification, must be of the form "
+            "\"p[<n>]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]\""));
 
   // address space
   for (StringRef Str : {"p0x0:32:32", "px:32:32:32", "p16777216:32:32:32:32"})
@@ -401,6 +402,19 @@ TEST(DataLayout, ParsePointerSpec) {
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
         FailedWithMessage("index size cannot be larger than the pointer size"));
+
+  // sentinel value
+  for (StringRef Str :
+       {"p:32:32:32:32:a", "p0:32:32:32:32:ab", "p42:32:32:32:32:123"})
+    EXPECT_THAT_EXPECTED(
+        DataLayout::parse(Str),
+        FailedWithMessage("sentinel value component must be a '0' or 'f'"));
+
+  for (StringRef Str :
+       {"p:32:32:32:32:", "p0:32:32:32:32:", "p42:32:32:32:32:"})
+    EXPECT_THAT_EXPECTED(
+        DataLayout::parse(Str),
+        FailedWithMessage("sentinel value component cannot be empty"));
 }
 
 TEST(DataLayoutTest, ParseNativeIntegersSpec) {

@shiltian
Copy link
Contributor Author

Copy link

github-actions bot commented Mar 17, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 3757ecf5f16c0d9b8cbfc1c9d41965df537d43e6 55227d747544a2ccc5c981cff879063f9cebc253 --extensions c,cpp,h -- clang/lib/Basic/Targets/AMDGPU.cpp clang/test/CodeGen/target-data.c llvm/include/llvm/IR/DataLayout.h llvm/lib/IR/DataLayout.cpp llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp llvm/unittests/IR/DataLayoutTest.cpp
View the diff from clang-format here.
diff --git a/llvm/unittests/IR/DataLayoutTest.cpp b/llvm/unittests/IR/DataLayoutTest.cpp
index 090baac4b6..f01f37b20d 100644
--- a/llvm/unittests/IR/DataLayoutTest.cpp
+++ b/llvm/unittests/IR/DataLayoutTest.cpp
@@ -312,8 +312,8 @@ TEST(DataLayout, ParsePointerSpec) {
         "p16777215:32:32:64:8", "p16777215:16777215:32768:32768:16777215"})
     EXPECT_THAT_EXPECTED(DataLayout::parse(Str), Succeeded());
 
-  for (StringRef Str :
-       {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32:0", "p0:32:32:32:32:32:0"})
+  for (StringRef Str : {"p", "p0", "p:32", "p0:32", "p:32:32:32:32:32:0",
+                        "p0:32:32:32:32:32:0"})
     EXPECT_THAT_EXPECTED(
         DataLayout::parse(Str),
         FailedWithMessage(

@shiltian shiltian force-pushed the users/shiltian/data-layout-sentinel-pointer branch from c966236 to 86cd48c Compare March 17, 2025 03:51
@shiltian shiltian requested review from ranapratap55 and removed request for agozillon March 17, 2025 03:51
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs an RFC. For reference a previous attempt was at #83109

Comment on lines 3146 to 3149
in the range [1,2^24). The fifth parameter ``<sentinel>`` specifies the
sentinel value of the pointer for the corresponding address space. It
currently accepts two values: ``0`` for an all-zero value and ``f`` for a
full-bit set value. The default sentinel pointer value is all-zero.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"no sentinel pointer" should also be a valid option. We also should be able to specify the default for unlisted address spaces (I thought the old PR had that part implemented?)

I'm also not sure if we should just allow an arbitrary bit pattern here.

This also should define what the sentinel pointer means. Also clarify how this interacts with null_pointer_is_valid and the various flavors of nonnull

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also should be able to specify the default for unlisted address spaces (I thought the old PR had that part implemented?)

The implementation has it set to 0 at the moment by default if nothing is specified. I will mention this here.

I'm also not sure if we should just allow an arbitrary bit pattern here.

We could extend it in the future IMHO. There is nothing practical in the upstream use. There is indeed one comment used in downstream in the RFC mentioned arbitrary bit for nullptr.

This also should define what the sentinel pointer means. Also clarify how this interacts with null_pointer_is_valid and the various flavors of nonnull

I'll resolve them next.

@arsenm arsenm requested a review from krzysz00 March 17, 2025 04:27
@shiltian
Copy link
Contributor Author

shiltian commented Mar 17, 2025

This needs an RFC. For reference a previous attempt was at #83109

The RFC was posted, as mentioned in a previous comment.

I discussed offline with the author of the previous attempt and took it over.

@shiltian shiltian force-pushed the users/shiltian/data-layout-sentinel-pointer branch 2 times, most recently from bc7c7a3 to c3f38d0 Compare March 17, 2025 16:55
@shiltian shiltian force-pushed the users/shiltian/data-layout-sentinel-pointer branch 3 times, most recently from efda127 to 015964e Compare March 17, 2025 17:52
The value of a null pointer is not always `0`. For example, on AMDGPU, the null
pointer in address spaces 3 and 5 is `0xffffffff`. Currently, there is no
target-independent way to get this information, making it difficult and
error-prone to handle null pointers in target-agnostic code.

We do have `ConstantPointerNull`, but it might be a little confusing and
misleading. It represents a pointer with an all-zero value rather than
necessarily a real `nullptr`.

This PR introduces the concept of a *sentinel pointer value* to `DataLayout`,
representing the actual `nullptr` value for a given address space. The changes
include:

- A new interface function:
  ```
  APInt getSentinelPointerValue(unsigned AS)
  ```
  This function returns an `APInt` representing the sentinel pointer value for
  the given address space `AS`. An `APInt` is used instead of a literal integer
  to support cases where pointers are wider than 64 bits (e.g., AMDGPU’s address
  space 8).

- An extension to the data layout string format:
  ```
  p[n]:<size>:<abi>[:<pref>[:<idx>[:<sentinel>]]]
  ```
  The new `<sentinel>` component specifies the sentinel value for the
  corresponding pointer. It currently supports two values:
    - `0` for an all-zero value
    - `f` for a full-bit set value

  These two values are the most common representations of `nullptr`. It is
  unlikely that any target would define `nullptr` as a random value.

A follow-up patch series will introduce an equivalent of `ConstantPointerNull`
that represents the actual `nullptr`, built on top of this PR.
@shiltian shiltian force-pushed the users/shiltian/data-layout-sentinel-pointer branch from 015964e to 55227d7 Compare March 21, 2025 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category lld:ELF lld llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants