Skip to content

[libclang/python] Fix evaluation of the unsigned enumeration values, #108766 #108769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

fursov
Copy link

@fursov fursov commented Sep 15, 2024

If the type of an enumeration is not native (e.g. uint8_t) but is unsigned then evaluation of the value might get wrong: for values bigger than half of type range the return value will be negative. This happens because for non-native enum types the TypeKind is ELABORATED. To get the real integer type, the conversion to canonical type is required before the enum_value() function can decide about type being signed or unsigned.

Fixes #108766

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:as-a-library libclang and C++ API labels Sep 15, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 15, 2024

@llvm/pr-subscribers-clang

Author: Dmitry Fursov (fursov)

Changes

If the type of an enumeration is not native (e.g. uint8_t) but is unsigned then evaluation of the value might get wrong: for values bigger than half of type range the return value will be negative. This happens because for non-native enum types the TypeKind is ELABORATED. To get the real integer type, the conversion to canonical type is required before the enum_value() function can decide about type being signed or unsigned.

This is fix for ticket #108766: #108766


Full diff: https://github.com/llvm/llvm-project/pull/108769.diff

2 Files Affected:

  • (modified) clang/bindings/python/clang/cindex.py (+2)
  • (modified) clang/bindings/python/tests/cindex/test_cursor.py (+19)
diff --git a/clang/bindings/python/clang/cindex.py b/clang/bindings/python/clang/cindex.py
index 4da99e899e7f7c..c582c82ce50789 100644
--- a/clang/bindings/python/clang/cindex.py
+++ b/clang/bindings/python/clang/cindex.py
@@ -1952,6 +1952,8 @@ def enum_value(self):
             underlying_type = self.type
             if underlying_type.kind == TypeKind.ENUM:
                 underlying_type = underlying_type.get_declaration().enum_type
+            if underlying_type.kind == TypeKind.ELABORATED:
+                underlying_type = underlying_type.get_canonical()
             if underlying_type.kind in (
                 TypeKind.CHAR_U,
                 TypeKind.UCHAR,
diff --git a/clang/bindings/python/tests/cindex/test_cursor.py b/clang/bindings/python/tests/cindex/test_cursor.py
index 7476947bde2ea6..61a7bc1414f235 100644
--- a/clang/bindings/python/tests/cindex/test_cursor.py
+++ b/clang/bindings/python/tests/cindex/test_cursor.py
@@ -570,6 +570,25 @@ def test_enum_values_cpp(self):
         self.assertEqual(ham.kind, CursorKind.ENUM_CONSTANT_DECL)
         self.assertEqual(ham.enum_value, 0x10000000000)
 
+    def test_enum_values_on_elaborated_type(self):
+        tu = get_tu(
+            "using myUType = unsigned char; enum TEST : myUType { SPAM = 1, HAM = 0xff;", lang="cpp"
+        )
+        enum = get_cursor(tu, "TEST")
+        self.assertIsNotNone(enum)
+
+        self.assertEqual(enum.kind, CursorKind.ENUM_DECL)
+
+        enum_constants = list(enum.get_children())
+        self.assertEqual(len(enum_constants), 2)
+
+        spam, ham = enum_constants
+
+        self.assertEqual(spam.kind, CursorKind.ENUM_CONSTANT_DECL)
+        self.assertEqual(spam.enum_value, 1)
+        self.assertEqual(ham.kind, CursorKind.ENUM_CONSTANT_DECL)
+        self.assertEqual(ham.enum_value, 255)
+
     def test_annotation_attribute(self):
         tu = get_tu(
             'int foo (void) __attribute__ ((annotate("here be annotation attribute")));'

Copy link

github-actions bot commented Sep 16, 2024

✅ With the latest revision this PR passed the Python code formatter.

Copy link
Contributor

@DeinAlptraum DeinAlptraum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the fix!

Except for the syntax error noted by the CI and my other comment, this looks good to me.

@@ -570,6 +570,25 @@ def test_enum_values_cpp(self):
self.assertEqual(ham.kind, CursorKind.ENUM_CONSTANT_DECL)
self.assertEqual(ham.enum_value, 0x10000000000)

def test_enum_values_on_elaborated_type(self):
tu = get_tu(
"using myUType = unsigned char; enum TEST : myUType { SPAM = 1, HAM = 0xff;", lang="cpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the closing curly brace after 0xff

If the type of an enumeration is not native (e.g. uint8_t) but
is unsigned then evaluation of the value might get wrong: for
values bigger than half of type range the return value will be
negative. This happens because for non-native enum types the TypeKind
is ELABORATED. To get the real integer type, the conversion to
canonical type is required before the enum_value() function can
decide about type being signed or unsigned.
@fursov
Copy link
Author

fursov commented Sep 18, 2024

@DeinAlptraum, thank you for the review. The findings are fixed now.
Could you say if the fix to be backported to other relelases, like 18.x or 19rc? If yes, how this is done?

Copy link
Contributor

@DeinAlptraum DeinAlptraum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest looks good to me.

Regarding releases, I'm not too familiar with the process here, but I believe backports aren't done unless it's a critical security bug or similar. @Endilll can you confirm, and do you also want to sign off on this?

@DeinAlptraum DeinAlptraum changed the title Fix evaluation of the unsigned enumeration values, #108766 [libclang/python] Fix evaluation of the unsigned enumeration values, #108766 Sep 18, 2024
Co-authored-by: Jannick Kremer <jannick.kremer@mailbox.org>
@fursov
Copy link
Author

fursov commented Sep 19, 2024

Sorry, I'm not familiar with the process (this is my first PR on github) - how the change gets now to the repo? Someone from llvm-project should push it?

@DeinAlptraum
Copy link
Contributor

@fursov yes, since you are a new contributor, you cannot merge this yourself. I can, but I am also still relatively new so I would like someone else to confirm that this change is good to go - so we're just waiting for him to take a look as well :)

@@ -1952,6 +1952,8 @@ def enum_value(self):
underlying_type = self.type
if underlying_type.kind == TypeKind.ENUM:
underlying_type = underlying_type.get_declaration().enum_type
if underlying_type.kind == TypeKind.ELABORATED:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per http://eel.is/c++draft/dcl.enum#2.sentence-4, underlying type of an enum has to be an integral type, even if grammar allows an elaborated type specifier there. I'm not sure TypeKind.ELABORATED can ever be a correct answer for an underlying type of an enumeration. Could it be that we're looking at a deeper issue here?

CC @AaronBallman @cor3ntin

Copy link
Author

@fursov fursov Sep 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My educated guess:

the call chain looks like this: enum_type (cindex.py) -> clang_getEnumDeclIntegerType (clang/tools/libclang/CXType.cpp) -> getIntegerType (clang/include/clang/AST/Decl.h)

Let's use this example:

using TUintType = uint8_t;

enum class ETUintType : TUintType
{
    A = 0xff,
    B = 0x20
};

From what I can see, e.g. from ast dumper (clang/lib/AST/TextNodeDumper.cpp, TextNodeDumper::VisitEnumDecl), first it dumps scoped class, then name (ETUintType), then it uses function dumpType() to print the type: first, the one that has been provided by getIntegerType(), but then it is desugared.

EnumDecl 0x5580b1f62490 <line:5:1, line:9:1> line:5:12 class ETUintType 'TUintType':'unsigned char'

This leads me to the conclusion, that getIntegerType() on the AST parser on purpose returns whatever type that enum is defined against without resolving to the target canonical type.

But I would like to hear the opinion from experts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this seems to be a regression in Clang 16, not too long ago: https://godbolt.org/z/bEno8Pdhb

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the check-clang-python tests using clang 15 - it also does not resolve the enum type to the integer types. But since there was no elaborated type yet, it returns TypeKind.TYPEDEF instead (that matches the godbolt example from above - the TUintType in clang15 - typedef, in clang16 - elaborated and only then typedef).
The test like in patch above (test_enum_values_on_elaborated_type) fails same way as on "main" branch:

    self.assertEqual(ham.enum_value, 255)
AssertionError: -1 != 255

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Endilll , do you think the getIntegerType function on the EnumDecl class should return the true integer type to which any sugared type resolves?

Copy link
Contributor

@Endilll Endilll Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delayed answer, I'm on vacation at the moment.

I asked Corentin and Erich offline, they agreed that having an ElaboratedType node here is surprising, because elaborated type specifiers are not involved. Would it be the correct assessment that this is the core part of your PR? If so, I'd like to hold this off, and resolve an underlying issue instead. You don't have to do it yourself if you don't feel like it. I might get to it myself later.

Does this sound good to you?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the analyses. I'm fine with your proposal. I believe, I do not have enough competence to identify the right place where the fix should be applied and I do not want to make blind guesses. So I suppose, this PR can be closed though the issue #108766 will stay open.
Depending on the fix, the enum still may get typedef as underlying type (?, as in clang15) so the fix that provides canonical type on the python code might still be needed. I can provide that if required, ping me on the issue, I can make PR for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:as-a-library libclang and C++ API clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libclang python binding returns incorrect enum value
5 participants