Skip to content

Diagnose potential size confusion with VLA params #129772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

AaronBallman
Copy link
Collaborator

With Clang starting to support bounds safety checking annotations, we've begun to dabble in late parsing of function parameters so that an earlier parameter can refer to a later parameter by name. However, one situation where this can cause a problem is if the earlier parameter is actually referencing a variable from an outer scope.

To help identify those situations, this introduces a new diagnostic -Wvla-potential-size-confusion (grouped under -Wvla). This diagnoses code like:

int n;
void func(int array[n], int n);

to let the user know that the 'n' in 'array[n]' references the file scope variable and not the parameter.

With Clang starting to support bounds safety checking annotations,
we've begun to dabble in late parsing of function parameters so that
an earlier parameter can refer to a later parameter by name. However,
one situation where this can cause a problem is if the earlier
parameter is actually referencing a variable from an outer scope.

To help identify those situations, this introduces a new diagnostic
-Wvla-potential-size-confusion (grouped under -Wvla). This diagnoses
code like:

  int n;
  void func(int array[n], int n);

to let the user know that the 'n' in 'array[n]' references the file
scope variable and not the parameter.
@AaronBallman AaronBallman added c clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer labels Mar 4, 2025
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Mar 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 4, 2025

@llvm/pr-subscribers-clang

Author: Aaron Ballman (AaronBallman)

Changes

With Clang starting to support bounds safety checking annotations, we've begun to dabble in late parsing of function parameters so that an earlier parameter can refer to a later parameter by name. However, one situation where this can cause a problem is if the earlier parameter is actually referencing a variable from an outer scope.

To help identify those situations, this introduces a new diagnostic -Wvla-potential-size-confusion (grouped under -Wvla). This diagnoses code like:

int n;
void func(int array[n], int n);

to let the user know that the 'n' in 'array[n]' references the file scope variable and not the parameter.


Full diff: https://github.com/llvm/llvm-project/pull/129772.diff

5 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+5)
  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+2-1)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+7)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+70)
  • (added) clang/test/Sema/vla-potential-size-confusion.c (+68)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 97b8e32f03b57..5417fd65dd893 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -222,6 +222,11 @@ Improvements to Clang's diagnostics
 
 - Improve the diagnostics for shadows template parameter to report correct location (#GH129060).
 
+- Added the ``-Wvla-potential-size-confusion`` diagnostic, which is grouped
+  under ``-Wvla`` to diagnose when a variably-modified type in a function
+  parameter list is using a variable from an outer scope as opposed to a
+  variable declared later in the parameter list.
+
 Improvements to Clang's time-trace
 ----------------------------------
 
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index fac80fb4009aa..1992c6a424e9e 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -936,7 +936,8 @@ def VexingParse : DiagGroup<"vexing-parse">;
 def VLAUseStaticAssert : DiagGroup<"vla-extension-static-assert">;
 def VLACxxExtension : DiagGroup<"vla-cxx-extension", [VLAUseStaticAssert]>;
 def VLAExtension : DiagGroup<"vla-extension", [VLACxxExtension]>;
-def VLA : DiagGroup<"vla", [VLAExtension]>;
+def VLASizeConfusion : DiagGroup<"vla-potential-size-confusion">;
+def VLA : DiagGroup<"vla", [VLAExtension, VLASizeConfusion]>;
 def VolatileRegisterVar : DiagGroup<"volatile-register-var">;
 def Visibility : DiagGroup<"visibility">;
 def ZeroLengthArray : DiagGroup<"zero-length-array">;
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index d89648a8a2e83..616986cdf5215 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -170,6 +170,13 @@ def err_vla_in_coroutine_unsupported : Error<
   "variable length arrays in a coroutine are not supported">;
 def note_vla_unsupported : Note<
   "variable length arrays are not supported for the current target">;
+def warn_vla_size_expr_shadow : Warning<
+  "variable length array size expression refers to declaration from an outer "
+  "scope">, InGroup<VLASizeConfusion>;
+def note_vla_size_expr_shadow_param : Note<
+  "does not refer to this declaration">;
+def note_vla_size_expr_shadow_actual : Note<
+  "refers to this declaration instead">;
 
 // C99 variably modified types
 def err_variably_modified_template_arg : Error<
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index fe313c62ff846..ad0f8d9101a30 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -26,6 +26,7 @@
 #include "clang/AST/MangleNumberingContext.h"
 #include "clang/AST/NonTrivialTypeVisitor.h"
 #include "clang/AST/Randstruct.h"
+#include "clang/AST/RecursiveASTVisitor.h"
 #include "clang/AST/StmtCXX.h"
 #include "clang/AST/Type.h"
 #include "clang/Basic/Builtins.h"
@@ -10333,6 +10334,75 @@ Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
       }
     }
 
+    // Loop over the parameters to see if any of the size expressions contains
+    // a DeclRefExpr which refers to a variable from an outer scope that is
+    // also named later in the parameter list.
+    // e.g., int n; void func(int array[n], int n);
+    SmallVector<const DeclRefExpr *, 2> DRESizeExprs;
+    llvm::for_each(Params, [&](const ParmVarDecl *Param) {
+      // If we have any size expressions we need to check against, check them
+      // now.
+      for (const auto *DRE : DRESizeExprs) {
+        // Check to see if this parameter has the same name as one of the
+        // DeclRefExprs we wanted to test against. If so, then we found a
+        // situation where an earlier parameter refers to the name of a later
+        // parameter, which is (currently) only valid if there's a variable
+        // from an outer scope with the same name.
+        if (const auto *SizeExprND = dyn_cast<NamedDecl>(DRE->getDecl())) {
+          if (SizeExprND->getIdentifier() == Param->getIdentifier()) {
+            // Diagnose the DeclRefExpr from the parameter with the size
+            // expression.
+            Diag(DRE->getLocation(), diag::warn_vla_size_expr_shadow);
+            // Note the parameter that a user could be confused into thinking
+            // they're referring to.
+            Diag(Param->getLocation(), diag::note_vla_size_expr_shadow_param);
+            // Note the DeclRefExpr that's actually being used.
+            Diag(DRE->getDecl()->getLocation(),
+                 diag::note_vla_size_expr_shadow_actual);
+            
+          }
+        }
+      }
+
+      // To check whether its size expression is a simple DeclRefExpr, we first
+      // have to walk through pointers or references, but array types always
+      // decay to a pointer, so skip if this is a DecayedType.
+      QualType QT = Param->getType();
+      while (!isa<DecayedType>(QT.getTypePtr()) &&
+             (QT->isPointerType() || QT->isReferenceType()))
+        QT = QT->getPointeeType();
+
+      // An array type is always decayed to a pointer, so we need to get the
+      // original type in that case.
+      if (const auto *DT = QT->getAs<DecayedType>())
+        QT = DT->getOriginalType();
+
+      // Now we can see if it's a VLA type with a size expression.
+      // FIXME: it would be nice to handle constant-sized arrays as well,
+      // e.g., constexpr int n = 12; void foo(int array[n], int n);
+      // however, the constant expression is replaced by its value at the time
+      // we form the type, so we've lost that information here.
+      if (!QT->hasSizedVLAType())
+        return;
+
+      const VariableArrayType *VAT = getASTContext().getAsVariableArrayType(QT);
+      if (!VAT)
+        return;
+
+      class DeclRefFinder : public RecursiveASTVisitor<DeclRefFinder> {
+        SmallVectorImpl<const DeclRefExpr *> &Found;
+
+      public:
+        DeclRefFinder(SmallVectorImpl<const DeclRefExpr *> &Found)
+            : Found(Found) {}
+        bool VisitDeclRefExpr(const DeclRefExpr *DRE) {
+          Found.push_back(DRE);
+          return true;
+        }
+      } Finder(DRESizeExprs);
+      Finder.TraverseStmt(VAT->getSizeExpr());
+    });
+
     if (!getLangOpts().CPlusPlus) {
       // In C, find all the tag declarations from the prototype and move them
       // into the function DeclContext. Remove them from the surrounding tag
diff --git a/clang/test/Sema/vla-potential-size-confusion.c b/clang/test/Sema/vla-potential-size-confusion.c
new file mode 100644
index 0000000000000..de9560483e5f3
--- /dev/null
+++ b/clang/test/Sema/vla-potential-size-confusion.c
@@ -0,0 +1,68 @@
+// RUN: %clang_cc1 %s -std=c23 -verify=expected,c -fsyntax-only
+// RUN: %clang_cc1 %s -std=c23 -verify=good -fsyntax-only -Wno-vla
+// RUN: %clang_cc1 -x c++ %s -verify -fsyntax-only
+// RUN: %clang_cc1 -DCARET -fsyntax-only -std=c23 -fno-diagnostics-show-line-numbers -fcaret-diagnostics-max-lines=1 %s 2>&1 | FileCheck %s -strict-whitespace
+
+// good-no-diagnostics
+
+int n, m;      // #decl
+int size(int);
+
+void foo(int vla[n], int n); // expected-warning {{variable length array size expression refers to declaration from an outer scope}} \
+                                expected-note {{does not refer to this declaration}} \
+                                expected-note@#decl {{refers to this declaration instead}}
+
+void bar(int (*vla)[n], int n); // expected-warning {{variable length array size expression refers to declaration from an outer scope}} \
+                                   expected-note {{does not refer to this declaration}} \
+                                   expected-note@#decl {{refers to this declaration instead}}
+
+void baz(int n, int vla[n]); // no diagnostic expected
+
+void quux(int vla[n + 12], int n); // expected-warning {{variable length array size expression refers to declaration from an outer scope}} \
+                                      expected-note {{does not refer to this declaration}} \
+                                      expected-note@#decl {{refers to this declaration instead}}
+
+void quibble(int vla[size(n)], int n);  // expected-warning {{variable length array size expression refers to declaration from an outer scope}} \
+                                           expected-note {{does not refer to this declaration}} \
+                                           expected-note@#decl {{refers to this declaration instead}}
+
+void quobble(int vla[n + m], int n, int m);  // expected-warning 2 {{variable length array size expression refers to declaration from an outer scope}} \
+                                                expected-note 2 {{does not refer to this declaration}} \
+                                                expected-note@#decl 2 {{refers to this declaration instead}}
+
+// For const int, we still treat the function as having a variably-modified
+// type, but only in C.
+const int x = 12; // #other-decl
+void quorble(int vla[x], int x); // c-warning {{variable length array size expression refers to declaration from an outer scope}} \
+                                    c-note {{does not refer to this declaration}} \
+                                    c-note@#other-decl {{refers to this declaration instead}}
+
+// For constexpr int, the function has a constant array type. It would be nice
+// to diagnose this case as well, but the type system replaces the expression
+// with the constant value, and so the information about the name of the
+// variable used in the size expression is lost.
+constexpr int y = 12;
+void quuble(int vla[y], int y); // no diagnostic expected
+
+#ifdef __cplusplus
+struct S {
+  static int v; // #mem-var
+  
+  void member_function(int vla[v], int v);  // expected-warning {{variable length array size expression refers to declaration from an outer scope}} \
+                                               expected-note {{does not refer to this declaration}} \
+                                               expected-note@#mem-var {{refers to this declaration instead}}
+};
+#endif
+
+#ifdef CARET
+// Test that caret locations make sense.
+int w;
+void quable(int vla[w], int w);
+
+// CHECK: void quable(int vla[w], int w);
+// CHECK:                     ^
+// CHECK: void quable(int vla[w], int w);
+// CHECK:                             ^
+// CHECK: int w;
+// CHECK:     ^
+#endif

@AaronBallman
Copy link
Collaborator Author

CC @uecker who may also have opinions on the value of this diagnostic.

Copy link

github-actions bot commented Mar 4, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@AaronBallman
Copy link
Collaborator Author

Future improvements that would be nice to have are:

  • Diagnosing a similar situation in structures. e.g.,
    int n;
    struct S {
      int n;
      int array[sizeof(n)]; // Refers to outer n, not member n
    };
    
  • Diagnosing with constant-size arrays (requires tracking the expression for the constant-size array in the QualType) e.g.,
    constexpr int n = 12;
    void func(int array[n], int n);
    

@uecker
Copy link

uecker commented Mar 4, 2025

This warning may be useful for people who are confused about C's scoping and name lookup rules, although I would argue that you create a lot of this confusing by insisting to add the late parsing semantics that are incoherent with how C works. In C++ (where the rules are already a mess because you have different lookup rules in member functions), GCC has an error for the case that is UB. You may want to fix this first.

Another problem to think about is that people will want to pass the size via a macro argument for the structure case. In this case you should certainly suppress the warning.

@uecker
Copy link

uecker commented Mar 4, 2025

Also: Are you planning to add warning also for
constexpr int n = 1;
{
char buf[n];
constexpr int n = 2;
}
Once you plant the idea that names should be looked up in the same syntactic construct into peoples head, they will likely also get confused about this.
(Here is the case about the C++ cases which is UB and clang does not warn: https://godbolt.org/z/85K87nEac)

@rapidsna rapidsna added the clang:bounds-safety Issue/PR relating to the experimental -fbounds-safety feature in Clang label Mar 4, 2025
expected-note@#decl 2 {{refers to this declaration instead}}

// For const int, we still treat the function as having a variably-modified
// type, but only in C.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it is implied but maybe it is worth saying in C++ this is usable in a constant expression.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is proposal in flight for C that might change this.


#ifdef __cplusplus
struct S {
static int v; // #mem-var
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have the constant expression case for this example like we do for the non-member function case above? More tests the merrier.

Copy link
Contributor

@rapidsna rapidsna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this diagnostics! I just filed an issue to extend the diagnostics for cases you mentioned and also for bounds annotations: #129783

I may try to back port in some downstream version and extend for bounds annotations to see how many locations it fire.

@uecker
Copy link

uecker commented Mar 5, 2025

And a final comment. While the main motivating use case for you may be VLAs in parameter declarations (although as heavy user of those, I never felt the need for such a warning), it would seem the warning would be relevant in cases where the arrays are not actually VLAs. In general, it seems a warning about shadowing a visible identifier somewhere and just having it used with the old definition. So I wonder whether it should be renamed?

if (!VAT)
return;

class DeclRefFinder : public RecursiveASTVisitor<DeclRefFinder> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use the dynamic ast visitor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this should probably just inherit from ConstDynamicRecursiveASTVisitor instead.

(That also reminds me that I need to get back to that because we still have some unmigrated visitors left...)

@@ -10333,6 +10334,74 @@ Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
}
}

// Loop over the parameters to see if any of the size expressions contains
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we

  • Put all of that in a separate function
  • Maybe not run it in modes VLAs are not supported

Comment on lines +10351 to +10352
if (const auto *SizeExprND = dyn_cast<NamedDecl>(DRE->getDecl())) {
if (SizeExprND->getIdentifier() == Param->getIdentifier()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (const auto *SizeExprND = dyn_cast<NamedDecl>(DRE->getDecl())) {
if (SizeExprND->getIdentifier() == Param->getIdentifier()) {
if (const auto *SizeExprND = dyn_cast<NamedDecl>(DRE->getDecl()); SizeExprND && SizeExprND->getIdentifier() == Param->getIdentifier()) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++, we probably don't want to warn on int foo[::a], int a, so we should look at nested name specifiers

Comment on lines +10369 to +10372
QualType QT = Param->getType();
while (!isa<DecayedType>(QT.getTypePtr()) &&
(QT->isPointerType() || QT->isReferenceType()))
QT = QT->getPointeeType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't already a way to do that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c clang:bounds-safety Issue/PR relating to the experimental -fbounds-safety feature in Clang clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants