Skip to content

False-positive: fopen() without fclose() in main() reported as a "resource leak" clang:static analyzer #133942

Open
@alavrentiev

Description

@alavrentiev
int main(int argc, char** argv)
{
    int   port;
    FILE* fp;

    /* Cmd.-line args */
    if (argc != 4  ||
        (fp = fopen(argv[1], "r")) == 0  ||
        ((port = atoi(argv[3])) & ~0xFFFF)) {
        fputs("\nUSAGE:\n"
              "test_fw <inp_file> <host> <port>\n\n", stderr);
        return 1;  // <== Opened stream never closed. Potential resource leak
    }
   ...
   /* some work is done here */
   ...
}

Return from main() with an open file does not pose any resource leaks, as the file gets closed by C RTL right away.
It if was any other function, then yes, in such a situation the report would be justified and true, but not for main().

Activity

changed the title [-]False-positive: fopen() without fclose() in main() reported as a "resource leak"[/-] [+]False-positive: fopen() without fclose() in main() reported as a "resource leak" clang:static analyzer[/+] on Apr 1, 2025
llvmbot

llvmbot commented on Apr 1, 2025

@llvmbot
Member

@llvm/issue-subscribers-clang-static-analyzer

Author: None (alavrentiev)

``` int main(int argc, char** argv) { int port; FILE* fp;
/* Cmd.-line args */
if (argc != 4  ||
    (fp = fopen(argv[1], "r")) == 0  ||
    ((port = atoi(argv[3])) &amp; ~0xFFFF)) {
    fputs("\nUSAGE:\n"
          "test_fw &lt;inp_file&gt; &lt;host&gt; &lt;port&gt;\n\n", stderr);
    return 1;  // &lt;== Opened stream never closed. Potential resource leak
}

...
/* some work is done here */
...
}

Return from `main()` with an open file does not pose any resource leaks, as the file gets closed by C RTL right away.
It if was any other function, then yes, in such a situation the report would be justified and true, but *not* for `main()`.

</details>
chrchr-github

chrchr-github commented on Apr 2, 2025

@chrchr-github

What if main() leaked e.g. 1000 handles in a loop? Is there a reason not to use fclose()?

firewave

firewave commented on Apr 2, 2025

@firewave

If this would occur in code run with fork() or exec*(), wouldn't that leak still manifest itself?

alavrentiev

alavrentiev commented on Apr 2, 2025

@alavrentiev
Author

If this would occur in code run with fork() or exec*(), wouldn't that leak still manifest itself?

@firewave : the report is specifically about return from main(); not any other ways to leave it.

alavrentiev

alavrentiev commented on Apr 2, 2025

@alavrentiev
Author

What if main() leaked e.g. 1000 handles in a loop?

Without a FILE* variable reassigned or reinstated (loop, block, etc) there's no leak in main(), and that is easy to logically distinguish from any other situation / function.

So if main() has an array of 1000 FILE*s and opens them all, then does some work, and returns, there's no leak, either.

Is there a reason not to use fclose()?

Yes, there is. If main() has multiple (many!) exit points (returns) while there's a FILE* previously opened, there's no need to use fclose() at each and every one of them, because that's what the language offers to do.

firewave

firewave commented on Apr 2, 2025

@firewave

@firewave : the report is specifically about return from main(); not any other ways to leave it.

exec*() and spawn*() can be used to run another executable from a process. I am wondering if file descriptors or memory from those subprocesses might be inherited by the parent process and thus accumulate the leaks from subprocesses which do not properly clean up.

alavrentiev

alavrentiev commented on Apr 2, 2025

@alavrentiev
Author

I am wondering if file descriptors or memory from those subprocesses might be inherited by the parent process and thus accumulate the leaks from subprocesses which do not properly clean up.

@firewave : none of these calls create a "subprocess" (in the meaning that it's some sort of being a part of the original one). exec() completely replaces the entire process memory with a new image (file descriptors remain open, but NOT FILE*). fork() splits (copies) the process into two independent entities, each with its own memory. So whatever the copy does with its resources is not going to affect the original (parent) process, and vice versa.

But let us not digress from the issue at hand, which is: when main() returns then the leakage of FILE*s (or file descriptors) should be handled more gratuitously than in any other C function under the same circumstances.

firewave

firewave commented on Apr 2, 2025

@firewave

So whatever the copy does with its resources is not going to affect the original (parent) process, and vice versa.

Thanks for elaborating. I have only ever used it in the most basic and default way and I am not familiar with all the behavior implied by the various variants and flags.

I brought this up because I have seen issues where file descriptor exhaustion was trigger by underlying processes. It has been a while though so my memory is very faint.

steakhal

steakhal commented on Apr 2, 2025

@steakhal
Contributor

Usually, special-casing main leads to marginal benefits because it's a tiny narrow subset of the cases when we ever see main.
W.r.t. this bug, it appears to me that it's also fairly easy to spot this sort of "false-positive" so it shouldn't be a huge pain point.
This makes this for me a less important issue.

We could however check if we are directly in main before constructing the bug report and just skip it if that's the case. That should be fairly easy to implement.

steakhal

steakhal commented on Apr 2, 2025

@steakhal
Contributor

We could however check if we are directly in main before constructing the bug report and just skip it if that's the case. That should be fairly easy to implement.

@Flandini Do you wanna give it a try?

vakatov

vakatov commented on Apr 2, 2025

@vakatov

when main() returns then the leakage of FILE*s (or file descriptors) should be handled more gratuitously than in any other C function under the same circumstances.

Because the Static Analyzer (SA) may actually (in the future, if not already) analyze code pathways which run through different functions (and even different compilation units), then -- won't the suggested approach basically mean that SA must not track any FILE*'s (or bare file descriptors), period? Also, will it include sockets, etc?

alavrentiev

alavrentiev commented on Apr 2, 2025

@alavrentiev
Author

@firewave

where file descriptor exhaustion was trigger by underlying processes

JFTR, there may be a system-wide limit of simultaneously open files, which can affect any process, regardless of their hierarchal dependencies.

NagyDonat

NagyDonat commented on Apr 3, 2025

@NagyDonat
Contributor

when main() returns then the leakage of FILE*s (or file descriptors) should be handled more gratuitously than in any other C function under the same circumstances.

Because the Static Analyzer (SA) may actually (in the future, if not already) analyze code pathways which run through different functions (and even different compilation units), then -- won't the suggested approach basically mean that SA must not track any FILE*'s (or bare file descriptors), period? Also, will it include sockets, etc?

The analyzer does already analyze code pathways that run thorough different functions (essentially it "inlines" their definitions) and it has a --ctu mode that enables Cross-Translation Unit analysis which can inline functions from other translation units; but even in these cases there are limits on the analysis depth, so it can only access a minority of the code through main, so there are many other analysis runs that start from different functions.

vakatov

vakatov commented on Apr 3, 2025

@vakatov

My point is that all functions are ultimately called from main(); so I am not sure that I see real difference between fopen()'ing in main() vs anywhere else. E.g.:
int main() { ....; fopen(...); ....}
versus:
void foo() { ....; fopen(...); ....}
int main() { ....; foo(); ....}

SEt-t

SEt-t commented on Apr 4, 2025

@SEt-t

My point is that all functions are ultimately called from main(); so I am not sure that I see real difference between fopen()'ing in main() vs anywhere else.

I also think the same. This proposal effectively says that no fopen() should be tracked as eventual exit from main() would close the handles. fopen() in main() is not a special case in C.

alavrentiev

alavrentiev commented on Apr 4, 2025

@alavrentiev
Author

fopen() in main() is not a special case in C.

@SEt-t :

That's not true. In the above [current version] from @vakatov, all use cases are leaks (for there are no assignments). In the previous version, only cases 2 and 4 were leaks (because no other code pathways were trackable for them -- and code may attempt to open 1000's more of such descriptors, creating potentially bad consequences). And 1 and 3 were not -- we know they were taken care of (albeit implicitly).

A return from main() is equivalent to exit() -- so does code analyzer "complain" if there's an exit() call somewhere else in the code that has an active FILE* open? I don't think so. Because exit() (or abort(), for that matter) has well-defined no-return semantics, and so does a return in main(). main() is a special function in the language and should be treated as such.

SEt-t

SEt-t commented on Apr 4, 2025

@SEt-t

In the above [current version] from @vakatov, all use cases are leaks (for there are no assignments).

What you think the leak is is completely irrelevant to the language. Assigning the result of fopen() somewhere or not doesn't change that the handle isn't directly passed to releasing function. So in terms of resource leaks those cases are equivalent.

Let's look at some cases:

  1. Some function f() calls fopen() and returns that handle.
  2. main() calls f() and doesn't assign the result anywhere.
  3. main() calls f() and assigns the result to variable that is never used after that.
  4. main() calls f() and assigns the result to variable that is used to work with file but never explicitly fclose()'d.
  5. 6 7. Like 2-4 but f() is called from function g() instead of main(), main() consists of single line return g();.

Which cases have resource leak by your "definition"?

alavrentiev

alavrentiev commented on Apr 6, 2025

@alavrentiev
Author

@SEt-t :

In my definition (no quotes), a resource leak is an irrecoverable use of some limited supply, which can lead to irrecoverable exhaustion of such.

In your examples, the source code analyzer is perfectly capable to tracking that 2 is a leak, because f() returns a handle, which is lost, i.e. its resource is no longer recoverable (e.g. suppose that f() is called again before main() terminates -- that's the leaking accumulation). 3 and 4 are not leaks because the handles in main() keep referencing that very FILE*, and main() takes care of it by either doing an explicit fclose() or closing it implicitly when it terminates, in either case leading to a controlled recovery of the resource.

In 5, 6, and 7, I assume g() returns an int (to be compatible with what return from main() is expected to have), so they are all leaks because the handle had not been previously closed, and is now lost. Should such g() be called somewhere in a loop, for example, all those handles that f() produced could obviously exhaust the available FILE pointers or the underlying file descriptors, and the recovery is impossible.

If, however, g() returned the result of f() to be used in main() as return g() ? 0/*success*/ : 1/*failure*/;, then there's no leak, as again the handle is still trackable [albeit implicitly in this case], and main() is going to close it (upon executing that return statement). Now, if this very same return statement was used in any other function, however, then it would constitute a resource leak again [because at the return point in such a function, the handle would become actually lost]. Which is why I keep saying that main() is a special function, whose return should be regarded as equivalent to exit().

SEt-t

SEt-t commented on Apr 7, 2025

@SEt-t

I think I see where you get it wrong:

In your examples, the source code analyzer is perfectly capable to tracking that 2 is a leak, because f() returns a handle, which is lost, i.e. its resource is no longer recoverable (e.g. suppose that f() is called again before main() terminates -- that's the leaking accumulation). 3 and 4 are not leaks because the handles in main() keep referencing that very FILE*, and main() takes care of it by either doing an explicit fclose() or closing it implicitly when it terminates, in either case leading to a controlled recovery of the resource.

C handles are just opaque values with no special behavior (like cleanup). Absolutely nowhere in standard it's said that main() implicitly closes any handles. Only that return from main() implicitly calls exit() with argument of the returned value (not any magic handles!):

If the return type of the mainfunction is a type compatible with int, a return from the initial call to the mainfunction is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0.

And then exit() does the cleanup:

Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.

So, 2 and 3 are indistinguishable in terms of resource cleanup. (As well as 4-7.)

steakhal

steakhal commented on Apr 7, 2025

@steakhal
Contributor

I wonder if the same argument would hold if main is a function with thousands of lines of code. Should we ignore leaks there too?

Or what if main recurses and calls itself. I believe thats valid in C. Then should we consider leaks?

SEt-t

SEt-t commented on Apr 7, 2025

@SEt-t

@steakhal
I strongly feel that auto-closing of FILE handles should not be relied upon and warning should be issued. What if FILE is opened in a loop of runtime-read N iterations? Small N can work just fine but not large ones. And yes, recursive main() is valid in C. So it's better to be consistent and not special-case main().

vakatov

vakatov commented on Apr 7, 2025

@vakatov

Probably there should be some kind of formal "golden rule" about the Static Analyzer’s expectations vis-à-vis resource management by the user code, something like:

All resources (heap memory, file descriptors, etc.) which are allocated as a result of the user code action and whose handles are in any way (directly or indirectly) returned to be owned by the user… must be deallocated (or scheduled/registered to be deallocated) by the user code.

steakhal

steakhal commented on Apr 9, 2025

@steakhal
Contributor

Probably there should be some kind of formal "golden rule" about the Static Analyzer’s expectations vis-à-vis resource management by the user code, something like:

All resources (heap memory, file descriptors, etc.) which are allocated as a result of the user code action and whose handles are in any way (directly or indirectly) returned to be owned by the user… must be deallocated (or scheduled/registered to be deallocated) by the user code.

That's pretty much the sentiment, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @firewave@steakhal@EugeneZelenko@vakatov@alavrentiev

        Issue actions

          False-positive: fopen() without fclose() in main() reported as a "resource leak" clang:static analyzer · Issue #133942 · llvm/llvm-project