[Attributor] Questions on the precision of the Attributor's call graph

Given the following C++ input file:

```cpp
#include <iostream>

struct A { virtual void f() = 0; };
struct A1 : public A { virtual void f() { std::cout << "A1::f()\n"; } };
struct A2 : public A { virtual void f() { std::cout << "A2::f()\n"; } };
struct A3 : public A { virtual void f() { std::cout << "A3::f()\n"; } };

struct B { virtual void f() = 0; };
struct B1 : public B { virtual void f() { std::cout << "B1::f()\n"; } };
struct B2 : public B { virtual void f() { std::cout << "B2::f()\n"; } };
struct B3 : public B { virtual void f() { std::cout << "B3::f()\n"; } };

struct C { virtual void f(int) = 0; };
struct C1 : public C { virtual void f(int) { std::cout << "C1::f(int)\n"; } };

void Af(A *a) { a->f(); }
void Bf(B *b) { b->f(); }
void Cf(C *c) { c->f(0); }

int main() {

    Af(new A1());
    Af(new A2());

    Bf(new B1());
    Bf(new B2());

    Cf(new C1());

    return 0;
}
```

And using the Attributor to generate a call graph:
```bash
clang++ example.cpp -c -o outputs/example.ll -flto -fvisibility=hidden -fwhole-program-vtables -fsanitize=cfi-icall -O0 -Xclang -disable-O0-optnone
opt outputs/example.ll -passes=attributor --attributor-assume-closed-world --attributor-print-call-graph -disable-output | c++filt | awk 'BEGIN{FS=OFS="\042"} { for (i=2; i<=NF; i+=2) { gsub(/</, "\\\&lt;", $i); gsub(/>/, "\\\&gt;", $i); }} 1' >outputs/example.callgraph.dot
dot outputs/example.callgraph.dot -Tsvg -o outputs/example.callgraph.svg
```

I get the following output:
![example1](https://github.com/llvm/llvm-project/assets/10748726/a72c9965-2c19-4a65-a183-2ba817e7572a)

That is, `Af(A*)` is assumed to call, among others, `A1::f()`, `A2::f()`, `B1::f()`, `B2::f()`, and `C1::f(int)`.

This can be made more precise in (at least) two ways:

1. Calling a function through a pointer of an incompatible type is undefined behavior (UB) in at least C and C++.
C standard 6.3.2.3 paragraph 8:
> A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.

C++ standard 7.6.1.3 paragraph 5:
> Calling a function through an expression whose function type E is different from the function type F of the called function’s definition results in undefined behavior unless the type “pointer to F” can be converted to the type “pointer to E” via a function pointer conversion (7.3.14).

Hence, `C1::f(int)`, whose signature differs from that of `A::f()`, can be removed from the list of possible callees.

2. The vtable information encoded in the [Type Metadata](https://llvm.org/docs/TypeMetadata.html) (emitted, e.g., when compiling with `-flto -fvisibility=hidden -fwhole-program-vtables -fsanitize=cfi-icall`) can be used to restrict the set to `A1::f()` and `A2::f()` only.

Whole-program devirtualization (WPD) already seems to use some of this information: for example, it converts the call `c->f(0)` in `Cf` into a direct call to `C1::f(int)`, since that is the only possible callee. The Attributor, however, does not use this information.

From what I can tell from https://github.com/llvm/llvm-project/blob/db3bc494875626c6b8e7392f08c631489b056702/llvm/lib/Transforms/IPO/Attributor.cpp#L1072-L1079 and https://github.com/llvm/llvm-project/blob/db3bc494875626c6b8e7392f08c631489b056702/llvm/lib/Transforms/IPO/AttributorAttributes.cpp#L12227-L12232, the Attributor in closed-world module mode treats every function whose address is taken as a potential callee of an indirect call, which is overly conservative.
Are there any plans for, or interest in contributions toward, using the function signature and Type Metadata in the Attributor's call graph as well?

From `llvm/lib/Transforms/IPO/Attributor.cpp` (L1072-L1079):

```cpp
for (Function *Fn : Functions)
  if (Fn->hasAddressTaken(/*PutOffender=*/nullptr,
                          /*IgnoreCallbackUses=*/false,
                          /*IgnoreAssumeLikeCalls=*/true,
                          /*IgnoreLLVMUsed=*/true,
                          /*IgnoreARCAttachedCall=*/false,
                          /*IgnoreCastedDirectCall=*/true))
    InfoCache.IndirectlyCallableFunctions.push_back(Fn);
```

From `llvm/lib/Transforms/IPO/AttributorAttributes.cpp` (L12227-L12232):

```cpp
} else if (A.isClosedWorldModule()) {
  ArrayRef<Function *> IndirectlyCallableFunctions =
      A.getInfoCache().getIndirectlyCallableFunctions(A);
  PotentialCallees.insert(IndirectlyCallableFunctions.begin(),
                          IndirectlyCallableFunctions.end());
}
```

[Attributor] Questions on the precision of the Attributor's call graph #74740
