[llvm-debuginfo-analyzer] Add support for LLVM IR format. #135440

Open
wants to merge 1 commit into
base: users/SLTozer/debug-ssa-updater
169 changes: 165 additions & 4 deletions llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
@@ -13,10 +13,11 @@ SYNOPSIS
DESCRIPTION
-----------
:program:`llvm-debuginfo-analyzer` parses debug and text sections in
binary object files and prints their contents in a logical view, which
is a human readable representation that closely matches the structure
of the original user source code. Supported object file formats include
ELF, Mach-O, WebAssembly, PDB and COFF.
binary object files and textual IR representations and prints their
contents in a logical view, which is a human-readable representation
that closely matches the structure of the original user source code.
Supported formats include ELF, Mach-O, WebAssembly, PDB, COFF and
LLVM IR (textual representation and bitcode).

The **logical view** abstracts the complexity associated with the
different low-level representations of the debugging information that
@@ -2124,6 +2125,138 @@ layout and given the number of matches.
-----------------------------
Total 71 8

IR (Textual representation and bitcode) SUPPORT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The example below shows the IR output generated by
:program:`llvm-debuginfo-analyzer`. We compiled the example to LLVM IR
for a 64-bit target with Clang (-O0 -g --target=x86_64-linux):

.. code-block:: c++

  1  using INTPTR = const int *;
  2  int foo(INTPTR ParamPtr, unsigned ParamUnsigned, bool ParamBool) {
  3    if (ParamBool) {
  4      typedef int INTEGER;
  5      const INTEGER CONSTANT = 7;
  6      return CONSTANT;
  7    }
  8    return ParamUnsigned;
  9  }

PRINT BASIC DETAILS
^^^^^^^^^^^^^^^^^^^
The following command prints basic details for all the logical elements,
sorted by their internal debug information offset; the output includes
each element's lexical level and the debug info format.

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level,format
--output-sort=offset
--print=scopes,symbols,types,lines,instructions
test-clang.ll

or

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level,format
--output-sort=offset
--print=elements
test-clang.ll

Each row represents an element that is present within the debug
information. The first column represents the scope level, followed by
the associated line number (if any), and finally the description of
the element.

.. code-block:: none

Logical View:
[000] {File} 'test-clang.ll' -> Textual IR

[001] {CompileUnit} 'test.cpp'
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
[003] {Block}
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
[004] 5 {Line}
[004] {Code} 'store i32 7, ptr %CONSTANT, align 4, !dbg !32'
[004] 6 {Line}
[004] {Code} 'store i32 7, ptr %retval, align 4, !dbg !33'
[004] 6 {Line}
[004] {Code} 'br label %return, !dbg !33'
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
[003] 2 {Parameter} 'ParamBool' -> 'bool'
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
[003] 2 {Line}
[003] {Code} '%retval = alloca i32, align 4'
[003] {Code} '%ParamPtr.addr = alloca ptr, align 8'
[003] {Code} '%ParamUnsigned.addr = alloca i32, align 4'
[003] {Code} '%ParamBool.addr = alloca i8, align 1'
[003] {Code} '%CONSTANT = alloca i32, align 4'
[003] {Code} 'store ptr %ParamPtr, ptr %ParamPtr.addr, align 8'
[003] {Code} 'store i32 %ParamUnsigned, ptr %ParamUnsigned.addr, align 4'
[003] {Code} '%storedv = zext i1 %ParamBool to i8'
[003] {Code} 'store i8 %storedv, ptr %ParamBool.addr, align 1'
[003] 8 {Line}
[003] {Code} '%1 = load i32, ptr %ParamUnsigned.addr, align 4, !dbg !34'
[003] 8 {Line}
[003] {Code} 'store i32 %1, ptr %retval, align 4, !dbg !35'
[003] 8 {Line}
[003] {Code} 'br label %return, !dbg !35'
[003] 9 {Line}
[003] {Code} '%2 = load i32, ptr %retval, align 4, !dbg !36'
[003] 9 {Line}
[003] {Code} 'ret i32 %2, !dbg !36'
[003] 3 {Line}
[003] 3 {Line}
[003] 3 {Line}
[003] {Code} 'br i1 %loadedv, label %if.then, label %if.end, !dbg !26'
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'

SELECT LOGICAL ELEMENTS
^^^^^^^^^^^^^^^^^^^^^^^
The following prints all *instructions*, *symbols* and *types* that
match **'LOAD'** or **'store'** (case-insensitive regular expressions)
in their names or types, using a tab layout and giving the number of
matches.

.. code-block:: none
Member

I wonder if there's a more illustrative example to motivate the reader -- is it possible to search for "INTPTR" and discover both the type alias and the parameter? That's the sort of query I think I'd end up making: "I can see this type in the output format, but why is it present? -> Ah, it's a parameter to a function".

(The current example is fine, I'm just trying to imagine a better one).


llvm-debuginfo-analyzer --attribute=level
--select-nocase --select-regex
--select=LOAD --select=store
--report=list
--print=symbols,types,instructions,summary
test-clang.ll

Logical View:
[000] {File} 'test-clang.ll'

[001] {CompileUnit} 'test.cpp'
[003] {Code} '%0 = load i8, ptr %ParamBool.addr, align 1, !dbg !26'
[003] {Code} '%1 = load i32, ptr %ParamUnsigned.addr, align 4, !dbg !34'
[003] {Code} '%2 = load i32, ptr %retval, align 4, !dbg !36'
[004] {Code} '%loadedv = trunc i8 %0 to i1, !dbg !26'
[003] {Code} '%storedv = zext i1 %ParamBool to i8'
[003] {Code} 'br i1 %loadedv, label %if.then, label %if.end, !dbg !26'
[003] {Code} 'store i32 %1, ptr %retval, align 4, !dbg !35'
[003] {Code} 'store i32 %ParamUnsigned, ptr %ParamUnsigned.addr, align 4'
[004] {Code} 'store i32 7, ptr %CONSTANT, align 4, !dbg !32'
[004] {Code} 'store i32 7, ptr %retval, align 4, !dbg !33'
[003] {Code} 'store i8 %storedv, ptr %ParamBool.addr, align 1'
[003] {Code} 'store ptr %ParamPtr, ptr %ParamPtr.addr, align 8'

-----------------------------
Element Total Printed
-----------------------------
Scopes 5 0
Symbols 4 0
Types 2 0
Lines 22 12
-----------------------------
Total 33 12
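
The selection above can be approximated outside the tool. The following is a minimal sketch, assuming that each pattern is applied as a case-insensitive regular expression against the printed instruction text; `selectMatching` is a hypothetical helper written with the C++ standard library, not part of LLVM, and the tool's actual matching logic may differ.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM): return the entries that match at
// least one pattern, treating each pattern as a case-insensitive regex.
inline std::vector<std::string>
selectMatching(const std::vector<std::string> &Entries,
               const std::vector<std::string> &Patterns) {
  std::vector<std::regex> Compiled;
  for (const std::string &Pattern : Patterns)
    Compiled.emplace_back(Pattern, std::regex::ECMAScript | std::regex::icase);

  std::vector<std::string> Selected;
  for (const std::string &Entry : Entries)
    for (const std::regex &Re : Compiled)
      if (std::regex_search(Entry, Re)) {
        Selected.push_back(Entry);
        break; // One matching pattern is enough; avoid duplicate entries.
      }
  return Selected;
}
```

Under that assumption, patterns `LOAD` and `store` select not only `load` and `store` instructions but also any instruction that mentions `%loadedv` or `%storedv`, which is consistent with the `trunc`, `zext` and `br` rows in the listing above.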

COMPARISON MODE
^^^^^^^^^^^^^^^
Given the previous example we found the above debug information issue
@@ -2197,6 +2330,34 @@ giving more context by swapping the reference and target object files.
The output shows the merging view path (reference and target) with the
missing and added elements.

.. code-block:: none

llvm-debuginfo-analyzer --attribute=level,format
--compare=types
--report=view
--print=symbols,types
test-clang.bc test-dwarf-gcc.o

Reference: 'test-clang.bc'
Target: 'test-dwarf-gcc.o'

Logical View:
[000] {File} 'test-clang.bc' -> Bitcode IR

[001] {CompileUnit} 'test.cpp'
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
[003] {Block}
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
+[004] 4 {TypeAlias} 'INTEGER' -> 'int'
[003] 2 {Parameter} 'ParamBool' -> 'bool'
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
-[003] 4 {TypeAlias} 'INTEGER' -> 'int'

The output is the same as before, but this time it compares the Clang
bitcode with the binary object (DWARF) generated by GCC.
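
The `+` and `-` markers can be read as two set differences. The sketch below assumes that `-` marks an element present in the reference but missing from the target, and `+` marks an element added by the target. `Element`, `Difference` and `compareViews` are hypothetical names used for illustration only; the tool's real comparison works on the logical view tree and is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a logical element: lexical level plus description.
using Element = std::pair<int, std::string>;

struct Difference {
  std::vector<Element> Missing; // In the reference only (printed with '-').
  std::vector<Element> Added;   // In the target only (printed with '+').
};

// Compute both one-sided differences between the two element sets.
inline Difference compareViews(const std::set<Element> &Reference,
                               const std::set<Element> &Target) {
  Difference Result;
  std::set_difference(Reference.begin(), Reference.end(), Target.begin(),
                      Target.end(), std::back_inserter(Result.Missing));
  std::set_difference(Target.begin(), Target.end(), Reference.begin(),
                      Reference.end(), std::back_inserter(Result.Added));
  return Result;
}
```

With the example above, `INTEGER` at level 3 would land in `Missing` and `INTEGER` at level 4 in `Added`, because the level is part of the element's identity in this sketch.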

LOGICAL ELEMENTS
""""""""""""""""
It compares individual logical elements without considering if their
26 changes: 25 additions & 1 deletion llvm/include/llvm/DebugInfo/LogicalView/Core/LVReader.h
@@ -56,7 +56,7 @@ class LVSplitContext final {

/// The logical reader owns all the logical elements created during
/// the debug information parsing. For its creation it uses a specific
/// bump allocator for each type of logical element.
/// bump allocator for each type of logical element.
class LVReader {
LVBinaryType BinaryType;

@@ -121,7 +121,24 @@ class LVReader {

#undef LV_OBJECT_ALLOCATOR

// Scopes with ranges for current compile unit. It is used to find a line
Member

(for the benefit of any other reviewers, these have been hoisted out of the object-file and DWARF readers to be more generic)

// giving its exact or closest address. To support comdat functions, all
// addresses for the same section are recorded in the same map.
using LVSectionRanges = std::map<LVSectionIndex, std::unique_ptr<LVRange>>;
LVSectionRanges SectionRanges;
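
The per-section map described in the comment above can be sketched without LLVM headers. This is an illustration only: `SectionRangeTable`, `Range` and `SectionIndex` are simplified stand-ins for the reader, `LVRange` and `LVSectionIndex`, and the creation-on-first-lookup behavior is an assumption about how such a map is typically used, not a copy of the PR's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Stand-ins for the LLVM types; the real LVRange is far richer.
using SectionIndex = uint64_t;
struct Range {
  std::vector<std::pair<uint64_t, uint64_t>> Entries; // [lower, upper] pairs.
};

class SectionRangeTable {
  std::map<SectionIndex, std::unique_ptr<Range>> SectionRanges;

public:
  // Return the range set for a section, creating it on first use.
  Range *getSectionRanges(SectionIndex Index) {
    auto It = SectionRanges.find(Index);
    if (It == SectionRanges.end())
      It = SectionRanges.emplace(Index, std::make_unique<Range>()).first;
    return It->second.get();
  }

  // Record an address range; all ranges of one section share one map entry.
  void addSectionRange(SectionIndex Index, uint64_t Lower, uint64_t Upper) {
    assert(Lower <= Upper && "Invalid address range.");
    getSectionRanges(Index)->Entries.emplace_back(Lower, Upper);
  }

  // Number of sections that have at least been looked up once.
  std::size_t size() const { return SectionRanges.size(); }
};
```

Keying the map by section index is what lets several comdat functions in the same section share a single range set.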

protected:
// Current elements during the processing of a DIE/MDNode.
LVElement *CurrentElement = nullptr;
LVScope *CurrentScope = nullptr;
LVSymbol *CurrentSymbol = nullptr;
LVType *CurrentType = nullptr;
LVLine *CurrentLine = nullptr;
LVOffset CurrentOffset = 0;

// Address ranges collected for current DIE/MDNode/AST Node.
std::vector<LVAddressRange> CurrentRanges;

LVScopeRoot *Root = nullptr;
std::string InputFilename;
std::string FileFormatName;
@@ -132,11 +149,18 @@ class LVReader {
// Only for ELF format. The CodeView is handled in a different way.
LVSectionIndex DotTextSectionIndex = UndefinedSectionIndex;

void addSectionRange(LVSectionIndex SectionIndex, LVScope *Scope);
void addSectionRange(LVSectionIndex SectionIndex, LVScope *Scope,
LVAddress LowerAddress, LVAddress UpperAddress);
LVRange *getSectionRanges(LVSectionIndex SectionIndex);

// Record Compilation Unit entry.
void addCompileUnitOffset(LVOffset Offset, LVScopeCompileUnit *CompileUnit) {
CompileUnits.emplace(Offset, CompileUnit);
}

LVElement *createElement(dwarf::Tag Tag);

// Create the Scope Root.
virtual Error createScopes() {
Root = createScopeRoot();
13 changes: 13 additions & 0 deletions llvm/include/llvm/DebugInfo/LogicalView/Core/LVSupport.h
@@ -99,6 +99,19 @@ template <typename T> class LVProperties {
#define KIND_3(ENUM, FIELD, F1, F2, F3) \
BOOL_BIT_3(Kinds, ENUM, FIELD, F1, F2, F3)

const int DEC_WIDTH = 8;
inline FormattedNumber decValue(uint64_t N, unsigned Width = DEC_WIDTH) {
return format_decimal(N, Width);
}

// Output the decimal representation of 'Value'.
inline std::string decString(uint64_t Value, size_t Width = DEC_WIDTH) {
std::string String;
raw_string_ostream Stream(String);
Stream << decValue(Value, Width);
return Stream.str();
}
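
For readers without an LLVM checkout, the expected result of `decString` can be approximated with the standard library. The sketch assumes that `format_decimal` right-justifies the value in a space-padded field and prints the full number when it does not fit; `decStringSketch` is a hypothetical name and is not part of the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Standard-library approximation of decString: right-justify the decimal
// value in a field of 'Width' characters, padding with spaces. Values wider
// than the field are printed in full rather than truncated.
inline std::string decStringSketch(uint64_t Value, unsigned Width = 8) {
  std::ostringstream Stream;
  Stream << std::setw(static_cast<int>(Width)) << Value;
  return Stream.str();
}
```

For example, `decStringSketch(42)` yields the value preceded by six spaces, so columns of decimal offsets line up in the printed view.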

const int HEX_WIDTH = 12;
inline FormattedNumber hexValue(uint64_t N, unsigned Width = HEX_WIDTH,
bool Upper = false) {
12 changes: 9 additions & 3 deletions llvm/include/llvm/DebugInfo/LogicalView/LVReaderHandler.h
@@ -17,6 +17,7 @@
#include "llvm/DebugInfo/LogicalView/Core/LVReader.h"
#include "llvm/DebugInfo/PDB/Native/PDBFile.h"
#include "llvm/Object/Archive.h"
#include "llvm/Object/IRObjectFile.h"
#include "llvm/Object/MachOUniversal.h"
#include "llvm/Object/ObjectFile.h"
#include "llvm/Support/MemoryBuffer.h"
@@ -29,7 +30,9 @@ namespace logicalview {

using LVReaders = std::vector<std::unique_ptr<LVReader>>;
using ArgVector = std::vector<std::string>;
using PdbOrObj = PointerUnion<object::ObjectFile *, pdb::PDBFile *>;
using PdbOrObjOrIr =
PointerUnion<object::ObjectFile *, pdb::PDBFile *, object::IRObjectFile *,
MemoryBufferRef *, StringRef *>;
Comment on lines +33 to +35
Member

I feel we should be able to invent a more symbolic name for this type -- something like "InputHandle" perhaps? That communicates more about the purpose of the type than just a list of types it might be.
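
One way to picture the suggested rename is a purpose-based alias with a small dispatch function. The sketch below swaps in `std::variant` for `llvm::PointerUnion` so that it compiles without LLVM headers, uses empty stand-in structs for the real classes, and omits the `MemoryBufferRef *` and `StringRef *` alternatives for brevity; `InputHandle` is the reviewer's proposed name and `describe` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Empty stand-ins for the real LLVM classes, for illustration only.
struct ObjectFile {};
struct PDBFile {};
struct IRObjectFile {};

// Hypothetical purpose-based name, following the reviewer's suggestion.
using InputHandle = std::variant<ObjectFile *, PDBFile *, IRObjectFile *>;

// Map the currently active alternative to a short description.
inline std::string describe(const InputHandle &Input) {
  if (std::holds_alternative<ObjectFile *>(Input))
    return "object file";
  if (std::holds_alternative<PDBFile *>(Input))
    return "PDB file";
  return "IR object file";
}
```

The alias name stays valid when another input kind is added, whereas a name such as `PdbOrObjOrIr` has to grow with every new alternative.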


// This class performs the following tasks:
// - Creates a logical reader for every binary file in the command line,
@@ -60,9 +63,12 @@ class LVReaderHandler {
object::Binary &Binary);
Error handleObject(LVReaders &Readers, StringRef Filename, StringRef Buffer,
StringRef ExePath);
Error handleObject(LVReaders &Readers, StringRef Filename,
MemoryBufferRef Buffer);

Error createReader(StringRef Filename, LVReaders &Readers, PdbOrObj &Input,
StringRef FileFormatName, StringRef ExePath = {});
Error createReader(StringRef Filename, LVReaders &Readers,
PdbOrObjOrIr &Input, StringRef FileFormatName,
StringRef ExePath = {});

public:
LVReaderHandler() = delete;
12 changes: 1 addition & 11 deletions llvm/include/llvm/DebugInfo/LogicalView/Readers/LVBinaryReader.h
@@ -25,6 +25,7 @@
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Object/COFF.h"
#include "llvm/Object/IRObjectFile.h"
#include "llvm/Object/ObjectFile.h"

namespace llvm {
@@ -93,12 +94,6 @@ class LVBinaryReader : public LVReader {
SectionAddresses.emplace(Section.getAddress(), Section);
}

// Scopes with ranges for current compile unit. It is used to find a line
// giving its exact or closest address. To support comdat functions, all
// addresses for the same section are recorded in the same map.
using LVSectionRanges = std::map<LVSectionIndex, std::unique_ptr<LVRange>>;
LVSectionRanges SectionRanges;

// Image base and virtual address for Executable file.
uint64_t ImageBaseAddress = 0;
uint64_t VirtualAddress = 0;
@@ -179,11 +174,6 @@ class LVBinaryReader : public LVReader {
Expected<std::pair<LVSectionIndex, object::SectionRef>>
getSection(LVScope *Scope, LVAddress Address, LVSectionIndex SectionIndex);

void addSectionRange(LVSectionIndex SectionIndex, LVScope *Scope);
void addSectionRange(LVSectionIndex SectionIndex, LVScope *Scope,
LVAddress LowerAddress, LVAddress UpperAddress);
LVRange *getSectionRanges(LVSectionIndex SectionIndex);

void includeInlineeLines(LVSectionIndex SectionIndex, LVScope *Function);

Error createInstructions();
10 changes: 0 additions & 10 deletions llvm/include/llvm/DebugInfo/LogicalView/Readers/LVDWARFReader.h
@@ -39,22 +39,13 @@ class LVDWARFReader final : public LVBinaryReader {
LVAddress CUBaseAddress = 0;
LVAddress CUHighAddress = 0;

// Current elements during the processing of a DIE.
LVElement *CurrentElement = nullptr;
LVScope *CurrentScope = nullptr;
LVSymbol *CurrentSymbol = nullptr;
LVType *CurrentType = nullptr;
LVOffset CurrentOffset = 0;
LVOffset CurrentEndOffset = 0;

// In DWARF v4, the files are 1-indexed.
// In DWARF v5, the files are 0-indexed.
// The DWARF reader expects the indexes as 1-indexed.
bool IncrementFileIndex = false;

// Address ranges collected for current DIE.
std::vector<LVAddressRange> CurrentRanges;

// Symbols with locations for current compile unit.
LVSymbols SymbolsWithLocations;

@@ -82,7 +73,6 @@ class LVDWARFReader final : public LVBinaryReader {

void mapRangeAddress(const object::ObjectFile &Obj) override;

LVElement *createElement(dwarf::Tag Tag);
void traverseDieAndChildren(DWARFDie &DIE, LVScope *Parent,
DWARFDie &SkeletonDie);
// Process the attributes for the given DIE.