[StaticDataLayout][PGO] Add profile format for static data layout, and the classes to operate on the profiles. #138170
Changes from all commits
6cd7d8d
4727529
80249bc
b69c993
6fe9b48
df08094
6dd04e4
4b25d67
2ecc621
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
//===- DataAccessProf.h - Data access profile format support ----*- C++ -*-===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// This file contains support to construct and use data access profiles. | ||
// | ||
// For the original RFC of this pass please see | ||
// https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#ifndef LLVM_PROFILEDATA_DATAACCESSPROF_H_ | ||
#define LLVM_PROFILEDATA_DATAACCESSPROF_H_ | ||
|
||
#include "llvm/ADT/DenseMap.h" | ||
#include "llvm/ADT/DenseMapInfoVariant.h" | ||
#include "llvm/ADT/MapVector.h" | ||
#include "llvm/ADT/STLExtras.h" | ||
#include "llvm/ADT/SetVector.h" | ||
#include "llvm/ADT/SmallVector.h" | ||
#include "llvm/ADT/StringRef.h" | ||
#include "llvm/ProfileData/InstrProf.h" | ||
#include "llvm/Support/Allocator.h" | ||
#include "llvm/Support/Error.h" | ||
#include "llvm/Support/StringSaver.h" | ||
|
||
#include <cstdint> | ||
#include <optional> | ||
#include <variant> | ||
|
||
namespace llvm { | ||
|
||
namespace data_access_prof { | ||
|
||
/// The location of data in the source code. Used by profile lookup API. | ||
struct SourceLocation { | ||
SourceLocation(StringRef FileNameRef, uint32_t Line) | ||
: FileName(FileNameRef.str()), Line(Line) {} | ||
/// The filename where the data is located. | ||
std::string FileName; | ||
/// The line number in the source code. | ||
uint32_t Line; | ||
}; | ||
|
||
namespace internal { | ||
|
||
// Conceptually similar to SourceLocation except that FileNames are StringRef of | ||
// which strings are owned by `DataAccessProfData`. Used by `DataAccessProfData` | ||
// to represent data locations internally. | ||
struct SourceLocationRef { | ||
// The filename where the data is located. | ||
StringRef FileName; | ||
// The line number in the source code. | ||
uint32_t Line; | ||
}; | ||
|
||
// The data access profiles for a symbol. Used by `DataAccessProfData` | ||
// to represent records internally. | ||
struct DataAccessProfRecordRef { | ||
DataAccessProfRecordRef(uint64_t SymbolID, uint64_t AccessCount, | ||
bool IsStringLiteral) | ||
: SymbolID(SymbolID), AccessCount(AccessCount), | ||
IsStringLiteral(IsStringLiteral) {} | ||
|
||
// Represents a data symbol. The semantics come in two forms: a symbol index | ||
When would the different forms be used?
The semantic of this field depends on the
I understood that, but my question was more about why in practice some would be string literals and some would be hashes. Might be useful to note this in a comment.
This makes sense. Added comment at L55 to explain why two forms are used.
||
// for symbol name if `IsStringLiteral` is false, or the hash of a string | ||
// content if `IsStringLiteral` is true. For most of the symbolizable static | ||
// data, the mangled symbol names remain stable relative to the source code | ||
// and are therefore used to identify symbols across binary releases. String | ||
// literals have unstable name patterns like `.str.N[.llvm.hash]`, so we use | ||
// the content hash instead. This is a required field. | ||
uint64_t SymbolID; | ||
mingmingl-llvm marked this conversation as resolved.
It's a little confusing that SymbolID is a different thing (a type) in the following class. Suggest making these different.
Indeed. I renamed the type
||
|
||
// The access count of the symbol. Required. | ||
uint64_t AccessCount; | ||
|
||
// True iff this is a record for string literal (symbols with name pattern | ||
// `.str.*` in the symbol table). Required. | ||
bool IsStringLiteral; | ||
|
||
// The locations of data in the source code. Optional. | ||
llvm::SmallVector<SourceLocationRef, 0> Locations; | ||
}; | ||
} // namespace internal | ||
|
||
// A SymbolHandleRef is either a string representing the symbol name if the | ||
// symbol has a stable mangled name relative to source code, or a uint64_t | ||
// representing the content hash of a string literal (with unstable name | ||
// patterns like `.str.N[.llvm.hash]`). The StringRef is owned by the class's | ||
// saver object. | ||
using SymbolHandleRef = std::variant<StringRef, uint64_t>; | ||
|
||
// The semantics are the same as `SymbolHandleRef` above. The strings are owned. | ||
using SymbolHandle = std::variant<std::string, uint64_t>; | ||
|
||
/// The data access profiles for a symbol. | ||
struct DataAccessProfRecord { | ||
public: | ||
DataAccessProfRecord(SymbolHandleRef SymHandleRef, | ||
ArrayRef<internal::SourceLocationRef> LocRefs) { | ||
if (std::holds_alternative<StringRef>(SymHandleRef)) | ||
SymHandle = std::get<StringRef>(SymHandleRef).str(); | ||
else | ||
SymHandle = std::get<uint64_t>(SymHandleRef); | ||
|
||
for (auto Loc : LocRefs) | ||
Locations.push_back(SourceLocation(Loc.FileName, Loc.Line)); | ||
} | ||
SymbolHandle SymHandle; | ||
|
||
// The locations of data in the source code. Optional. | ||
SmallVector<SourceLocation> Locations; | ||
}; | ||
|
||
/// Encapsulates the data access profile data and the methods to operate on | ||
/// it. This class provides profile look-up, serialization and | ||
/// deserialization. | ||
class DataAccessProfData { | ||
public: | ||
// Use MapVector to keep input order of strings for serialization and | ||
// deserialization. | ||
using StringToIndexMap = llvm::MapVector<StringRef, uint64_t>; | ||
|
||
DataAccessProfData() : Saver(Allocator) {} | ||
|
||
/// Serialize profile data to the output stream. | ||
/// Storage layout: | ||
/// - Serialized strings. | ||
/// - The encoded hashes. | ||
/// - Records. | ||
Error serialize(ProfOStream &OS) const; | ||
|
||
/// Deserialize this class from the given buffer. | ||
Error deserialize(const unsigned char *&Ptr); | ||
|
||
/// Returns a profile record for \p SymID, or std::nullopt if there | ||
/// isn't a record. Internally, this function will canonicalize the symbol | ||
/// name before the lookup. | ||
std::optional<DataAccessProfRecord> | ||
getProfileRecord(const SymbolHandleRef SymID) const; | ||
|
||
/// Returns true if \p SymID is seen in profiled binaries and cold. | ||
bool isKnownColdSymbol(const SymbolHandleRef SymID) const; | ||
|
||
/// Methods to set symbolized data access profile. Returns error if | ||
/// duplicated symbol names or content hashes are seen. The user of this | ||
/// class should aggregate counters that correspond to the same symbol name | ||
/// or with the same string literal hash before calling 'set*' methods. | ||
Error setDataAccessProfile(SymbolHandleRef SymbolID, uint64_t AccessCount); | ||
/// Similar to the method above, for records with \p Locations representing | ||
/// the `filename:line` where this symbol shows up. Note because of linker's | ||
/// merge of identical symbols (e.g., unnamed_addr string literals), one | ||
/// symbol is likely to have multiple locations. | ||
Error setDataAccessProfile(SymbolHandleRef SymbolID, uint64_t AccessCount, | ||
ArrayRef<SourceLocation> Locations); | ||
/// Add a symbol that's seen in the profiled binary without samples. | ||
Error addKnownSymbolWithoutSamples(SymbolHandleRef SymbolID); | ||
|
||
/// The following methods return array reference for various internal data | ||
/// structures. | ||
ArrayRef<StringToIndexMap::value_type> getStrToIndexMapRef() const { | ||
return StrToIndexMap.getArrayRef(); | ||
} | ||
ArrayRef< | ||
MapVector<SymbolHandleRef, internal::DataAccessProfRecordRef>::value_type> | ||
getRecords() const { | ||
return Records.getArrayRef(); | ||
} | ||
ArrayRef<StringRef> getKnownColdSymbols() const { | ||
return KnownColdSymbols.getArrayRef(); | ||
} | ||
ArrayRef<uint64_t> getKnownColdHashes() const { | ||
return KnownColdHashes.getArrayRef(); | ||
} | ||
|
||
private: | ||
/// Serialize the symbol strings into the output stream. | ||
Error serializeSymbolsAndFilenames(ProfOStream &OS) const; | ||
|
||
/// Deserialize the symbol strings from \p Ptr and increment \p Ptr to the | ||
/// start of the next payload. | ||
Error deserializeSymbolsAndFilenames(const unsigned char *&Ptr, | ||
const uint64_t NumSampledSymbols, | ||
mingmingl-llvm marked this conversation as resolved.
|
||
const uint64_t NumColdKnownSymbols); | ||
|
||
/// Decode the records and increment \p Ptr to the start of the next | ||
/// payload. | ||
Error deserializeRecords(const unsigned char *&Ptr); | ||
|
||
/// A helper function to compute a storage index for \p SymbolID. | ||
uint64_t getEncodedIndex(const SymbolHandleRef SymbolID) const; | ||
|
||
// Keeps owned copies of the input strings. | ||
// NOTE: Keep `Saver` initialized before other class members that reference | ||
// its string copies and destructed after they are destructed. | ||
llvm::BumpPtrAllocator Allocator; | ||
llvm::UniqueStringSaver Saver; | ||
|
||
// `Records` stores the records. | ||
MapVector<SymbolHandleRef, internal::DataAccessProfRecordRef> Records; | ||
|
||
StringToIndexMap StrToIndexMap; | ||
llvm::SetVector<uint64_t> KnownColdHashes; | ||
llvm::SetVector<StringRef> KnownColdSymbols; | ||
}; | ||
|
||
} // namespace data_access_prof | ||
} // namespace llvm | ||
|
||
#endif // LLVM_PROFILEDATA_DATAACCESSPROF_H_ |
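The relationship between `SymbolHandleRef` and `SymbolHandle`, and the header's advice to aggregate counters per symbol before calling the `set*` methods, can be sketched with the standard library alone. This is a minimal sketch and not the PR's implementation: `std::string_view` stands in for `StringRef`, and `toOwned` and `aggregate` are hypothetical helpers that do not appear in the header.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

// Stand-ins for the PR's aliases; std::string_view replaces llvm::StringRef.
using SymbolHandleRef = std::variant<std::string_view, uint64_t>;
using SymbolHandle = std::variant<std::string, uint64_t>;

// Hypothetical helper: copy a non-owning handle into an owning one, in the
// same spirit as the DataAccessProfRecord constructor.
SymbolHandle toOwned(const SymbolHandleRef &Ref) {
  if (const auto *Name = std::get_if<std::string_view>(&Ref))
    return std::string(*Name);
  return std::get<uint64_t>(Ref);
}

// Hypothetical pre-aggregation step: sum the counters that belong to the same
// symbol name or the same string-literal hash, so that each symbol is set
// exactly once afterwards.
std::map<SymbolHandle, uint64_t>
aggregate(const std::vector<std::pair<SymbolHandleRef, uint64_t>> &Samples) {
  std::map<SymbolHandle, uint64_t> Totals;
  for (const auto &[Ref, Count] : Samples)
    Totals[toOwned(Ref)] += Count;
  return Totals;
}
```

A caller would then iterate over the aggregated totals and pass each entry to `setDataAccessProfile` once, which avoids the duplicate-symbol error the header describes.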
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
add_llvm_component_library(LLVMProfileData | ||
DataAccessProf.cpp | ||
GCOV.cpp | ||
IndexedMemProfData.cpp | ||
InstrProf.cpp | ||
|
Can the contents of the internal namespace (i.e. the ref variants) be moved to the .cpp file?
I tried to use forward declaration under the `internal` namespace; this gives compile errors which indicate that forward declarations don't work with class template instantiation. Something like this when moving `SourceLocationRef` itself, and moving `DataAccessProfRecordRef` along with `SourceLocationRef` caused similar static assertion errors for `class DataAccessProfData`.
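The constraint described in the reply can be illustrated with a standard-library analogy. This is a sketch under assumptions, not the LLVM containers from the PR: `RecordRef`, `HoldsPointer`, and `HoldsValue` are hypothetical names, and the exact diagnostics produced by `MapVector` or `SmallVector` are taken from the reply rather than reproduced here.

```cpp
#include <cstdint>
#include <memory>
#include <utility>

struct RecordRef; // Forward declaration only: the type is incomplete here.

// Fine: a pointer to an incomplete type has a known size, so the member can
// be declared. The destructor is defined later, once RecordRef is complete.
struct HoldsPointer {
  std::unique_ptr<RecordRef> Rec;
  ~HoldsPointer();
};

// Would not compile if uncommented at this point: std::pair stores both
// members by value, so instantiating it needs the full definition of
// RecordRef.
// struct HoldsValue { std::pair<int, RecordRef> Entry; };

// In a real split this definition would live in the .cpp file.
struct RecordRef {
  uint64_t AccessCount = 0;
};

HoldsPointer::~HoldsPointer() = default;
```

Holding the records by value in a class member, as `DataAccessProfData` does, therefore keeps the record types in the header; hiding them would need an extra level of indirection like the pointer member above.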