[DTLTO][LLVM] Integrated Distributed ThinLTO (DTLTO) #127749
base: main
Conversation
✅ With the latest revision this PR passed the Python code formatter.
✅ With the latest revision this PR passed the C/C++ code formatter.
This will be used for supplying a `-target=<triple>` option to clang for DTLTO. Note that if DTLTO is changed to use an optimisation tool that does not require an explicit triple to be passed, then the triple handling can be removed entirely.
The DTLTO ThinLTO backend will override this function to perform code generation. It will not do any codegen when invoked for each task. Instead, it will generate the required information (e.g., the summary index shard, import list, etc.) to allow the codegen to be performed externally. The backend's `wait` function will then invoke an external distributor process to do the backend compilations.
The DTLTO ThinLTO backend will need to customize the file paths used for the summary index shard files, so that the names of these files can include a unique ID and multiple DTLTO links do not overwrite each other's summary index shard files. It also needs to be able to get the import list for each job in memory, rather than writing it to a file, so that the import lists can be included in the JSON used to communicate with the external distributor process.
The new setup() member is now called to allow the ThinLTO backends to prepare for code generation. For the DTLTO backend, this will be used to preallocate storage for the information required to perform the backend compilation jobs.
Move some setup code and state to a base class. This will allow the DTLTO ThinLTO backend to share this setup code and state by also deriving from this base class.
Structural changes: 1. A new ThinLTO backend implementing DTLTO has been added. 2. Both the new backend and the InProcess backend derive from a common base class to share common setup code and state. 3. The target triple is now stored for the ThinLTO bitcode files. 4. A new setup() member is called, which the ThinLTO backends can use to prepare for code generation. For the DTLTO backend, this is used to pre-allocate storage for the information required to perform the backend compilation jobs. 5. The functions for emitting summary index shard and imports files have been altered to allow the caller to specify the filenames to write and to allow the list of imports to be stored in a container rather than written to a file.
Intentionally minimal for now. Additional state translation will be added in future commits.
In the DTLTO ThinLTO backend, replace the use of AddStream with AddBuffer for adding files to the link. Unlike the InProcess ThinLTO backend, DTLTO runs the backend compilation jobs by invoking an external process (currently clang), which writes the output object file to disk. Therefore, DTLTO requires a performant way of adding an existing file to the link. Note that the AddBuffer mechanism is also used for adding a file to the link when there is a cache hit.
Some of the code in LTO.h did not conform to the LLVM coding standard, although it matched the style already used in that file. This was causing GitHub's automated code-formatting checks to fail, which was confusing on the PR. I decided to apply clang-format everywhere to prevent this, even though the new code no longer matches the style of the existing code.
Force-pushed 354e645 → 33dbf55
I force-pushed an update; the code is equivalent. Reasons for the force-push:
1. Use "with" when opening files. 2. Add module docstrings.
In offline review it was suggested that the distributor scripts could benefit from module docstrings and the preferred "with" statement idiom for opening files. I made those changes and also spotted some nits in the tests which I have corrected. I'm adding new commits rather than updating the original commits and force-pushing as I think that force-pushing might cause turbulence for reviewers. Please let me know if you would prefer force-pushing.
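The two reviewed idioms can be sketched as follows. This is a minimal illustration, not code from the actual distributor scripts; the filename and JSON payload are invented:

```python
"""Module docstring: a minimal sketch of the reviewed idioms."""
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "dist.json")

# "with" guarantees the handle is closed, even if an exception is raised.
with open(path, "w") as f:
    json.dump({"jobs": []}, f)

with open(path) as f:
    data = json.load(f)
```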
- Remove linker version.
- Make AddBuffer a default parameter for LTO::run() to minimize changes at call sites.
- Format code with clang-format, even if it deviates slightly from the surrounding style. I decided that CI passing for the PRs was more important than matching the existing style.
- Order distributor command-line options correctly.
- Remove dead code in llvm-lto2.cpp.
- Add docstrings to Python distributors.
- Update the Python mock.py distributor to validate JSON.
- Organize LLVM DTLTO tests by moving them into a dedicated "dtlto" directory, tidying existing tests and adding new ones.
- Implement fix for more than one LTO partition (spotted by MaskRay).
- Replace `thinlto-remote-opt-tool` with `thinlto-remote-compiler`.
- Make `thinlto-remote-compiler` a cl::opt.
Sorry for the delay! I have some comments and questions below. Thanks for splitting out the LLVM portions.
cross-project-tests/dtlto/README.md
Outdated
@@ -0,0 +1,3 @@
Tests for DTLTO (integrated distributed ThinLTO) functionality.

These are integration tests as DTLTO invokes `clang` for code-generation.
Is the use of llvm-lto and opt here just to do testing before the clang and lld patches go in? If so, do we need cross project testing right now? Wondering if it would be better to just stick with the testing you have in this patch that is under llvm/test, and add the cross project testing using only external tools once those patches are in.
This test checks that the DTLTO code that creates Clang command lines generates the expected options and that those options are accepted by clang. I wrote it to use llvm-lto and opt so it could be part of this commit before the clang and lld patches go in. However, I think it is a nice principle to only use external tools in cross-project-tests. I have removed the test from this PR with the intention of adding it back later. Additionally, if we can use your idea (#127749 (comment)) of storing the Clang `-c` step options and applying them to the remote backend compilations, then this test will be rewritten to test that behavior instead.
llvm/docs/DTLTO.rst
Outdated
contribute. Distributors can use this to label build jobs for informational
purposes.

- **Linker's version string**.
I thought I saw that this got removed
Apologies, I hadn't pulled in the documentation changes that were made on #126654. Fixed now.
llvm/docs/DTLTO.rst
Outdated
the same information that is emitted into import files for ThinLTO.

- The **input files** required for each job.
- The per-job set of files required for backend compilation, such as bitcode
Does the list of bitcode files include the bitcode files being imported from? If so, is the imports list described earlier needed?
The JSON format has been simplified now. The imports field has been removed and there is now just an "inputs" field which contains all the inputs specific to a compilation job including the bitcode modules from which imports will be made. I have simplified the documentation to match. Thanks.
; RUN: rm -rf %t && split-file %s %t && cd %t

;; Generate bitcode files with a summary index.
; RUN: opt -thinlto-bc x86_64-unknown-linux-gnu.ll -o x86_64-unknown-linux-gnu.bc
Why not use clang with input cc files instead of opt with input ll files for the cross project tests (related to my earlier comment about whether this should wait and just use external tools).
Clang can be used here. Thanks. However, I have removed this test for now. See earlier comment: #127749 (comment)
llvm/docs/DTLTO.rst
Outdated
invocation is contributing to, to aid the user in identifying them:

- **JSON Job Description File**:
  - Format: `dtlto.<UID>.dist-file.json`
What is UID in this context? Should this be PID?
I have changed this to be PID now. Note that I expect the naming strategy to change in the future as we get feedback from more distribution systems. I have some concerns that mixing the PID into the filenames might prevent the caching systems of some distribution systems from working effectively.
I have simplified the documentation now and this section has been removed. I don't want users relying on the naming scheme for temporary files which might be subject to change when I look at optimizing cache utilization.
RUN: not %{command} -save-temps 2>&1 \
RUN:   | FileCheck %s --check-prefixes=ERR
RUN: ls | FileCheck %s --check-prefix=NOINDEXFILES
NOINDEXFILES-NOT: imports
Redundant with earlier line
Thanks. Removed the first check leaving only the -save-temps one.
# Check that imports files are created with -thinlto-emit-imports.
RUN: not %{command} -thinlto-emit-imports 2>&1 \
RUN:   | FileCheck %s --check-prefixes=ERR
RUN: ls | FileCheck %s --check-prefix=INDEXFILES
Similar to above suggest s/INDEX/IMPORT/
Thanks. Fixed.
llvm/tools/llvm-lto2/llvm-lto2.cpp
Outdated
@@ -97,6 +97,12 @@ static cl::opt<bool>
                 "specified with -thinlto-emit-indexes or "
                 "-thinlto-distributed-indexes"));

static cl::opt<bool> DTLTO("dtlto", cl::desc("Perform DTLTO"));
Rather than requiring a separate bool flag, consider checking whether dtlto-distributor was specified?
Nice. Done.
llvm/utils/dtlto/mock.py
Outdated
if __name__ == "__main__":
    json_arg = sys.argv[-1]
    distributor_args = sys.argv[1:-1]
suggest calling this input_file instead of distributor_args to be clearer
Thanks. Done now.
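For illustration, the argument split under discussion can be sketched like this. It is a hedged sketch based only on the snippet above; the `parse_argv` helper and the example arguments are invented:

```python
def parse_argv(argv):
    """Split a distributor invocation: the JSON job-description file is
    passed as the final argument; anything between the program name and
    it is distributor-specific."""
    input_file = argv[-1]   # renamed from distributor_args per the review
    extra_args = argv[1:-1]
    return input_file, extra_args

# e.g. an invocation like: mock.py --verbose dist.json
input_file, extra_args = parse_argv(["mock.py", "--verbose", "dist.json"])
```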
// shard, import list, etc.) to allow for the codegen to be performed
// externally. This backend's `wait` function then invokes an external
// distributor process to do backend compilations.
class OutOfProcessThinBackend : public CGThinBackend {
In practice, how much does this share with the original InProcessThinBackend? I am wondering if the approach of having them derive from the same refactored CGThinBackend is useful.
Very little. Just some state and a few lines which handle CfiFunctionDefs. It's still nice to share these parts though. Should I remove CGThinBackend? What is your preference?
I was thinking about this: the backend thread handling is quite similar between this new backend and the write indexes backend (which invokes emitFiles), i.e. the latter could probably be refactored to use the new Job structure you have introduced here. It isn't clear to me which existing backend has more naturally in common with the new one. Do you eventually plan to share the Cache from the in process backend?
I was thinking about this the backend thread handling is quite similar between this new backend and the write indexes backend
In the sense of using the Cache and also adding the generated object files to the link, it is similar to the in process backend. However, in the sense of primarily using emitFiles to generate the pre-job summary index files, it is similar to the write indexes backend.
Do you eventually plan to share the Cache from the in process backend?
Eventually we will add support for the Cache that is used in the in process backend. We have found that, in some cases, this is more performant than the distribution system caching.
- Pull in documentation improvements.
I haven't read the bulk of the implementation yet, but I'll start with some comments.
llvm/docs/DTLTO.rst
Outdated
Limitations
-----------

The current implementation of DTLTO has the following limitations:
I feel like this should be a TODO list, since the items on the list are all doable and thus not limitations by design.
llvm/docs/DTLTO.rst
Outdated
"primary_input": ["dtlto.o"],
"summary_index": ["dtlto.1.51232.native.o.thinlto.bc"],
"primary_output": ["dtlto.1.51232.native.o"],
"imports": [],
I think it would be helpful to show an example with more than one job, so there is an example of how the cross-referencing works.
llvm/docs/DTLTO.rst
Outdated
"-O3", "-fprofile-sample-use=my.profdata",
"-o", ["primary_output", 0],
"-c", "-x", "ir", ["primary_input", 0],
["summary_index", "-fthinlto-index=", 0],
Is this a standard JSON format currently used by a distributor you are targeting? If you have an OpenAPI doc, that would be even better.
I feel the format is a bit weird. It is an array containing different types, and for the sub-arrays that provide indexes, they are not the same length and the items at the same offset are not always the same thing. Doesn't that make a decoder harder to write?
Is this a standard JSON format currently used by a distributor you targeting? If you have openAPI doc, it will be even better.
It's my own format, although it has been shown to have enough information to drive distribution via real-world distribution systems we have tested with Incredibuild and reclient+buildfarm (both of which are available to test for free).
I'll try creating an OpenAPI doc. My feeling is that the format is too simple to need something like this, though.
I feel the format is a bit weird. It is an array containing different types, and for the sub-array that provide indexes, they are not same length and the item at the same offset are not always the same thing. Doesn't it make a decoder harder to write?
I don't think that decoding is particularly difficult (see, for example: local.py.)
I'm not attached to the format of the JSON as long as it can do what we need. Here are the goals/constraints I came up with for a JSON interface:
- Be as minimal as possible.
- Don't use spaces in field names.
- This allows for programmatic access, such as with member access syntax. e.g., .summary_index.
- Allow the codegen tool to be replaced without a schema update.
- Don't depend on platform- or shell-specific shlexing of strings.
- Think about handling quotes, spaces, etc.
- Support filenames appearing inside arguments
- e.g., -fthinlto-index=summary.index.
- Decouple LLVM from distributors to the largest degree possible.
- This is to try to ensure that if the command line for the codegen tool changes, the distributor(s) shouldn't need updating.
- Explicitly provide the primary input file, summary index files, and primary output file.
- This is to help with supporting distribution systems such as FASTBuild, distcc, and Icecream, which lack sandboxing and might require the paths to be manipulated.
In the current format, the sub-arrays are used to represent references to per-job filenames so that a command line can be stored that doesn't contain filenames. This ensures that the contents of the command line remain transparent to the distributor, allowing another code-generation tool to replace Clang without updating the distributors. It also means that the distributor has access to the filenames (as they are stored separately) and can manipulate them, if required, without needing to change the command line.
I'll try creating an OpenAPI doc. My feeling is that the format is too simple to need something like this, though.
My comment about OpenAPI was that if it already exists, because the format exists, it would be good to link it. Otherwise, I don't think it is a high-priority task for now.
For the format, I don't have too many strings attached either, but here is how I think it can be improved:
- For the mixed types, you have to parse by traversing JSON values and checking their types. You can't really write a data type in a language like C++ that maps directly to the JSON object, which is why I say the parsing is a bit hard.
- Maybe it would help to have a better example: I don't see what the last `0` in the array means. Can it ever be non-zero? Looking at `local.py` only makes me more confused. An example with more than one input would definitely help here.
- It is also unclear to me how to append different flags (for example, the different targets shown in the code review) to different jobs. Can the target option be in the job section? Does the distributor treat all elements in the jobs' lists as inputs whose paths it needs to fix?
If you asked me to design it, I might do something like:
- The `common` section has an argument list that is entirely strings, with no input files. All input files and job-specific flags can be appended later, assuming order never matters and we don't rely on overwriting behavior.
- The `job` section will contain a specific list of input files and output files, whose paths the distributor can fix while handling the file movement. It will also come with a list of `additionalOption` objects that tell the distributor how to construct the remainder of the command line. The file references should just be indexes into the input/output lists.

Also, if equals-joined flags like `-fthinlto-index=` are just a special case, we can always create an alias for that, if supporting it generally isn't worth the effort.
My comments about OpenAPI is that if it already exists because the format exists, it will be good to be linked. Otherwise, I don't think it is a high priority task for now.
Thanks for clarifying!
- for the mix types, you have to parse by traversing json values and check on types. You can't really write a data type in languages like c++ that can directly map to the json object, thus why I say the parsing is a bit hard.
I understand this point, but I feel that expanding the references in C++ code is not taxing. I also don't think that any referencing scheme would lend itself to mapping directly to C++ data types.
- Maybe it helps with a better example, I don't see what the last `0` in the array means. Does it ever be non-zero? Looking at `local.py` only makes me more confused. An example with more than 1 input will definitely help here.
It's just a flexible referencing scheme to try to minimise the likelihood of ever needing to change it.
The expansion rule is specified here: https://github.com/bd1976bris/llvm-project/blob/DTLTO_llvm_only/llvm/docs/DTLTO.rst#command-line-expansion-example.
For a reference like: ["additional_inputs", "--", 2, "=", 0, ":", 1] and a corresponding per job array like:
{
...
"additional_inputs": ["first", "second", "third"]
}
Expansion yields: "--third=first:second"
I don't have a real-world use case where this is important, though. For the cases I have currently, we only need to index into the first entry of the per-job arrays and to handle the special case of equals-joined flags like `-fthinlto-index=`. I just came up with something that should be able to handle any conceivable option.
I will improve the documentation with a case that uses a non-zero index. That is, assuming we don't change the way references work (see the other parts of this comment).
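The expansion rule described above can be sketched in a few lines of Python. This is my own illustrative decoder mirroring the worked example in the thread, not the actual `local.py` code:

```python
def expand_reference(ref, job):
    """Expand one command-line reference: the first element names a
    per-job array; each remaining element is either a literal string or
    an integer index into that array, concatenated in order."""
    array_name, *parts = ref
    values = job[array_name]
    return "".join(p if isinstance(p, str) else values[p] for p in parts)

job = {"additional_inputs": ["first", "second", "third"]}
result = expand_reference(["additional_inputs", "--", 2, "=", 0, ":", 1], job)
# result == "--third=first:second"
```

The common case of an equals-joined flag, e.g. `["summary_index", "-fthinlto-index=", 0]`, falls out of the same rule.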
- It is also unclear to me how to append different flags (for example, different targets as shown in the code review) to different jobs. Can target option be in the job section? Does distributor treat all elements in the list in the jobs to be inputs that it needs to fix the path?
This isn't currently possible. If it were needed, my plan to support this was just to add an "args" array to the per-job entries, which would be handled in the same manner as the current "args" array in the "common" object.
If you ask me to design, I might do something like:
- `common` section has an argument list that is completely strings with no input files. All input files and job-specific flags can be appended later, assuming order never matters and we don't rely on overwriting behavior.
- `job` section will contain a specific list of input files and output files, where the distributor can fix their paths and handle the file movement. It will also come with a list of `additionalOption` objects that tell the distributor how to construct the remainder of the command line. The file references should just be indexes into the input/output lists.

Also, if equals-joined flags like `-fthinlto-index=` are just a special case, we can always create an alias for that, if that isn't worth the effort to support.
Looks great 👍 I will rework along these lines. However, before drawing too much of your attention to the JSON format, can I first ask whether you (and other reviewers) are happy with a JSON interface? There are potentially some advantages to a plugin interface. I made a comment on this here: #126654 (comment).
@cachemeifyoucan, I have explored several different JSON formats with different ways of referencing between filenames and the command line arguments. However, the ideas I tested made the JSON less understandable and the referencing less flexible than the initial JSON format discussed above.
I am now proposing a simpler format that eliminates references for filenames. Instead, it simply lists the input/output filenames and command-line arguments in arrays of strings. While this results in some string duplication, it only increases the JSON file size by 6% (compared to the initial scheme) in my tests on a Clang link. As well as simplicity and readability, an additional benefit is that consumers have less work to do when composing command lines.
With this format, modifying filenames programmatically would be more challenging for a consumer (distributor). We initially theorized this might be necessary for supporting certain distribution systems, but in my testing with SN-DBS, Incredibuild, and Reclient (see the description of #126654), it has not been required. I have no firm evidence that it will be a requirement in the future.
I have updated the implementation and documentation to use this simpler format. @teresajohnson and @MaskRay, I’d appreciate your input on this change. Of course, anyone is welcome to comment, but I’ve directly referenced those who have previously expressed opinions on the JSON format in this PR.
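To make the proposal concrete, a two-job description in the simplified format might look something like the following. The field names (apart from "inputs", mentioned earlier in the thread) and all filenames are illustrative, not the final schema; the point is only the shape, plain string arrays with filenames duplicated rather than referenced:

```json
{
  "common": {
    "args": ["/usr/bin/clang", "-O2", "-c", "-x", "ir"]
  },
  "jobs": [
    {
      "args": ["t1.bc", "-fthinlto-index=t1.bc.thinlto.bc", "-o", "t1.native.o"],
      "inputs": ["t1.bc", "t1.bc.thinlto.bc"],
      "outputs": ["t1.native.o"]
    },
    {
      "args": ["t2.bc", "-fthinlto-index=t2.bc.thinlto.bc", "-o", "t2.native.o"],
      "inputs": ["t2.bc", "t2.bc.thinlto.bc"],
      "outputs": ["t2.native.o"]
    }
  ]
}
```

Per-job "args" also show how different options could be appended to different jobs, which the reference-based format did not support.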
LGTM. I don't have a strong opinion, and I do not intend to block the review on the JSON format design.
Thanks @cachemeifyoucan. Your comments on the JSON format were valuable and the new format makes this feature simpler and more approachable. Adding different options to different jobs is now possible as well.
llvm/include/llvm/LTO/LTO.h
Outdated
@@ -426,6 +460,7 @@ class LTO {
  // The bitcode modules to compile, if specified by the LTO Config.
  std::optional<ModuleMapType> ModulesToCompile;
  DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
  DenseMap<StringRef, std::string> ModuleTriples;
I was really hoping we would have only one triple for the build, because what does it even mean to cross-import between different triples? It would be good to document why you need to have a per-module triple to account for the versions field.
llvm/include/llvm/LTO/LTO.h
Outdated
@@ -426,6 +460,7 @@ class LTO {
  // The bitcode modules to compile, if specified by the LTO Config.
  std::optional<ModuleMapType> ModulesToCompile;
  DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
  DenseMap<StringRef, std::string> ModuleTriples;
Also, we usually don't use `DenseMap<StringRef>`. Maybe `StringMap` is better.
llvm/lib/LTO/LTO.cpp
Outdated
    "thinlto-distributor-arg",
    cl::desc("Additional arguments to pass to the ThinLTO distributor"));

cl::opt<std::string> ThinLTORemoteCompiler(
Are those `cl::opt` debugging options or real options that the user needs to pass in? We should avoid `-mllvm` options in production usage.
Are those `cl::opt` debugging options or real options that the user needs to pass in? We should avoid `-mllvm` options in production usage.
The use of -mllvm options avoids the need to implement a similar set of options for DTLTO in each LLD port (and also in llvm-lto2). See: #126654 (comment). I don't have a strong opinion though. Perhaps @MaskRay would like to comment on this.
I think ultimately the goal for options in production usage should be lld options. I'm not sure whether it is more churn to implement them there from the get-go vs. starting with cl::opt and transitioning later. At the least, there should be a TODO here to convert these options.
I'm happy to remove the cl::opt options and instead pass extra state when configuring the new backend. I will implement that in this PR.
I understand that -mllvm options are considered internal and for production usage we should recommend proper driver and linker options. It's annoying though to update llvm-lto2 and every ELF port....
Using -mllvm options was a nice reduction in the number of changes required in different tools, which was helpful to reduce the size of the initial PR. However, I think there is agreement now that proper options are required. Note that we don't have to update all the LLD ports at once. We can add DTLTO support to the different LLD ports incrementally. Perhaps we can start with just the ELF LLD port (chosen as it is the LLD port I'm most familiar with)?
#--- t1.ll

target triple = "x86_64-unknown-linux-gnu"
I prefer all the tests to be written with a `.ll` suffix, using bitcode comment style, rather than a `.test` suffix. They will get the correct syntax highlighting, so they are easier to read.
> I prefer all the tests written with a .ll suffix, using bitcode comment style, rather than a .test suffix. They will get the correct syntax highlighting, so they are easier to read.

Sounds good. Although, I suppose that an alternative POV is that it's not correct because these files are not .ll files. I don't mind doing this if you feel strongly about it.
This is a .ll file because every line of code that is not RUN/CHECK/comment is LLVM IR assembly. Even if we split the file, we still name the combined file with the same suffix.
Ok. I'll do this.
Generally the approach looks fine to me.
llvm/include/llvm/LTO/LTO.h
Outdated
@@ -426,6 +460,7 @@ class LTO {
   // The bitcode modules to compile, if specified by the LTO Config.
   std::optional<ModuleMapType> ModulesToCompile;
   DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
+  DenseMap<StringRef, std::string> ModuleTriples;
Single triple from linker sounds good to me.
llvm/lib/LTO/LTO.cpp
Outdated
// distributor program.
// TODO: If this strategy of deriving options proves insufficient, alternative
// approaches should be considered, such as:
// - A serialization/deserialization format for LTO configuration.
I am not a fan of storing command-line options, especially if you want some bitcode compatibility (-cc1 flags have no compatibility guarantees). That means the object files and future static library support will require rev-locking, which might or might not be what you want.
I think most of the target options should be available in the bitcode as attributes, and for those that are not, we should add more of them into the bitcode. Converting some options out from the CodeGen options here is unfortunate, but I can accept that. Hacking other options into LTO code generation is always just a hack and was never officially supported.
inconvertibleErrorCode());
}

for (auto &Job : Jobs) {
Maybe use the thread pool again here to read the files in parallel?
Also, why specifically pass the AddBuffer function all the way here, since AddStream pretty much does the same thing?
> Maybe use the thread pool again here to read the files in parallel?

My preference (but please feel free to disagree) would be to keep the code simple for the initial commit and then add optimizations later. It will be worthwhile eventually to try to load and delete these temporary files in as optimal a manner as possible. The optimizations I have included in this PR are those that require some sort of structural change, which brings us neatly onto...

> Also, why specifically pass the AddBuffer function all the way here, since AddStream pretty much does the same thing?

AddBuffer should be a more performant way to add a file from disk (versus AddStream). However, it's currently wrapped in the optional cache-handling machinery and not exposed. Passing it through causes some turbulence, so I wanted to present this to reviewers to get comments. This hasn't worried any of the reviewers particularly - so I will revert to just using AddStream for the initial commit. The AddBuffer changes can be introduced later with performance analysis to justify them.
Thanks. Done now.
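For illustration only, the deferred optimization discussed above can be sketched generically. This is a Python sketch (not LLVM's actual C++ code) of overlapping the reads of per-job temporary output files with a thread pool; the function name and shape are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def read_files_parallel(paths, max_workers=8):
    """Read each file concurrently and return a path -> bytes mapping.
    The reads are I/O bound, so a thread pool overlaps the disk waits."""
    def read_one(path):
        with open(path, "rb") as f:
            return path, f.read()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(read_one, paths))
```

The same idea applies to deleting the temporary files once they have been consumed.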
- Rename *.test LIT tests to *.ll, so that syntax highlighting works.
- Remove `dtlto` argument from llvm-lto2 and instead check whether `dtlto-distributor` was specified.
- Rename `distributor_args` to `input_file` in mock.py to be clearer.
The triple is selected within LTO rather than being selected in LLD and passed into the DTLTO backend. I think this is a good choice because, due to the changes for multiple symbol tables for ARM64X in COFF LLD (llvm@b068f2f), it's not quite as trivial for LLD to choose a triple in some cases now. Note that it's easily doable - I just think it's simpler to have the code in LLVM's LTO.cpp on balance. There are also some other minor improvements in this commit:
- Improved parameter/variable names for handling the reserved task range to make the logic easier to understand.
- Used correct comment style in *.ll tests.
- Use touch to create dummy files (makes it more obvious that the content is unimportant).
This simplifies the implementation. AddBuffer can be reintroduced later if performance metrics justify modifying the LTO interface.
return Error::success();
}

namespace {
-class InProcessThinBackend : public ThinBackendProc {
+class CGThinBackend : public ThinBackendProc {
The base class needs a comment
llvm/test/ThinLTO/X86/dtlto/json.ll
Outdated
CHECK-NEXT: 0
CHECK-NEXT: ]
CHECK: "--target=x86_64-unknown-linux-gnu"
CHECK: "-O2",
This looks like "arguments" (https://clang.llvm.org/docs/JSONCompilationDatabase.html). Every argument on a separate line? This might be too wasteful...
The "command" form, while it has some shell-escape complexity, is probably better in both compactness and debuggability.
I sometimes extract a command from a CMake-generated compile_commands.json (which uses "command") and invoke it in the shell. The "arguments" form would be less convenient to invoke.
> This looks like "arguments" (https://clang.llvm.org/docs/JSONCompilationDatabase.html). Every argument on a separate line? This might be too wasteful...

This test is checking pretty-printed JSON. The JSON generated for the distributors is more compact.

> The "command" form, while it has some shell-escape complexity, is probably better in both compactness and debuggability. I sometimes extract a command from a CMake-generated compile_commands.json (which uses "command") and invoke it in the shell. The "arguments" form would be less convenient to invoke.

I understand the point about the advantages of the "command" form. However, to aid consumers of the JSON I strongly want to avoid the complexities of quoting for different potential shells on various platforms. I could add a utility script to consume the JSON and print the shell-quoted commands to make it easy to run the individual commands?
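Such a utility script could be very small. A sketch, assuming a hypothetical jobs file whose entries carry an "arguments" array (the exact DTLTO JSON schema is not reproduced here), which prints one POSIX-shell-quoted command per job:

```python
import json
import shlex
import sys

def print_shell_commands(jobs_path):
    """Print one shell-quoted command line per job in a hypothetical
    'arguments'-form JSON jobs file (schema assumed for illustration)."""
    with open(jobs_path) as f:
        jobs = json.load(f)
    for job in jobs:
        # shlex.join handles POSIX shell quoting; Windows shells quote
        # differently, which is exactly the complexity the "arguments"
        # form keeps out of the JSON itself.
        print(shlex.join(job["arguments"]))

if __name__ == "__main__" and len(sys.argv) > 1:
    print_shell_commands(sys.argv[1])
```

This keeps the quoting problem in one small, platform-aware tool instead of in every JSON consumer.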
As mentioned elsewhere on this thread, we have an alternative "no backend" design and implementation. Please find it here: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend Here's a summary:
If there are aspects that people prefer about this approach, we can consider bringing them in to this PR. Or if the "no backend" design is a preferred starting point, we can turn it into a Pull Request and take across parts of this one.
The documentation has been updated for the new format and I have addressed any outstanding documentation comments.
Users should not rely on the temporary file naming scheme which may be subject to change.
Thanks for the review comments so far. This PR has now been updated to use a simpler JSON format. I have had to rework the documentation for that simpler format, and in the process I have addressed all the documentation suggestions that were made either here or on the design PR. The documentation and indeed the implementation is much simplified, and I believe all outstanding review comments are now addressed. Please take another look.
This looks reasonable to me. The LTO part might need @teresajohnson's approval.
I'm OOO but will wrap up the review asap, probably early next week.
@teresajohnson, @MaskRay, @cachemeifyoucan, could you please look at the alternative implementation of DTLTO (with no DTLTO backend and with plugins) and compare it with the DTLTO implementation described in this PR (with an out-of-process DTLTO backend and external processes/Python scripts) before approving this PR?

Here is the link to this branch: It's a fully working DTLTO implementation, although it has been minimized to make it suitable for a code review. We spent considerable time working on this project, but we haven't received any comments/feedback about it. The link to this branch was posted as a comment in this PR, but might have been lost between hundreds of other comments.

Katya Romanova (@romanova-ekaterina) already described the main ideas behind the alternative implementation and the difference between both implementations in one of the comments in the PR, but I will re-iterate it again:

(1) Very little footprint on lld and llvm sources.
This approach does simplify the lld and LTO side, but I'd like to get thoughts from @MaskRay, as this changes the interfacing with lld. The downside is that it then also requires support for a larger new DTLTO library (it looks like the original patch only contains a Local plugin - does each of the other distributed build systems need to replicate a lot of that functionality?).
I just wanted to add my comment here. I have been working with @bd1976bris to try out these changes with FastBuild for our internal build infrastructure and I am happy to report that I have a PoC working based on this and I was recently able to get distribution to work. I found the interface easy to work with and was able to get my PoC to work pretty quickly. I am happy for this PR to move forward at this point. We probably need some changes in order to be more cache friendly, but I think that can come later.
Hi Teresa,
No, replication of the functionality will not be needed for plugins in general. Plugins do not need to be linked with the DTLTO static library.
One thing might be worth mentioning about the two different solutions: we could have a hybrid (mix and match) solution, e.g. a "no DTLTO backend" that invokes an external process.
While working with Tobias, I back-ported this PR to LLVM 19 and wired up the Clang and LLD side changes. In case it helps anyone else, that can be found here: https://github.com/bd1976bris/llvm-project/commits/dtlto_llvm19/.
I don't have a strong opinion for either implementation. If I have to pick one, I like this PR better. I feel like the alternative will require keeping more implementation details of the build system in the LLVM tree. That is not a problem in itself, but I don't know how stable those APIs and implementation details are, so it might be a problem for integration. On the other side, I see the JSON interface from this PR is very simple and stable (maybe we should add a version in there just to be safe). I don't think the LTO interface is bloated after the change, and it is not stable anyway. Thus I slightly prefer this patch.
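A version field along the lines suggested above would let distributors fail early on incompatible input. A hypothetical sketch (the field name and its major/minor shape are not part of the current format):

```python
def check_jobs_version(doc, supported_major=1):
    """Reject a jobs description whose hypothetical 'version' field names
    a major version this consumer does not understand. Minor bumps are
    assumed to be backwards compatible, so only the major is checked."""
    version = doc.get("version")
    if not isinstance(version, dict) or version.get("major") != supported_major:
        raise ValueError("unsupported DTLTO jobs version: %r" % (version,))
```

Checking only the major number would let the format grow optional fields without breaking existing distributors.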
Actually, neither of the solutions will keep implementation details of the distribution build systems in the LLVM tree. The plugins or external processes will either be in a separate GitHub repository or will be placed in a repository for a particular distribution build system (e.g. the DTLTO-related plugin for distcc will be added to the distcc repository). The location will be discussed with upstream and decided.

Also, it's worth noting that both interfaces could be supported at the same time in DTLTO: the external processes interface for those who prefer simplicity, or the plugin interface for those who need performance. To understand the performance advantage that the plugin interface could provide, please refer to items (5) and (6) in the comment that kbelochapka provided.

Or we could have a hybrid (mix and match) solution: e.g. a "no DTLTO backend" (from the alternative solution) that invokes an external process (from the main solution), or vice versa.

The main decision we are looking for reviewers' input on is whether they prefer having the DTLTO code in a ThinLTO backend (main solution) or not in a backend (alternative solution). To understand the main differences, please refer to items (1), (2), and (3) in the comment that kbelochapka provided.
I would like to summarize the state of this PR as I understand it. Firstly, some reviewers have indicated that this PR is acceptable in its current state. However, there is an alternative proposal: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend. We somehow need to come to a conclusion, and I think an appeal to authority (fallacy or not) is the way :) @MaskRay - can I please ask you to reply to @teresajohnson's comment: #127749 (comment). @tru has created a distributor that works with his organization's fork of FASTBuild, using the JSON interface. If you prefer the no-backend approach, then the JSON interface from this PR could be merged into that approach. Thanks.
ping.
This PR introduces initial support for Integrated Distributed ThinLTO (DTLTO) in LLVM.
DTLTO enables the distribution of backend ThinLTO compilations via external distribution systems, such as Incredibuild. Existing support for distributing ThinLTO compilations typically involves separate thin-link (`--thinlto-index-only`), backend compilation, and link steps coordinated by a modern build system, like Bazel. This "Bazel-style" distributed ThinLTO requires a modern build system as it must handle the dynamic dependencies specified in the summary index file shards. However, adopting a modern build system can be prohibitive for users with established build infrastructure. In contrast, DTLTO manages distribution within LLVM during the traditional link step. This approach means that DTLTO is usable with any build process that supports in-process ThinLTO.

This PR provides a minimal but functional implementation of DTLTO, which will be expanded in subsequent commits. It implements basic support for distributing backend ThinLTO compilation jobs via an external process that coordinates with a distribution system (such as Incredibuild). A JSON interface is used for communication with this external process. The JSON interface allows new distribution systems to be supported without modifying LLVM. Please see the documentation added in this PR (`llvm/docs/dtlto.rst`) for more details.

Subsequent commits will add:
RFC: Integrated Distributed ThinLTO RFC
For the design of the DTLTO feature, see: #126654.
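To make the distributor concept above concrete, here is a hedged sketch of a minimal "local" distributor in Python. The JSON schema (a list of jobs, each carrying a complete argv in an "arguments" array) is assumed for illustration and may not match the actual format documented in `llvm/docs/dtlto.rst`:

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_jobs(jobs_path, max_workers=None):
    """Run every backend-compilation job from a jobs file locally and in
    parallel. A real distributor would instead forward the commands to a
    distribution system such as Incredibuild or FASTBuild."""
    with open(jobs_path) as f:
        jobs = json.load(f)  # assumed shape: [{"arguments": [...]}, ...]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda j: subprocess.call(j["arguments"]), jobs))
    # Report failure if any backend compilation failed.
    return max(codes, default=0)

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_jobs(sys.argv[1]))
```

Because the interface is just "consume a JSON file, run the commands, exit non-zero on failure", a build system can be supported by writing a script like this rather than by modifying LLVM.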