
[DTLTO][LLVM] Integrated Distributed ThinLTO (DTLTO) #127749

Open · wants to merge 27 commits into main

Conversation

bd1976bris (Collaborator):

This PR introduces initial support for Integrated Distributed ThinLTO (DTLTO) in LLVM.

DTLTO enables the distribution of backend ThinLTO compilations via external distribution systems, such as Incredibuild. Existing support for distributing ThinLTO compilations typically involves separate thin-link (--thinlto-index-only), backend compilation, and link steps coordinated by a modern build system, like Bazel. This "Bazel-style" distributed ThinLTO requires a modern build system as it must handle the dynamic dependencies specified in the summary index file shards. However, adopting a modern build system can be prohibitive for users with established build infrastructure. In contrast, DTLTO manages distribution within LLVM during the traditional link step. This approach means that DTLTO is usable with any build process that supports in-process ThinLTO.

This PR provides a minimal but functional implementation of DTLTO, which will be expanded in subsequent commits. It implements basic support for distributing backend ThinLTO compilation jobs via an external process that coordinates with a distribution system (such as Incredibuild). A JSON interface is used for communication with this external process. The JSON interface allows new distribution systems to be supported without modifying LLVM. Please see the documentation added in this PR: llvm/docs/dtlto.rst for more details.
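For a concrete sense of the interface, here is a sketch of a minimal serial distributor in the spirit of the in-tree local.py script. It is illustrative only: the "common"/"jobs"/"args" field names are assumptions based on the discussion in this thread, not a normative schema.

import json
import subprocess
import sys

if __name__ == "__main__":
    # The linker invokes the distributor with the JSON job-description
    # file as the final command-line argument.
    with open(sys.argv[-1]) as f:
        doc = json.load(f)
    # Run each backend compilation job locally, one at a time.
    for job in doc["jobs"]:
        subprocess.check_call(doc["common"]["args"] + job["args"])

A real distributor would instead forward these jobs to a distribution system (for example, Incredibuild) and wait for the output files to appear.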

Subsequent commits will add:

  • Support for the ThinLTO cache.
  • Support for more LTO configuration states, e.g., basic block sections.
  • Performance improvements, for example, faster temporary file removal.

For the design of the DTLTO feature, see the Integrated Distributed ThinLTO RFC: #126654.

llvmbot added labels: LTO (Link time optimization: regular/full LTO or ThinLTO), llvm:support, llvm:transforms — Feb 19, 2025

github-actions bot commented Feb 19, 2025

✅ With the latest revision this PR passed the Python code formatter.

github-actions bot commented Feb 19, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

This will be used for supplying a `-target=<triple>` option to clang for
DTLTO.

Note that if DTLTO is changed to use an optimization tool that does not
require an explicit triple to be passed, then the triple handling can be
removed entirely.
The DTLTO ThinLTO backend will override this function to perform
code generation.

The DTLTO ThinLTO backend will not do any codegen when invoked for each
task. Instead, it will generate the required information (e.g., the
summary index shard, import list, etc.) to allow for the codegen to be
performed externally. The backend's `wait` function will then invoke an
external distributor process to do backend compilations.
The DTLTO ThinLTO backend will need to customize the file paths used for
the summary index shard files. This is so that the names of these files
can include a unique ID so that multiple DTLTO links do not overwrite
each other's summary index shard files.

It also needs to be able to get the import files list for each job in
memory rather than writing it to a file. This is to allow the import
lists to be included in the JSON used to communicate with the
external distributor process.
The new setup() member is now called to allow the ThinLTO backends to
prepare for code generation.

For the DTLTO backend, this will be used to preallocate storage for the
information required to perform the backend compilation jobs.
Move some setup code and state to a base class.

This will allow the DTLTO ThinLTO backend to share this setup code and
state by also deriving from this base class.
Structural changes:
1. A new ThinLTO backend implementing DTLTO has been added.
2. Both the new backend and the InProcess backend derive from a common
   base class to share common setup code and state.
3. The target triple is now stored for the ThinLTO bitcode files.
4. A new setup() member is called, which the ThinLTO backends can use to
   prepare for code generation. For the DTLTO backend, this is used to
   pre-allocate storage for the information required to perform the
   backend compilation jobs.
5. The functions for emitting summary index shard and imports files have
   been altered to allow the caller to specify the filenames to write
   and to allow the list of imports to be stored in a container rather
   than written to a file.
Intentionally minimal for now. Additional state translation will be added
in future commits.
Replace the use of AddStream with AddBuffer in the DTLTO ThinLTO backend
for adding files to the link.

Unlike the InProcess ThinLTO backend, DTLTO runs the backend compilation
jobs by invoking an external process (currently clang). This writes the
output object file to disk. Therefore, DTLTO requires a performant way
of adding an existing file to the link. Note that the AddBuffer
mechanism is also used for adding a file to the link if there is a cache
hit.
Some of the code in LTO.h did not conform to the LLVM coding standard,
although it matched the style already used in that file. This caused the
automated code-formatting checks from GitHub to fail, which was confusing
on the PR. I decided to apply clang-format everywhere to prevent this,
even though the new code no longer matches the existing style.
bd1976bris (Collaborator, Author):

I force-pushed an update; the code is equivalent. Reasons for the force-push:

  1. To split up the change into multiple smaller steps. This should facilitate review.
  2. To apply clang-format to everything (I decided that getting the GitHub checks green was more important than maintaining the existing coding style).

@bd1976bris bd1976bris requested review from MaskRay and removed request for MaskRay February 19, 2025 22:45
1. Use "with" when opening files.
2. Add module docstrings.
bd1976bris (Collaborator, Author):

In offline review it was suggested that the distributor scripts could benefit from module docstrings and the preferred "with" statement idiom for opening files. I made those changes and also spotted some nits in the tests which I have corrected. I'm adding new commits rather than updating the original commits and force-pushing as I think that force-pushing might cause turbulence for reviewers. Please let me know if you would prefer force-pushing.

bd1976bris added a commit to bd1976bris/llvm-project that referenced this pull request Feb 25, 2025
- Remove linker version.

- Make AddBuffer a default parameter for LTO::run() to minimize changes
  at call sites.

- Format code with clang-format, even if it deviates slightly from the
  surrounding style. I decided that CI passing for the PRs was more
  important than matching the existing style.

- Order distributor command-line options correctly.

- Remove dead code in llvm-lto2.cpp.

- Add docstrings to Python distributors.

- Update the Python mock.py distributor to validate JSON.

- Organize LLVM DTLTO tests by moving them into a dedicated "dtlto"
  directory, tidying existing tests and adding new ones.

- Implement fix for more than one LTO partition (spotted by MaskRay).
- Replace `thinlto-remote-opt-tool` with `thinlto-remote-compiler`
- Make `thinlto-remote-compiler` a cl::opt
teresajohnson (Contributor) left a comment:

Sorry for the delay! I have some comments and questions below. Thanks for splitting out the LLVM portions.

@@ -0,0 +1,3 @@
Tests for DTLTO (integrated distributed ThinLTO) functionality.

These are integration tests as DTLTO invokes `clang` for code-generation.
Contributor:

Is the use of llvm-lto and opt here just to do testing before the clang and lld patches go in? If so, do we need cross project testing right now? Wondering if it would be better to just stick with the testing you have in this patch that is under llvm/test, and add the cross project testing using only external tools once those patches are in.

Collaborator (Author):

This test checks that the DTLTO code that creates Clang command lines generates the expected options and that those options are accepted by clang. I wrote it to use llvm-lto and opt so it could be part of this commit before the clang and lld patches go in. However, I think it is a nice principle to only use external tools in cross-project-tests. I have removed the test from this PR with the intention of adding it back later. Additionally if we can use your idea (#127749 (comment)) of storing the Clang -c step options and applying them to the remote backend compilations then this test will be rewritten to test that behavior instead.

contribute. Distributors can use this to label build jobs for informational
purposes.

- **Linker's version string**.
Contributor:

I thought I saw that this got removed

Collaborator (Author):

Apologies, I hadn't pulled in the documentation changes that were made on #126654. Fixed now.

the same information that is emitted into import files for ThinLTO.

- The **input files** required for each job.
- The per-job set of files required for backend compilation, such as bitcode
Contributor:

Does the list of bitcode files include the bitcode files being imported from? If so, is the imports list described earlier needed?

Collaborator (Author):

The JSON format has been simplified now. The imports field has been removed and there is now just an "inputs" field which contains all the inputs specific to a compilation job including the bitcode modules from which imports will be made. I have simplified the documentation to match. Thanks.

; RUN: rm -rf %t && split-file %s %t && cd %t

;; Generate bitcode files with a summary index.
; RUN: opt -thinlto-bc x86_64-unknown-linux-gnu.ll -o x86_64-unknown-linux-gnu.bc
Contributor:

Why not use clang with input .cc files instead of opt with input .ll files for the cross-project tests (related to my earlier comment about whether this should wait and just use external tools)?

Collaborator (Author):

Clang can be used here. Thanks. However, I have removed this test for now. See earlier comment: #127749 (comment)

invocation is contributing to, to aid the user in identifying them:

- **JSON Job Description File**:
- Format: `dtlto.<UID>.dist-file.json`
Contributor:

What is UID in this context? Should this be PID?

bd1976bris (Collaborator, Author) commented Mar 5, 2025:

I have changed this to be PID now. Note that I expect the naming strategy to change in the future as we get feedback from more distribution systems. I have some concerns that mixing the PID into the filenames might prevent the caching systems for some distribution systems from working effectively.

Collaborator (Author):

I have simplified the documentation now and this section has been removed. I don't want users relying on the naming scheme for temporary files which might be subject to change when I look at optimizing cache utilization.

RUN: not %{command} -save-temps 2>&1 \
RUN: | FileCheck %s --check-prefixes=ERR
RUN: ls | FileCheck %s --check-prefix=NOINDEXFILES
NOINDEXFILES-NOT: imports
Contributor:

Redundant with earlier line

Collaborator (Author):

Thanks. Removed the first check leaving only the -save-temps one.

# Check that imports files are created with -thinlto-emit-imports.
RUN: not %{command} -thinlto-emit-imports 2>&1 \
RUN: | FileCheck %s --check-prefixes=ERR
RUN: ls | FileCheck %s --check-prefix=INDEXFILES
Contributor:

Similar to above: suggest s/INDEX/IMPORT/.

Collaborator (Author):

Thanks. Fixed.

@@ -97,6 +97,12 @@ static cl::opt<bool>
"specified with -thinlto-emit-indexes or "
"-thinlto-distributed-indexes"));

static cl::opt<bool> DTLTO("dtlto", cl::desc("Perform DTLTO"));
Contributor:

Rather than requiring a separate bool flag, consider checking whether dtlto-distributor was specified?

Collaborator (Author):

Nice. Done.


if __name__ == "__main__":
    json_arg = sys.argv[-1]
    distributor_args = sys.argv[1:-1]
Contributor:

Suggest calling this input_file instead of distributor_args to be clearer.

Collaborator (Author):

Thanks. Done now.

// shard, import list, etc.) to allow for the codegen to be performed
// externally. This backend's `wait` function then invokes an external
// distributor process to do backend compilations.
class OutOfProcessThinBackend : public CGThinBackend {
Contributor:

In practice, how much does this share with the original InProcessThinBackend? I am wondering if the approach of having them derive from the same refactored CGThinBackend is useful.

Collaborator (Author):

Very little. Just some state and a few lines which handle CfiFunctionDefs. It's still nice to share these parts though. Should I remove CGThinBackend? What is your preference?

Contributor:

I was thinking about this: the backend thread handling is quite similar between this new backend and the write indexes backend (which invokes emitFiles), i.e., the latter could probably be refactored to use the new Job structure you have introduced here. It isn't clear to me which existing backend has more in common with the new one. Do you eventually plan to share the Cache from the in process backend?

Collaborator (Author):

> I was thinking about this: the backend thread handling is quite similar between this new backend and the write indexes backend

In the sense of using the Cache and also adding the generated object files to the link, it is similar to the in process backend. However, in the sense of primarily using emitFiles to generate the pre-job summary index files, it is similar to the write indexes backend.

> Do you eventually plan to share the Cache from the in process backend?

Eventually we will add support for the Cache that is used in the in process backend. We have found that, in some cases, this is more performant than the distribution system caching.

- Pull in documentation improvements.
cachemeifyoucan (Collaborator) left a comment:

I haven't read the bulk of the implementation yet, but I'll start with some comments.

Limitations
-----------

The current implementation of DTLTO has the following limitations:
Collaborator:

I feel like this should be a TODO list, since the items on it are all doable and thus not limitations by design.

"primary_input": ["dtlto.o"],
"summary_index": ["dtlto.1.51232.native.o.thinlto.bc"],
"primary_output": ["dtlto.1.51232.native.o"],
"imports": [],
Collaborator:

I think it would be helpful to show an example with more than one job, so there is an example of how cross-referencing works.

"-O3", "-fprofile-sample-use=my.profdata",
"-o", ["primary_output", 0],
"-c", "-x", "ir", ["primary_input", 0],
["summary_index", "-fthinlto-index=", 0],
Collaborator:

Is this a standard JSON format currently used by a distributor you are targeting? If you have an OpenAPI doc, it would be even better.

I feel the format is a bit weird. It is an array containing different types; the sub-arrays that provide indexes are not the same length, and the items at the same offset are not always the same thing. Doesn't that make a decoder harder to write?

Collaborator (Author):

> Is this a standard JSON format currently used by a distributor you are targeting? If you have an OpenAPI doc, it would be even better.

It's my own format, although it has been shown to have enough information to drive distribution via real-world distribution systems we have tested with Incredibuild and reclient+buildfarm (both of which are available to test for free).

I'll try creating an OpenAPI doc. My feeling is that the format is too simple to need something like this, though.

> I feel the format is a bit weird. It is an array containing different types; the sub-arrays that provide indexes are not the same length, and the items at the same offset are not always the same thing. Doesn't that make a decoder harder to write?

I don't think that decoding is particularly difficult (see, for example, local.py).

I'm not attached to the format of the JSON as long as it can do what we need. Here are the goals/constraints I came up with for a JSON interface:

  • Be as minimal as possible.
  • Don't use spaces in field names.
    • This allows for programmatic access, such as with member-access syntax, e.g., .summary_index.
  • Allow the codegen tool to be replaced without a schema update.
  • Don't depend on platform- or shell-specific shlexing of strings.
    • Think about handling quotes, spaces, etc.
  • Support filenames appearing inside arguments.
    • e.g., -fthinlto-index=summary.index.
  • Decouple LLVM from distributors to the largest degree possible.
    • This is to try to ensure that if the command line for the codegen tool changes, the distributor(s) shouldn't need updating.
  • Explicitly provide the primary input file, summary index files, and primary output file.
    • This is to help with supporting distribution systems such as FASTBuild, distcc, and Icecream, which lack sandboxing and might require the paths to be manipulated.

In the current format, the sub-arrays are used to represent references to per-job filenames so that a command line can be stored that doesn't contain filenames. This ensures that the contents of the command line remain transparent to the distributor, allowing another code-generation tool to replace Clang without updating the distributors. It also means that the distributor has access to the filenames (as they are stored separately) and can manipulate them, if required, without needing to change the command line.

Collaborator:

> I'll try creating an OpenAPI doc. My feeling is that the format is too simple to need something like this, though.

My comment about OpenAPI was that, if one already exists because the format exists, it would be good to link it. Otherwise, I don't think it is a high-priority task for now.

For the format, I am not too attached either, but here is how I think it can be improved:

  • For the mixed types, you have to parse by traversing JSON values and checking their types. You can't really write a data type in languages like C++ that maps directly to the JSON object, which is why I say the parsing is a bit hard.
  • Maybe a better example would help; I don't see what the last 0 in the array means. Is it ever non-zero? Looking at local.py only makes me more confused. An example with more than one input would definitely help here.
  • It is also unclear to me how to append different flags (for example, different targets, as shown in the code review) to different jobs. Can the target option be in the job section? Does the distributor treat all elements in the job lists as inputs whose paths it needs to fix?

If you asked me to design it, I might do something like:

  • The common section has an argument list that is entirely strings, with no input files. All input files and job-specific flags can be appended later, assuming order never matters and we don't rely on overwriting behavior.
  • The job section will contain a specific list of input and output files, whose paths the distributor can fix and whose movement it can handle. It will also come with a list of additionalOption objects that tell the distributor how to construct the remainder of the command line. The file references should just be indexes into the input/output lists.

Also, if equals-style flags like -fthinlto-index= are just a special case, we can always create an alias for that, if it isn't worth the effort to support generally.

Collaborator (Author):

> My comment about OpenAPI was that, if one already exists because the format exists, it would be good to link it. Otherwise, I don't think it is a high-priority task for now.

Thanks for clarifying!

> • For the mixed types, you have to parse by traversing JSON values and checking their types. You can't really write a data type in languages like C++ that maps directly to the JSON object, which is why I say the parsing is a bit hard.

I understand this point, but I feel that expanding the references in C++ code is not taxing. I also don't think that any referencing scheme would lend itself to mapping directly to C++ data types.

> • Maybe a better example would help; I don't see what the last 0 in the array means. Is it ever non-zero? Looking at local.py only makes me more confused. An example with more than one input would definitely help here.

It's just a flexible referencing scheme to try to minimise the likelihood of ever needing to change it.

The expansion rule is specified here: https://github.com/bd1976bris/llvm-project/blob/DTLTO_llvm_only/llvm/docs/DTLTO.rst#command-line-expansion-example.

For a reference like ["additional_inputs", "--", 2, "=", 0, ":", 1] and a corresponding per-job array like:

{
    ...
    "additional_inputs": ["first", "second", "third"]
}

Expansion yields: "--third=first:second"
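For illustration, here is a sketch of that expansion rule as a Python helper (a hypothetical function, assuming per-job entries are plain dicts decoded from the JSON): the first element of a reference names a per-job array; integer elements index into that array, and string elements are literal text.

def expand_reference(ref, job):
    # The first element selects the per-job array; the remaining
    # elements are literals (strings) or indexes into it (integers).
    values = job[ref[0]]
    return "".join(values[e] if isinstance(e, int) else e for e in ref[1:])

# expand_reference(["additional_inputs", "--", 2, "=", 0, ":", 1],
#                  {"additional_inputs": ["first", "second", "third"]})
# returns "--third=first:second"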

I don't have a real-world use case where this is important, though. For the cases I have currently, we only need to index into the first entry of the per-job arrays and to handle the special case of equals-style flags like -fthinlto-index=. I just came up with something that should be able to handle any conceivable option.

I will improve the documentation with a case that uses a non-zero index. That is, assuming we don't change the way references work (see the other parts of this comment).

> • It is also unclear to me how to append different flags (for example, different targets, as shown in the code review) to different jobs. Can the target option be in the job section? Does the distributor treat all elements in the job lists as inputs whose paths it needs to fix?

This isn't currently possible. If it were needed, my plan to support it was just to add an "args" array to the per-job entries, handled in the same manner as the current "args" array in the "common" object.

> If you asked me to design it, I might do something like:
>
> • The common section has an argument list that is entirely strings, with no input files. All input files and job-specific flags can be appended later, assuming order never matters and we don't rely on overwriting behavior.
> • The job section will contain a specific list of input and output files, whose paths the distributor can fix and whose movement it can handle. It will also come with a list of additionalOption objects that tell the distributor how to construct the remainder of the command line. The file references should just be indexes into the input/output lists.
>
> Also, if equals-style flags like -fthinlto-index= are just a special case, we can always create an alias for that, if it isn't worth the effort to support generally.

Looks great 👍 I will rework along these lines. However, before drawing too much of your attention to the JSON format, can I first ask whether you (and other reviewers) are happy with a JSON interface? There are potentially some advantages to a plugin interface. I made a comment on this here: #126654 (comment).

Collaborator (Author):

@cachemeifyoucan, I have explored several different JSON formats with different ways of referencing between filenames and the command line arguments. However, the ideas I tested made the JSON less understandable and the referencing less flexible than the initial JSON format discussed above.

I am now proposing a simpler format that eliminates references for filenames. Instead, it simply lists the input/output filenames and command-line arguments in arrays of strings. While this results in some string duplication, it only increases the JSON file size by 6% (compared to the initial scheme) in my tests on a Clang link. As well as simplicity and readability, an additional benefit is that consumers have less work to do when composing command lines.
With this format, modifying filenames programmatically would be more challenging for a consumer (distributor). We initially theorized this might be necessary for supporting certain distribution systems, but in my testing with SN-DBS, Incredibuild, and Reclient (see the description of #126654), it has not been required. I have no firm evidence that it will be a requirement in the future.
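For illustration, a job description in the simplified format might look something like the following (a sketch only: apart from the "inputs" field mentioned above, the exact field names and layout are assumptions based on this thread, not the final schema). A second job is included to show cross-referencing, with the first job's bitcode listed among the second job's inputs as an import source:

{
  "common": {
    "args": ["/usr/bin/clang", "--target=x86_64-unknown-linux-gnu", "-O2", "-c", "-x", "ir"]
  },
  "jobs": [
    {
      "args": ["t1.bc", "-fthinlto-index=t1.native.o.thinlto.bc", "-o", "t1.native.o"],
      "inputs": ["t1.bc", "t1.native.o.thinlto.bc"],
      "outputs": ["t1.native.o"]
    },
    {
      "args": ["t2.bc", "-fthinlto-index=t2.native.o.thinlto.bc", "-o", "t2.native.o"],
      "inputs": ["t2.bc", "t2.native.o.thinlto.bc", "t1.bc"],
      "outputs": ["t2.native.o"]
    }
  ]
}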

I have updated the implementation and documentation to use this simpler format. @teresajohnson and @MaskRay, I’d appreciate your input on this change. Of course, anyone is welcome to comment, but I’ve directly referenced those who have previously expressed opinions on the JSON format in this PR.

Collaborator:

LGTM. I don't have a strong opinion, and I do not intend to block the review on the JSON format design.

Collaborator (Author):

Thanks @cachemeifyoucan. Your comments on the JSON format were valuable and the new format makes this feature simpler and more approachable. Adding different options to different jobs is now possible as well.

@@ -426,6 +460,7 @@ class LTO {
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
Collaborator:

I was really hoping we would have only one triple for the build, because what does it even mean to cross-import between different triples? It would be good to document why a per-module triple is needed to account for the versions field.

@@ -426,6 +460,7 @@ class LTO {
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
Collaborator:

Also, we usually don't use DenseMap with StringRef keys; maybe StringMap is better.

"thinlto-distributor-arg",
cl::desc("Additional arguments to pass to the ThinLTO distributor"));

cl::opt<std::string> ThinLTORemoteCompiler(
Collaborator:

Are those cl::opt options for debugging, or the real options that users need to pass in? We should avoid -mllvm options in production usage.

Collaborator (Author):

> Are those cl::opt options for debugging, or the real options that users need to pass in? We should avoid -mllvm options in production usage.

The use of -mllvm options avoids the need to implement a similar set of options for DTLTO in each LLD port (and also in llvm-lto2). See: #126654 (comment). I don't have a strong opinion though. Perhaps @MaskRay would like to comment on this.

Contributor:

I think ultimately the goal for options in production usage should be lld options. I'm not sure whether it is more churn to implement them there from the get-go vs. starting with cl::opt and transitioning later. At the least there should be a TODO here to convert these options.

Collaborator (Author):

I'm happy to remove the cl::opt options and instead pass extra state when configuring the new backend. I will implement that in this PR.

Member:

I understand that -mllvm options are considered internal and that for production usage we should recommend proper driver and linker options. It's annoying, though, to update llvm-lto2 and every ELF port...

Collaborator (Author):

Using -mllvm options was a nice reduction in the number of changes required in different tools, which was helpful to reduce the size of the initial PR. However, I think there is agreement now that proper options are required. Note that we don't have to update all the LLD ports at once. We can add DTLTO support to the different LLD ports incrementally. Perhaps we can start with just the ELF LLD port (chosen as it is the LLD port I'm most familiar with)?


#--- t1.ll

target triple = "x86_64-unknown-linux-gnu"
Collaborator:

I prefer all the tests to be written with the .ll suffix, using bitcode comment style, rather than the .test suffix. That gets the correct syntax highlighting, so they are easier to read.

Collaborator (Author):

> I prefer all the tests to be written with the .ll suffix, using bitcode comment style, rather than the .test suffix. That gets the correct syntax highlighting, so they are easier to read.

Sounds good. Although I suppose an alternative point of view is that it's not correct, because these files are not .ll files. I don't mind doing this if you feel strongly about it.

Collaborator:

This is a .ll file, because every line of code that is not a RUN/CHECK line or a comment is LLVM IR assembly. Even when we use split-file, we still name the combined file with the same suffix.

Collaborator (Author):

Ok. I'll do this.

cachemeifyoucan (Collaborator) left a comment:

Generally the approach looks fine to me.

@@ -426,6 +460,7 @@ class LTO {
// The bitcode modules to compile, if specified by the LTO Config.
std::optional<ModuleMapType> ModulesToCompile;
DenseMap<GlobalValue::GUID, StringRef> PrevailingModuleForGUID;
DenseMap<StringRef, std::string> ModuleTriples;
Collaborator:

Single triple from linker sounds good to me.

// distributor program.
// TODO: If this strategy of deriving options proves insufficient, alternative
// approaches should be considered, such as:
// - A serialization/deserialization format for LTO configuration.
Collaborator:

I am not a fan of storing command-line options, especially if you want some bitcode compatibility (-cc1 flags have no compatibility guarantee). That means the object files and future static library support will require rev-locking, which might or might not be what you want.

I think most of the target options should be available in the bitcode as attributes, and for those that are not, we should add more of them into the bitcode. Reconstructing some options from the CodeGen options here is unfortunate, but I can accept that. Hacking other options into LTO code generation was always just a hack and was never officially supported.

inconvertibleErrorCode());
}

for (auto &Job : Jobs) {
Collaborator:

Maybe use the thread pool again here to read the files in parallel?

Also, why specifically pass the AddBuffer function all the way down here, since AddStream does pretty much the same thing?

Collaborator (Author):

> Maybe use the thread pool again here to read the files in parallel?

My preference (but please feel free to disagree) would be to keep the code simple for the initial commit and then add optimizations later. It will be worthwhile eventually to try to load and delete these temporary files in as optimal a manner as possible. The optimizations I have included in this PR are those that require some sort of structural change, which brings us neatly onto...

> Also, why specifically pass the AddBuffer function all the way down here, since AddStream does pretty much the same thing?

AddBuffer should be a more performant way to add a file from disk (versus AddStream). However, it's currently wrapped in the optional cache handling machinery and not exposed. Passing it through causes some turbulence, so I wanted to present this to reviewers to get comments. This hasn't worried any of the reviewers particularly - so I will revert to just using AddStream for the initial commit. The AddBuffer changes can be introduced later with performance analysis to justify them.

Collaborator (Author):

Thanks. Done now.

- Rename *.test LIT tests to *.ll, so that syntax highlighting works.

- Remove `dtlto` argument from llvm-lto2 and instead check whether
  `dtlto-distributor` was specified.

- Rename `distributor_args` to `input_file` in mock.py to be clearer.
The triple is selected within LTO rather than being selected in LLD and passed
into the DTLTO backend. I think this is a good choice because, due to the
changes for multiple symbol tables for ARM64X in COFF LLD (llvm@b068f2f),
it's not quite as trivial for LLD to choose a triple in some cases now. Note
that it's easily doable; I just think it's simpler to have the code in LLVM's
LTO.cpp on balance.

There are also some other minor improvements in this commit:

- Improved parameter/variable names for handling the reserved task range to
  make the logic easier to understand.

- Used correct comment style in *.ll tests.

- Use touch to create dummy files (makes it more obvious that the content is
  unimportant).
This simplifies the implementation. AddBuffer can be reintroduced
later if performance metrics justify modifying the LTO interface.
return Error::success();
}

namespace {
-class InProcessThinBackend : public ThinBackendProc {
+class CGThinBackend : public ThinBackendProc {
Member:

The base class needs a comment

CHECK-NEXT: 0
CHECK-NEXT: ]
CHECK: "--target=x86_64-unknown-linux-gnu"
CHECK: "-O2",
Member:

This looks like "arguments" (https://clang.llvm.org/docs/JSONCompilationDatabase.html). Every argument on a separate line? This might be too wasteful...

The "command" form, while it has some shell-escape complexity, is probably better in both compactness and debuggability.
I sometimes extract a command from CMake-generated compile_commands.json (which uses "command") and invoke it in the shell. The "arguments" form would be less convenient to invoke.

bd1976bris (Collaborator, Author) commented Mar 19, 2025:

> This looks like "arguments" (https://clang.llvm.org/docs/JSONCompilationDatabase.html). Every argument on a separate line? This might be too wasteful...

This test is checking pretty-printed JSON. The JSON generated for the distributors is more compact.

The "command" form, while it has some shell-escape complexity, is probably better in both compactness and debuggability.
I sometimes extract a command from CMake-generated compile_commands.json (which uses "command") and invoke it in the shell. The "arguments" form would be less convenient to invoke.

I understand the point about the advantages of the "command" form. However, to aid consumers of the JSON I strongly want to avoid the complexities of quoting for different potential shells on various platforms. I could add a utility script to consume the JSON and print the shell quoted commands to make it easy to run the individual commands?
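For example, a minimal version of such a utility could look like the following (a sketch, assuming the "common"/"jobs"/"args" shape discussed earlier in this thread):

import json
import shlex
import sys

# Print each backend compilation job as a single shell-quoted command
# line. Note that shlex.join produces POSIX-style quoting; a Windows
# shell would need different escaping.
with open(sys.argv[1]) as f:
    doc = json.load(f)
for job in doc["jobs"]:
    print(shlex.join(doc["common"]["args"] + job["args"]))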

romanova-ekaterina (Contributor):

As mentioned elsewhere on this thread, we have an alternative "no backend" design and implementation.

Please find it here: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend

Here's a summary:

  1. Out of process (DTLTO) backend is not used in this implementation.
  2. Both regular and thin archive members are supported in a way that's minimally invasive to LTO. This is in contrast to the approach whereby a new backend is introduced, since archive support for DTLTO cannot be implemented within a backend. Support for (non-thin) archives would require changes to the linker or LTO.cpp.
  3. Only COFF and ELF formats are supported in the initial commit. Support for all other formats is implemented, but it will be submitted later.
  4. Footprint on lld and llvm/lto is very small.
  5. Plugins (shared libraries) are used in this implementation rather than external processes/python scripts.
  6. Plugins allow direct calls and two-way communication between LLVM and the DBS, while external processes require parameter serialization and thus allow only one-way communication with the DBS. Plugins allow passing memory buffers to the DBS client and getting output back from the DBS in memory (which saves two file writes and two file reads). Note: we have a prototype implemented for the distcc distribution system that allows a direct memory-to-memory interface, which we can share on request. A direct memory-to-memory interface could not be implemented in a one-way external-process interface.
  7. Throughput is one of the two main factors that contribute to distribution build system efficiency; load balancing is the other. The plugin interface could be extended to integrate more closely with the DBS, allowing more advanced things like querying the internal state of the DBS to improve load balancing.

If there are aspects that people prefer about this approach, we can consider bringing them in to this PR. Or if the "no backend" design is a preferred starting point, we can turn it into a Pull Request and take across parts of this one.

The documentation has been updated for the new format and I have
addressed any outstanding documentation comments.
Users should not rely on the temporary file naming scheme which
may be subject to change.
bd1976bris (Collaborator, Author):

Thanks for the review comments so far. This PR has now been updated to use a simpler JSON format. I have had to rework the documentation for that simpler format, and in the process I have addressed all the documentation suggestions that were made either here or on the design PR. The documentation, and indeed the implementation, are much simplified, and I believe all outstanding review comments are now addressed. Please take another look.

MaskRay (Member) left a comment:

This looks reasonable to me. The LTO part might need @teresajohnson 's approval.

teresajohnson (Contributor):

> This looks reasonable to me. The LTO part might need @teresajohnson's approval.

I'm OOO but will wrap up the review ASAP, probably early next week.

kbelochapka (Collaborator):

> This looks reasonable to me. The LTO part might need @teresajohnson's approval.

@teresajohnson , @MaskRay , @cachemeifyoucan ,

Could you please look at the alternative implementation of DTLTO (with no DTLTO backend and with plugins) and compare it with the DTLTO implementation described in this PR (with out-of-process DTLTO backend and external processes/python scripts) before approving this PR?

Here is the link to this branch:
https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend

It's a fully working DTLTO implementation, although it has been minimized to make it suitable for a code review. We spent considerable time working on this project, but we haven't received any comments/feedback about it. The link to this branch was posted as a comment in this PR, but it might have been lost among the hundreds of other comments.

Katya Romanova (@romanova-ekaterina) already described the main ideas behind the alternative implementation, and the differences between the two implementations, in one of the comments on this PR, but I will reiterate them here:

(1) Very little footprint on the lld and llvm sources.
(2) Easy to use; only one command-line option to control: –distribute=
(3) Both regular and thin archive members are fully supported. This contrasts with the DTLTO backend solution, since archive support for DTLTO cannot be implemented within an LTO backend.
(4) Only COFF and ELF formats are supported in the initial commit. Support for all other formats is implemented, but it will be submitted later.
(5) Plugins (shared libraries) are used in this implementation rather than external processes/Python scripts. A plugin that supports external scripts was implemented but was not included in the initial review. Plugins can be statically linked into an lld executable. We have several plugins implemented that support the most popular distributed build systems: DistCC, IceCream, FastBuild, Incredibuild, Re-Client, HomeCC.
(6) Plugins allow direct calls and two-way communication between LLVM and the DBS, while external processes require parameter serialization and allow only one-way communication. For example, one of the advantages of using plugins is the ability to pass memory buffers to the DBS client and get output back from the DBS in memory (which saves two file writes and two file reads). Note: we have a plugin prototype implemented for the DistCC distribution system that implements a direct memory-to-memory interface with the DistCC client (we can share it on request). A direct memory-to-memory interface could not be implemented in a one-way external-process interface.
(7) Throughput is one of the two main factors that contribute to distribution build system efficiency; load balancing is the other. The plugin interface could be integrated more closely with a DBS, allowing more advanced compilation-job scheduling based on the current distributed build system state, which in turn would improve load balancing.

teresajohnson (Contributor):

> Could you please look at the alternative implementation of DTLTO (with no DTLTO backend and with plugins) and compare it with the DTLTO implementation described in this PR (with out-of-process DTLTO backend and external processes/python scripts) before approving this PR? […]

This approach does simplify the lld and LTO side, but I'd like to get thoughts from @MaskRay, as this changes the interfacing with lld. The downside is that it then also requires support for a larger new DTLTO library (it looks like the original patch only contains a Local plugin; does each of the other distributed build systems need to replicate a lot of that functionality?).

tru (Collaborator) commented Apr 9, 2025:

I just wanted to add my comment here. I have been working with @bd1976bris to try out these changes with FastBuild for our internal build infrastructure and I am happy to report that I have a PoC working based on this and I was recently able to get distribution to work. I found the interface easy to work with and was able to get my PoC to work pretty quickly.

I am happy for this PR to move forward at this point. We probably need some changes in order to be more cache friendly, but I think that can come later.

romanova-ekaterina (Contributor) commented Apr 10, 2025:

Hi Teresa,

> This approach does simplify the lld and LTO side, but I'd like to get thoughts from @MaskRay, as this changes the interfacing with lld. The downside is that it then also requires support for a larger new DTLTO library (it looks like the original patch only contains a Local plugin; does each of the other distributed build systems need to replicate a lot of that functionality?).

No, replication of the functionality will not be needed for plugins in general. All common functionality will be in a DTLTO static library (DTLTO.lib), which will be statically linked with lld.

Plugins do not need to be linked with the DTLTO static library. Plugins could be implemented in any programming language that allows creating shared libraries.

romanova-ekaterina (Contributor):

> I just wanted to add my comment here. I have been working with @bd1976bris to try out these changes with FastBuild for our internal build infrastructure and I am happy to report that I have a PoC working based on this and I was recently able to get distribution to work. I found the interface easy to work with and was able to get my PoC to work pretty quickly.

One thing might be worth mentioning about the two different solutions. The first solution has a DTLTO backend, which generates a JSON file and passes it to an external process (e.g., a Python script). The second solution doesn't have a DTLTO backend; it's basically a tiny layer between the linker and a DBS-specific plugin, which calls the plugin's exported functions.

However, we could have a hybrid (mix-and-match) solution: e.g., a "no DTLTO backend" design that invokes an external process.

bd1976bris (Collaborator, Author):

> I just wanted to add my comment here. I have been working with @bd1976bris to try out these changes with FastBuild for our internal build infrastructure and I am happy to report that I have a PoC working based on this and I was recently able to get distribution to work. I found the interface easy to work with and was able to get my PoC to work pretty quickly.
>
> I am happy for this PR to move forward at this point. We probably need some changes in order to be more cache friendly, but I think that can come later.

While working with Tobias, I back-ported this PR to LLVM 19 and wired up the Clang and LLD side changes.

In case it helps anyone else, that can be found here: https://github.com/bd1976bris/llvm-project/commits/dtlto_llvm19/.

cachemeifyoucan (Collaborator):

I don't have a strong opinion on either implementation. If I have to pick one, I like this PR better. I feel like the alternative would require keeping more implementation details of the build systems in the LLVM tree. That is not a problem in itself, but I don't know how stable those APIs and implementation details are, so it might be a problem for integration.

On the other hand, the JSON interface in this PR is very simple and stable (maybe we should add a version field just to be safe). I don't think the LTO interface is bloated after the change, and it is not stable anyway. Thus I slightly prefer this patch.

romanova-ekaterina (Contributor):

> I don't have a strong opinion on either implementation. If I have to pick one, I like this PR better. I feel like the alternative would require keeping more implementation details of the build systems in the LLVM tree. That is not a problem in itself, but I don't know how stable those APIs and implementation details are, so it might be a problem for integration.
>
> On the other hand, the JSON interface in this PR is very simple and stable (maybe we should add a version field just to be safe). I don't think the LTO interface is bloated after the change, and it is not stable anyway. Thus I slightly prefer this patch.

Actually, neither of the solutions keeps implementation details of the distribution build systems in the LLVM tree.

The plugins or external processes will either be in a separate GitHub repository or will be placed in a repository for particular distribution build systems (e.g. the DTLTO-related plugin for distcc will be added to the distcc repository). The location will be discussed with upstream and decided.

Also, it's worth noting that both interfaces could be supported in DTLTO at the same time: the external-process interface for those who prefer simplicity, and the plugin interface for those who need performance. To understand the performance advantage that the plugin interface could provide, please refer to items (5) and (6) in the comment that kbelochapka provided:
#127749 (comment)

Or we could have a hybrid (mix-and-match) solution: e.g., a "no DTLTO backend" design (from the alternative solution) that invokes an external process (from the main solution), or vice versa.

The main decision we are looking for reviewers' input on is whether they prefer having the DTLTO code in a ThinLTO backend (main solution) or not in a backend (alternative solution). To understand the main differences, please refer to items (1), (2), and (3) in the comment that kbelochapka provided:
#127749 (comment)

bd1976bris (Collaborator, Author):

I would like to summarize the state of this PR as I understand it. Firstly, some reviewers have indicated that this PR is acceptable in its current state. However, there is an alternative proposal: https://github.com/romanova-ekaterina/llvm-project/pull/new/kromanova/main/integrated-DTLTO-no-backend. We somehow need to come to a conclusion, and I think an appeal to authority (fallacy or not) is the way :)

@MaskRay – can I please ask you to reply to @teresajohnson's comment: #127749 (comment). @tru has created a distributor that works with his organization's fork of FASTBuild, using the JSON interface. If you prefer the no-backend approach, then the JSON interface from this PR could be merged into that approach.

Thanks.

bd1976bris (Collaborator, Author):

ping.

9 participants