-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LTO] Fix a crash with thin LTO caching and asm output #138203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lto Author: Alexey Karyakin (quic-akaryaki) ChangesThe commit function of CacheFile deletes the underlying raw stream. Some output streamers may hold a pointer to it, which in this case will outlive the stream object. In particular, MCAsmStreamer keeps the pointer to the raw stream though formatted_raw_stream, which calls flush() in its destructor. If commit is called before MCAsmStreamer destruction, it will access freed memory. Fixes: #138194. Full diff: https://github.com/llvm/llvm-project/pull/138203.diff 1 Files Affected:
diff --git a/llvm/lib/LTO/LTOBackend.cpp b/llvm/lib/LTO/LTOBackend.cpp
index dd9197efa7718..905e841291cf8 100644
--- a/llvm/lib/LTO/LTOBackend.cpp
+++ b/llvm/lib/LTO/LTOBackend.cpp
@@ -439,27 +439,33 @@ static void codegen(const Config &Conf, TargetMachine *TM,
std::unique_ptr<CachedFileStream> &Stream = *StreamOrErr;
TM->Options.ObjectFilenameForDebug = Stream->ObjectPathName;
- legacy::PassManager CodeGenPasses;
- TargetLibraryInfoImpl TLII(Mod.getTargetTriple());
- CodeGenPasses.add(new TargetLibraryInfoWrapperPass(TLII));
- // No need to make index available if the module is empty.
- // In theory these passes should not use the index for an empty
- // module, however, this guards against doing any unnecessary summary-based
- // analysis in the case of a ThinLTO build where this might be an empty
- // regular LTO combined module, with a large combined index from ThinLTO.
- if (!isEmptyModule(Mod))
- CodeGenPasses.add(
- createImmutableModuleSummaryIndexWrapperPass(&CombinedIndex));
- if (Conf.PreCodeGenPassesHook)
- Conf.PreCodeGenPassesHook(CodeGenPasses);
- if (TM->addPassesToEmitFile(CodeGenPasses, *Stream->OS,
- DwoOut ? &DwoOut->os() : nullptr,
- Conf.CGFileType))
- report_fatal_error("Failed to setup codegen");
- CodeGenPasses.run(Mod);
-
- if (DwoOut)
- DwoOut->keep();
+ // Create the LTO pipeline in its own scope so it gets deleted before
+ // Stream->commit() is called. The commit function of CacheFile deletes
+ // the raw stream, which is too early as streamers (e.g. MCAsmStreamer)
+ // keep the pointer and may use it until their destruction. See #138194.
+ {
+ legacy::PassManager CodeGenPasses;
+ TargetLibraryInfoImpl TLII(Mod.getTargetTriple());
+ CodeGenPasses.add(new TargetLibraryInfoWrapperPass(TLII));
+ // No need to make index available if the module is empty.
+ // In theory these passes should not use the index for an empty
+ // module, however, this guards against doing any unnecessary summary-based
+ // analysis in the case of a ThinLTO build where this might be an empty
+ // regular LTO combined module, with a large combined index from ThinLTO.
+ if (!isEmptyModule(Mod))
+ CodeGenPasses.add(
+ createImmutableModuleSummaryIndexWrapperPass(&CombinedIndex));
+ if (Conf.PreCodeGenPassesHook)
+ Conf.PreCodeGenPassesHook(CodeGenPasses);
+ if (TM->addPassesToEmitFile(CodeGenPasses, *Stream->OS,
+ DwoOut ? &DwoOut->os() : nullptr,
+ Conf.CGFileType))
+ report_fatal_error("Failed to setup codegen");
+ CodeGenPasses.run(Mod);
+
+ if (DwoOut)
+ DwoOut->keep();
+ }
if (Error Err = Stream->commit())
report_fatal_error(std::move(Err));
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to add a test for this? (At least one that would fail in an asan build?)
Is this fall out of #136121? If so, I wonder if that PR should be reverted instead - it turns out that went in without a full review / approval, and if it is causing subtle bugs I would prefer to revert that one. |
I have added a test. |
I don't have a strong opinion, someone has to make a decision. If #136121 is reverted, I would like to keep the test. |
Keeping the test is good. What is the failure mode of the test without your fix? I've asked for a revert of PR136121 as it was supposed to make error handling easier, but here it sounds like there was a much more subtle issue produced by that change. |
Without the "fix", the test crashes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again and at the commit function, I'm not really sure how they are related. Can you clarify (see questions below)?
llvm/lib/LTO/LTOBackend.cpp
Outdated
if (DwoOut) | ||
DwoOut->keep(); | ||
// Create the LTO pipeline in its own scope so it gets deleted before | ||
// Stream->commit() is called. The commit function of CacheFile deletes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/CacheFile/CachedFileStream/
However, can you clarify the sequence of events causing the crash? Looking at PR136121 and CachedFileStream::commit, that function doesn't actually delete anything. I'm also unclear on how deleting the codegen pass manager early prevents use after the call to commit() further below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CachedFileStream::commit calls OS.reset() which deletes the underlying raw stream. A pointer to this stream is kept by the formatted_raw_ostream created by CodeGenTargetMachineImpl::createMCStreamer(). This is then kept by the MCAsmStreamer object. #136121 does change the point at which the deletion of the stream happens since the legacy::PassManager object is destructed before the std::unique_ptr (since the constructor of the former happens later). With #136121 the commit happens before the legacy::PassManager destructor (and hence the MCAsmStreamer destructor, and hence the formatted_raw_stream destructor, and hence the flush() call). This PR causes the legacy::PassManager destructor to be called before the commit operation.
I have checked the other uses of CachedFileStream::commit() for similar lifetime issues, and did not find any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CachedFileStream::commit calls OS.reset() which deletes the underlying raw stream
I don't see this?
Ah - the OS.reset() call is in derived class CacheStream::commit, so I missed this earlier (not in the base CachedFileStream nor "CacheFile" noted in the comments of this PR).
The ownership issue and fix seem fragile to me - how do other users of CacheStream avoid this lifetime issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of ownership issue is unfortunately a common pitfall of C++ - short of rewriting the entire compiler in Rust, there isn't really a general way for object A to protect itself against object B taking a pointer to one of A's sub-objects C that outlasts the lifetime that A expected. However, it's no more fragile than it was before #136121 - if the legacy::PassManager constructor had been moved to before the std::unique_ptr<CachedFileStream> constructor, this problem would have been seen even without #136121 .
As for the other users of CacheStream, I just checked that there was no object constructed after the std::unique_ptr<CachedFileStream> constructor which could outlast the commit() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it's no more fragile than it was before #136121 - if the legacy::PassManager constructor had been moved to before the std::unique_ptr constructor, this problem would have been seen even without #136121 .
This isn't really the case the way the APIs are set up, however, because the MCAsmStreamer constructor takes an existing raw_pwrite_stream passed down via addPassesToEmitFile(). CacheFileStream is the owner of that raw_pwrite_stream, and which is created at the same time as CacheStream in the caching code at
llvm-project/llvm/lib/Support/Caching.cpp
Lines 169 to 170 in f4853d7
return std::make_unique<CacheStream>( | |
std::make_unique<raw_fd_ostream>(Temp->FD, /* ShouldClose */ false), |
With the change to add the destruction in CacheStream::commit, the lifetime of the underlying raw_pwrite_stream has been somewhat artificially shortened, leading to the non-overlapping lifetimes. Ideally MCAsmStreamer would not become the owner of an object that depends on the lifetime of an object it doesn't own, but that would require quite a bit of change to the interfaces of those functions, which have presumably existed that way for a long time.
Could the original issue be addressed by calling flush() on the owned raw_pwrite_stream in CacheStream::commit() instead of destructing it? If you look at the construction of the raw_fd_ostream object in the code I linked above, it's specifying that the file shouldn't even be closed on destruction, so all it is doing is the flush() afaict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But formatted_raw_ostream doesn't define its own flush(). Its destructor calls flush() which is the base raw_ostream::flush().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But formatted_raw_ostream doesn't define its own flush(). Its destructor calls flush() which is the base raw_ostream::flush().
Yes, buffering is implemented in the base class (raw_ostream) but it's used by the formatted_raw_ostream object (outer stream). What happens when the buffer is not empty is raw_ostream calls virtual write_imp() which is (re?)defined in formatted_raw_ostream to write pending data to the other (inner) stream.
Sorry it's so complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see the issue, which you described earlier but I didn't quite follow until poking around more. As you note there are 2 totally separate raw_ostream objects. The first which is owned by CacheFileStream is of derived type raw_fd_stream, and constructed when invoking the CacheStream (and base class CacheFileStream) constructor. The second is formatted_raw_ostream, which is also derived from raw_ostream but a different one, and it holds a pointer to the other raw_ostream object (via data member TheStream). The formatted_raw_ostream::write_impl calls write() on TheStream.
Presumably if CacheStream::commit was changed to invoke flush() instead, because formatted_raw_ostream has not yet been flushed (and therefore TheStream->write() hadn't been called yet on CacheStream's raw_ostream), we don't actually get the file written as expected by commit()?
This is unfortunately gnarly, and I agree that the only quick fixes are the one in this PR or reverting PR136121, or longer term changing some APIs so that e.g. CacheStream owns the formatted_raw_ostream. My preference is to revert PR136121, as improved errors are nice but IMO not worth the introduced complexity. Most of the callers of the new commit() function still call report_fatal_error on the result, so I don't know that it has improved the situation significantly anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to strongly advocate for leaving PR136121 in place and for merging this PR. My reasoning is as follows:
- There is a general principle in C++ that destructors should never do anything except release resources - if they perform operations that can fail then those errors cannot be handled except out-of-band (tearing down the process). PR136121 improves the localCache API in this respect. While doing so has uncovered other instances of destructors doing non-release operations, that in itself is not a reason to abandon this engineering principle.
- While most of the callers of the new commit() function currently still call report_fatal_error on the result (as I wanted to keep PR136121 as unimpactful as possible), they may wish to improve their error-handling in the future and PR136121 is a necessary component of that. The motivating case is an out-of-tree use https://github.com/ROCm/llvm-project/blob/amd-staging/amd/comgr/src/comgr-cache.cpp for which tearing down the process would be highly undesirable - if that was an unavoidable aspect of the localCache API then ROCm would need to have its own fork of the localCache API (that didn't do the commit in the destructor) with the maintenance headaches that would incur. It seems likely to me that other users of localCache will in the future realise that they have similar requirements.
- Reverting PR136121 now would also mean that the other pieces of code that have had to be changed in response to PR136121 would have to be changed back - it would be more churn.
- I stand by what I said about there being latent fragility in the codegen() function in llvm/lib/LTO/LTOBackend.cpp . The MCAsmStreamer object is destroyed (and hence the formatted_raw_ostream object) when CodeGenPasses is destroyed. If the CodeGenPasses constructor were in the future to be moved above the Stream constructor then it would be destroyed after the CachedFileStream (the order in which these destructors are run is the reverse of the order in which the corresponding constructors are run - the position of the addPassesToEmitFile() call is not relevant as the end of the lifetime of the MCAsmStreamer object is tied to the end of the lifetime of the legacy::PassManager object and hence the time when that object was constructed). So reverting PR136121 wouldn't actually solve the latent fragility that is the purported reason for reverting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go ahead and approve this PR to address the crash, because I think the underlying issue lies in the asm printing passes and their setup.
Specifically, createMCStreamer creates the formatted_raw_ostream, which stores a pointer to the provided raw_ostream reference that it doesn't own, and addAsmPrinter which in turn stores that in the pass manager. All clients of addPassesToEmitFile, which takes the raw_ostream and pass manager references and invokes these APIs, currently have to ensure that the lifetime of the provided raw_ostream object outlasts the lifetime of the pass manager.
I stand by what I said about there being latent fragility in the codegen() function in llvm/lib/LTO/LTOBackend.cpp . The MCAsmStreamer object is destroyed (and hence the formatted_raw_ostream object) when CodeGenPasses is destroyed. If the CodeGenPasses constructor were in the future to be moved above the Stream constructor then it would be destroyed after the CachedFileStream
the position of the addPassesToEmitFile() call is not relevant
Yes, I see what you mean. It is relevant to the problem as it takes both the raw_ostream and the pass manager, and it is down in its handling where the latter takes a pointer to the former, but not to the relative ordering of their construction/destruction. As noted earlier, this issue is not specific to the LTOBackend codegen(), the potential for a problem exists in any client of this code gen pass setup.
2d37722
to
1ad72ca
Compare
The forced push is just to edit the comment as requested by @teresajohnson |
The commit function of CacheStream deletes the underlying raw stream. Some output streamers may hold a pointer to it, which in this case will outlive the stream object. In particular, MCAsmStreamer keeps the pointer to the raw stream though formatted_raw_stream, which calls flush() in its destructor. If commit is called before MCAsmStreamer destruction, it will access freed memory. S
1ad72ca
to
c5402c1
Compare
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/18505 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/18307 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/18541 Here is the relevant piece of the build log for the reference
|
This is a follow-up to llvm#138203. The added test used lld but lld is not always available. Switch to using llvm-lto2.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/25867 Here is the relevant piece of the build log for the reference
|
This is a follow-up to #138203. The added test used lld but lld is not always available, which breaks builds. Make the test use llvm-lto2. Also make the test a bit more similar to other tests in the same directory.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/160/builds/17438 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/180/builds/17581 Here is the relevant piece of the build log for the reference
|
The commit function of CacheFile deletes the underlying raw stream. Some output streamers may hold a pointer to it, which in this case will outlive the stream object. In particular, MCAsmStreamer keeps the pointer to the raw stream though formatted_raw_stream, which calls flush() in its destructor. If commit is called before MCAsmStreamer destruction, it will access freed memory.
Fixes: #138194.