Commit a8d2d16

Parallelize module loading in POSIX dyld code (#130912)
This patch improves LLDB launch time on Linux machines for **preload scenarios**, particularly for executables with a lot of shared library dependencies (or modules). Specifically:

* Launching a binary with `target.preload-symbols = true`
* Attaching to a process with `target.preload-symbols = true`

It's completely controlled by a new flag added in the first commit, `plugin.dynamic-loader.posix-dyld.parallel-module-load`, which *defaults to false*. This was inspired by similar work on Darwin #110646.

Some rough numbers to showcase perf improvement, run on a very beefy machine:

* Executable with ~5600 modules: baseline 45s, improvement 15s
* Executable with ~3800 modules: baseline 25s, improvement 10s
* Executable with ~6650 modules: baseline 67s, improvement 20s
* Executable with ~12500 modules: baseline 185s, improvement 85s
* Executable with ~14700 modules: baseline 235s, improvement 120s

A lot of targets we deal with have a *ton* of modules, and unfortunately we're unable to convince other folks to reduce the number of modules, so performance improvements like this can be very impactful for user experience.

This patch achieves the performance improvement by parallelizing `DynamicLoaderPOSIXDYLD::RefreshModules` for the launch scenario, and `DynamicLoaderPOSIXDYLD::LoadAllCurrentModules` for the attach scenario. The commits have some context on their specific changes as well -- hopefully this helps the review.

# More context on implementation

We discovered the bottlenecks via `perf record -g -p <lldb's pid>` on a Linux machine. With an executable known to have 1000s of shared library dependencies, I ran

```
(lldb) b main
(lldb) r
# taking a while
```

and captured the resulting perf trace (snippet shown)

```
Samples: 85K of event 'cycles:P', Event count (approx.): 54615855812
  Children      Self  Command       Shared Object  Symbol
-   93.54%     0.00%  intern-state  libc.so.6      [.] clone3
     clone3
     start_thread
     lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*)
     std::_Function_handler<void* (), lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data const&)
     lldb_private::Process::RunPrivateStateThread(bool)
   - lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&)
      - 93.54% lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*)
         - 93.54% lldb_private::ThreadList::ShouldStop(lldb_private::Event*)
            - lldb_private::Thread::ShouldStop(lldb_private::Event*)
               - 93.53% lldb_private::StopInfoBreakpoint::ShouldStopSynchronous(lldb_private::Event*)
                  - 93.52% lldb_private::BreakpointSite::ShouldStop(lldb_private::StoppointCallbackContext*)
                       lldb_private::BreakpointLocationCollection::ShouldStop(lldb_private::StoppointCallbackContext*)
                       lldb_private::BreakpointLocation::ShouldStop(lldb_private::StoppointCallbackContext*)
                       lldb_private::BreakpointOptions::InvokeCallback(lldb_private::StoppointCallbackContext*, unsigned long, unsigned long)
                       DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit(void*, lldb_private::StoppointCallbackContext*, unsigned long, unsigned lo
                     - DynamicLoaderPOSIXDYLD::RefreshModules()
                        - 93.42% DynamicLoaderPOSIXDYLD::RefreshModules()::$_0::operator()(DYLDRendezvous::SOEntry const&) const
                           - 93.40% DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, bool
                              - lldb_private::DynamicLoader::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, boo
                                 - 83.90% lldb_private::DynamicLoader::FindModuleViaTarget(lldb_private::FileSpec const&)
                                    - 83.01% lldb_private::Target::GetOrCreateModule(lldb_private::ModuleSpec const&, bool, lldb_private::Status*
                                       - 77.89% lldb_private::Module::PreloadSymbols()
                                          - 44.06% lldb_private::Symtab::PreloadSymbols()
                                             - 43.66% lldb_private::Symtab::InitNameIndexes()
                                             ...
```

We saw that the majority of the time was spent in `RefreshModules`, with the main culprit within it being `LoadModuleAtAddress`, which eventually calls `PreloadSymbols`.

At first, `DynamicLoaderPOSIXDYLD::LoadModuleAtAddress` appears fairly independent -- most of it deals with different files and then getting or creating Modules from these files. The portions that aren't independent seem to deal with ModuleLists, which appear concurrency safe. There were members of `DynamicLoaderPOSIXDYLD` I had to synchronize though: namely `m_loaded_modules`, which `DynamicLoaderPOSIXDYLD` maintains to map its loaded modules to their link addresses. Without synchronizing this, I ran into segfaults and other issues when running `check-lldb`. I also locked the assignment and comparison of `m_interpreter_module`, which may be unnecessary.

# Alternate implementations

When creating this patch, another implementation I considered was directly background-ing the call to `Module::PreloadSymbols` in `Target::GetOrCreateModule`. It would have the added benefit of working across platforms generically, and initially appeared to be concurrency safe. It was done via `Debugger::GetThreadPool().async` directly. However, there were a ton of concurrency issues, so I abandoned that approach for now.

# Testing

With the feature active, I tested via `ninja check-lldb` on both Debug and Release builds several times (~5 or 6 altogether?), and didn't spot additional failing or flaky tests.

I also tested manually on several different binaries, some with around 14000 modules, but just basic operations: launching, reaching main, setting breakpoint, stepping, showing some backtraces.

I've also tested with the flag off just to make sure things behave properly synchronously.
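The fan-out-and-wait shape described above (one shared per-entry callable, run either concurrently or in a plain loop depending on a flag) can be sketched with only the C++ standard library. This is a rough illustration, not the patch itself: `Entry`, `LoadedList` and `LoadAll` are hypothetical names, and `std::async` stands in for `llvm::ThreadPoolTaskGroup`. Unlike a thread pool, `std::async` with `std::launch::async` starts a new thread per task, so the sketch shows the control flow only, not the scheduling behavior.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for DYLDRendezvous::SOEntry.
struct Entry {
  std::string path;
  unsigned long base_addr;
};

// Hypothetical stand-in for a thread-safe module list: every access takes
// the internal mutex, so concurrent Append calls are safe.
class LoadedList {
public:
  void Append(const std::string &name) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_names.push_back(name);
  }
  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_names.size();
  }

private:
  mutable std::mutex m_mutex;
  std::vector<std::string> m_names;
};

// Runs the same per-entry callable concurrently when `parallel` is true and
// in a plain loop otherwise. Both paths share one lambda, so the behavior
// per entry is identical and only the scheduling differs.
inline void LoadAll(const std::vector<Entry> &entries, LoadedList &loaded,
                    bool parallel) {
  auto load_fn = [&loaded](const Entry &entry) {
    if (entry.path.empty())
      return; // Simulates a module that failed to load.
    loaded.Append(entry.path); // Order of appends is not guaranteed.
  };

  if (parallel) {
    std::vector<std::future<void>> tasks;
    tasks.reserve(entries.size());
    for (const Entry &entry : entries)
      tasks.push_back(std::async(std::launch::async,
                                 [&load_fn, &entry] { load_fn(entry); }));
    // Block until every task has finished before returning, playing the
    // role that task_group.wait() plays in the patch.
    for (auto &task : tasks)
      task.get();
  } else {
    for (const Entry &entry : entries)
      load_fn(entry);
  }
}
```

Capturing `entry` by reference is only safe here because `LoadAll` waits for every task before `entries` can go out of scope, which is the same reasoning the patch gives for taking `SOEntry` by reference.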
1 parent 6afe5e5 commit a8d2d16

5 files changed

+133
-42
lines changed


lldb/include/lldb/Target/Target.h

+2
@@ -118,6 +118,8 @@ class TargetProperties : public Properties {
 
   llvm::StringRef GetLaunchWorkingDirectory() const;
 
+  bool GetParallelModuleLoad() const;
+
   const char *GetDisassemblyFlavor() const;
 
   const char *GetDisassemblyCPU() const;

lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp

+109-38
@@ -10,6 +10,7 @@
 #include "DynamicLoaderPOSIXDYLD.h"
 
 #include "lldb/Breakpoint/BreakpointLocation.h"
+#include "lldb/Core/Debugger.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/ModuleSpec.h"
 #include "lldb/Core/PluginManager.h"
@@ -25,6 +26,7 @@
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/ProcessInfo.h"
+#include "llvm/Support/ThreadPool.h"
 
 #include <memory>
 #include <optional>
@@ -184,16 +186,37 @@ void DynamicLoaderPOSIXDYLD::DidLaunch() {
 
 Status DynamicLoaderPOSIXDYLD::CanLoadImage() { return Status(); }
 
+void DynamicLoaderPOSIXDYLD::SetLoadedModule(const ModuleSP &module_sp,
+                                             addr_t link_map_addr) {
+  llvm::sys::ScopedWriter lock(m_loaded_modules_rw_mutex);
+  m_loaded_modules[module_sp] = link_map_addr;
+}
+
+void DynamicLoaderPOSIXDYLD::UnloadModule(const ModuleSP &module_sp) {
+  llvm::sys::ScopedWriter lock(m_loaded_modules_rw_mutex);
+  m_loaded_modules.erase(module_sp);
+}
+
+std::optional<lldb::addr_t>
+DynamicLoaderPOSIXDYLD::GetLoadedModuleLinkAddr(const ModuleSP &module_sp) {
+  llvm::sys::ScopedReader lock(m_loaded_modules_rw_mutex);
+  auto it = m_loaded_modules.find(module_sp);
+  if (it != m_loaded_modules.end())
+    return it->second;
+  return std::nullopt;
+}
+
 void DynamicLoaderPOSIXDYLD::UpdateLoadedSections(ModuleSP module,
                                                   addr_t link_map_addr,
                                                   addr_t base_addr,
                                                   bool base_addr_is_offset) {
-  m_loaded_modules[module] = link_map_addr;
+  SetLoadedModule(module, link_map_addr);
+
   UpdateLoadedSectionsCommon(module, base_addr, base_addr_is_offset);
 }
 
 void DynamicLoaderPOSIXDYLD::UnloadSections(const ModuleSP module) {
-  m_loaded_modules.erase(module);
+  UnloadModule(module);
 
   UnloadSectionsCommon(module);
 }
@@ -401,7 +424,7 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
   // The rendezvous class doesn't enumerate the main module, so track that
   // ourselves here.
   ModuleSP executable = GetTargetExecutable();
-  m_loaded_modules[executable] = m_rendezvous.GetLinkMapAddress();
+  SetLoadedModule(executable, m_rendezvous.GetLinkMapAddress());
 
   DYLDRendezvous::iterator I;
   DYLDRendezvous::iterator E;
@@ -423,34 +446,70 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
       E = m_rendezvous.end();
       m_initial_modules_added = true;
     }
-    for (; I != E; ++I) {
-      // Don't load a duplicate copy of ld.so if we have already loaded it
-      // earlier in LoadInterpreterModule. If we instead loaded then unloaded it
-      // later, the section information for ld.so would be removed. That
-      // information is required for placing breakpoints on Arm/Thumb systems.
-      if ((m_interpreter_module.lock() != nullptr) &&
-          (I->base_addr == m_interpreter_base))
-        continue;
-
-      ModuleSP module_sp =
-          LoadModuleAtAddress(I->file_spec, I->link_addr, I->base_addr, true);
-      if (!module_sp.get())
-        continue;
-
-      if (module_sp->GetObjectFile()->GetBaseAddress().GetLoadAddress(
-              &m_process->GetTarget()) == m_interpreter_base) {
-        ModuleSP interpreter_sp = m_interpreter_module.lock();
-        if (m_interpreter_module.lock() == nullptr) {
-          m_interpreter_module = module_sp;
-        } else if (module_sp == interpreter_sp) {
-          // Module already loaded.
-          continue;
-        }
-      }
 
-      loaded_modules.AppendIfNeeded(module_sp);
-      new_modules.Append(module_sp);
+    // Synchronize reading and writing of `m_interpreter_module`.
+    std::mutex interpreter_module_mutex;
+    // We should be able to take SOEntry as reference since the data
+    // exists for the duration of this call in `m_rendezvous`.
+    auto load_module_fn =
+        [this, &loaded_modules, &new_modules,
+         &interpreter_module_mutex](const DYLDRendezvous::SOEntry &so_entry) {
+          // Don't load a duplicate copy of ld.so if we have already loaded it
+          // earlier in LoadInterpreterModule. If we instead loaded then
+          // unloaded it later, the section information for ld.so would be
+          // removed. That information is required for placing breakpoints on
+          // Arm/Thumb systems.
+          {
+            // `m_interpreter_module` may be modified by another thread at the
+            // same time, so we guard the access here.
+            std::lock_guard<std::mutex> lock(interpreter_module_mutex);
+            if ((m_interpreter_module.lock() != nullptr) &&
+                (so_entry.base_addr == m_interpreter_base))
+              return;
+          }
+
+          ModuleSP module_sp = LoadModuleAtAddress(
+              so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
+          if (!module_sp.get())
+            return;
+
+          {
+            // `m_interpreter_module` may be modified by another thread at the
+            // same time, so we guard the access here.
+            std::lock_guard<std::mutex> lock(interpreter_module_mutex);
+            // Set the interpreter module, if this is the interpreter.
+            if (module_sp->GetObjectFile()->GetBaseAddress().GetLoadAddress(
+                    &m_process->GetTarget()) == m_interpreter_base) {
+              ModuleSP interpreter_sp = m_interpreter_module.lock();
+              if (m_interpreter_module.lock() == nullptr) {
+                m_interpreter_module = module_sp;
+              } else if (module_sp == interpreter_sp) {
+                // Module already loaded.
+                return;
+              }
+            }
+          }
+
+          // Note: in a multi-threaded environment, these module lists may be
+          // appended to out-of-order. This is fine, since there's no
+          // expectation for `loaded_modules` or `new_modules` to be in any
+          // particular order, and appending to each module list is thread-safe.
+          // Also, `new_modules` is only used for the `ModulesDidLoad` call at
+          // the end of this function.
+          loaded_modules.AppendIfNeeded(module_sp);
+          new_modules.Append(module_sp);
+        };
+
+    if (m_process->GetTarget().GetParallelModuleLoad()) {
+      llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+      for (; I != E; ++I)
+        task_group.async(load_module_fn, *I);
+      task_group.wait();
+    } else {
+      for (; I != E; ++I)
+        load_module_fn(*I);
     }
+
     m_process->GetTarget().ModulesDidLoad(new_modules);
   }
 
@@ -636,27 +695,39 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
   // The rendezvous class doesn't enumerate the main module, so track that
   // ourselves here.
   ModuleSP executable = GetTargetExecutable();
-  m_loaded_modules[executable] = m_rendezvous.GetLinkMapAddress();
+  SetLoadedModule(executable, m_rendezvous.GetLinkMapAddress());
 
   std::vector<FileSpec> module_names;
   for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
     module_names.push_back(I->file_spec);
   m_process->PrefetchModuleSpecs(
       module_names, m_process->GetTarget().GetArchitecture().GetTriple());
 
-  for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
-    ModuleSP module_sp =
-        LoadModuleAtAddress(I->file_spec, I->link_addr, I->base_addr, true);
+  auto load_module_fn = [this, &module_list,
+                         &log](const DYLDRendezvous::SOEntry &so_entry) {
+    ModuleSP module_sp = LoadModuleAtAddress(
+        so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
     if (module_sp.get()) {
       LLDB_LOG(log, "LoadAllCurrentModules loading module: {0}",
-               I->file_spec.GetFilename());
+               so_entry.file_spec.GetFilename());
       module_list.Append(module_sp);
     } else {
       Log *log = GetLog(LLDBLog::DynamicLoader);
       LLDB_LOGF(
           log,
           "DynamicLoaderPOSIXDYLD::%s failed loading module %s at 0x%" PRIx64,
-          __FUNCTION__, I->file_spec.GetPath().c_str(), I->base_addr);
+          __FUNCTION__, so_entry.file_spec.GetPath().c_str(),
+          so_entry.base_addr);
+    }
+  };
+  if (m_process->GetTarget().GetParallelModuleLoad()) {
+    llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+    for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
+      task_group.async(load_module_fn, *I);
+    task_group.wait();
+  } else {
+    for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
+      load_module_fn(*I);
     }
   }
 
@@ -728,15 +799,15 @@ DynamicLoaderPOSIXDYLD::GetThreadLocalData(const lldb::ModuleSP module_sp,
                                            const lldb::ThreadSP thread,
                                            lldb::addr_t tls_file_addr) {
   Log *log = GetLog(LLDBLog::DynamicLoader);
-  auto it = m_loaded_modules.find(module_sp);
-  if (it == m_loaded_modules.end()) {
+  std::optional<addr_t> link_map_addr_opt = GetLoadedModuleLinkAddr(module_sp);
+  if (!link_map_addr_opt.has_value()) {
     LLDB_LOGF(
         log, "GetThreadLocalData error: module(%s) not found in loaded modules",
         module_sp->GetObjectName().AsCString());
     return LLDB_INVALID_ADDRESS;
   }
 
-  addr_t link_map = it->second;
+  addr_t link_map = link_map_addr_opt.value();
   if (link_map == LLDB_INVALID_ADDRESS || link_map == 0) {
     LLDB_LOGF(log,
               "GetThreadLocalData error: invalid link map address=0x%" PRIx64,

lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h

+13-4
@@ -93,10 +93,6 @@ class DynamicLoaderPOSIXDYLD : public lldb_private::DynamicLoader {
   /// Contains the pointer to the interpret module, if loaded.
   std::weak_ptr<lldb_private::Module> m_interpreter_module;
 
-  /// Loaded module list. (link map for each module)
-  std::map<lldb::ModuleWP, lldb::addr_t, std::owner_less<lldb::ModuleWP>>
-      m_loaded_modules;
-
   /// Returns true if the process is for a core file.
   bool IsCoreFile() const;
 
@@ -180,6 +176,19 @@ class DynamicLoaderPOSIXDYLD : public lldb_private::DynamicLoader {
   DynamicLoaderPOSIXDYLD(const DynamicLoaderPOSIXDYLD &) = delete;
   const DynamicLoaderPOSIXDYLD &
   operator=(const DynamicLoaderPOSIXDYLD &) = delete;
+
+  /// Loaded module list. (link map for each module)
+  /// This may be accessed in a multi-threaded context. Use the accessor methods
+  /// to access `m_loaded_modules` safely.
+  std::map<lldb::ModuleWP, lldb::addr_t, std::owner_less<lldb::ModuleWP>>
+      m_loaded_modules;
+  llvm::sys::RWMutex m_loaded_modules_rw_mutex;
+
+  void SetLoadedModule(const lldb::ModuleSP &module_sp,
+                       lldb::addr_t link_map_addr);
+  void UnloadModule(const lldb::ModuleSP &module_sp);
+  std::optional<lldb::addr_t>
+  GetLoadedModuleLinkAddr(const lldb::ModuleSP &module_sp);
 };
 
 #endif // LLDB_SOURCE_PLUGINS_DYNAMICLOADER_POSIX_DYLD_DYNAMICLOADERPOSIXDYLD_H
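
The accessor pattern in this header (a map keyed by weak pointers, reachable only through methods that take a reader or writer lock) can be tried out in isolation with the standard library. The sketch below is an approximation under stated assumptions: `Module` and `LoadedModuleMap` are hypothetical names, and `std::shared_mutex` with `std::unique_lock`/`std::shared_lock` stands in for `llvm::sys::RWMutex` with `ScopedWriter`/`ScopedReader`. It requires C++17.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>

// Hypothetical stand-in for lldb_private::Module.
struct Module {};

// Hypothetical wrapper mirroring the three accessors added by the patch.
class LoadedModuleMap {
public:
  // Writers take the lock exclusively.
  void Set(const std::shared_ptr<Module> &module_sp,
           std::uint64_t link_map_addr) {
    std::unique_lock<std::shared_mutex> lock(m_mutex);
    m_map[module_sp] = link_map_addr;
  }

  void Erase(const std::shared_ptr<Module> &module_sp) {
    std::unique_lock<std::shared_mutex> lock(m_mutex);
    m_map.erase(module_sp);
  }

  // Readers share the lock. The value is returned by copy, so no iterator
  // or reference into the map escapes the locked region.
  std::optional<std::uint64_t>
  Find(const std::shared_ptr<Module> &module_sp) const {
    std::shared_lock<std::shared_mutex> lock(m_mutex);
    auto it = m_map.find(module_sp);
    if (it != m_map.end())
      return it->second;
    return std::nullopt;
  }

private:
  using ModuleWP = std::weak_ptr<Module>;
  mutable std::shared_mutex m_mutex;
  // owner_less orders weak pointers by control block, so a key stays valid
  // for ordering purposes even after the module it refers to is destroyed.
  std::map<ModuleWP, std::uint64_t, std::owner_less<ModuleWP>> m_map;
};
```

Returning `std::optional` by value rather than an iterator is what lets a caller such as `GetThreadLocalData` use the result after the lock has been released.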

lldb/source/Target/Target.cpp

+6
@@ -4488,6 +4488,12 @@ llvm::StringRef TargetProperties::GetLaunchWorkingDirectory() const {
       idx, g_target_properties[idx].default_cstr_value);
 }
 
+bool TargetProperties::GetParallelModuleLoad() const {
+  const uint32_t idx = ePropertyParallelModuleLoad;
+  return GetPropertyAtIndexAs<bool>(
+      idx, g_target_properties[idx].default_uint_value != 0);
+}
+
 const char *TargetProperties::GetDisassemblyFlavor() const {
   const uint32_t idx = ePropertyDisassemblyFlavor;
   const char *return_value;

lldb/source/Target/TargetProperties.td

+3
@@ -217,6 +217,9 @@ let Definition = "target" in {
     "launched. If you change this setting, the new value will only apply to "
     "subsequent launches. Commands that take an explicit working directory "
     "will override this setting.">;
+  def ParallelModuleLoad: Property<"parallel-module-load", "Boolean">,
+    DefaultTrue,
+    Desc<"Enable loading of modules in parallel for the dynamic loader.">;
 }
 
 let Definition = "process_experimental" in {
