Skip to content

[clang][modules][deps] Optimize in-process timestamping of PCMs #137363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jansvoboda11
Copy link
Contributor

In the past, timestamps used for -fmodules-validate-once-per-build-session were found to be a source of contention in the dependency scanner (D149802, #112452). This PR is yet another attempt to optimize these. We now make use of the new ModuleCache interface to implement the in-process version in terms of atomic std::time_t variables rather the mtime attribute on .timestamp files.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Apr 25, 2025
@jansvoboda11
Copy link
Contributor Author

CC @artemcm

@llvmbot
Copy link
Member

llvmbot commented Apr 25, 2025

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Jan Svoboda (jansvoboda11)

Changes

In the past, timestamps used for -fmodules-validate-once-per-build-session were found to be a source of contention in the dependency scanner (D149802, #112452). This PR is yet another attempt to optimize these. We now make use of the new ModuleCache interface to implement the in-process version in terms of atomic std::time_t variables rather the mtime attribute on .timestamp files.


Full diff: https://github.com/llvm/llvm-project/pull/137363.diff

12 Files Affected:

  • (modified) clang/include/clang/Serialization/ModuleCache.h (+15-1)
  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h (+11-4)
  • (modified) clang/include/clang/Tooling/DependencyScanning/InProcessModuleCache.h (+8-3)
  • (modified) clang/lib/Serialization/ASTCommon.cpp (-12)
  • (modified) clang/lib/Serialization/ASTCommon.h (-2)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+2-1)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+1-1)
  • (modified) clang/lib/Serialization/ModuleCache.cpp (+23)
  • (modified) clang/lib/Serialization/ModuleManager.cpp (+3-9)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningService.cpp (+4-2)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp (+5-1)
  • (modified) clang/lib/Tooling/DependencyScanning/InProcessModuleCache.cpp (+39-11)
diff --git a/clang/include/clang/Serialization/ModuleCache.h b/clang/include/clang/Serialization/ModuleCache.h
index a7ba26bc4daae..3117d954a09cc 100644
--- a/clang/include/clang/Serialization/ModuleCache.h
+++ b/clang/include/clang/Serialization/ModuleCache.h
@@ -12,6 +12,8 @@
 #include "clang/Basic/LLVM.h"
 #include "llvm/ADT/IntrusiveRefCntPtr.h"
 
+#include <ctime>
+
 namespace llvm {
 class AdvisoryLock;
 } // namespace llvm
@@ -31,11 +33,23 @@ class ModuleCache : public RefCountedBase<ModuleCache> {
   virtual std::unique_ptr<llvm::AdvisoryLock>
   getLock(StringRef ModuleFilename) = 0;
 
+  // TODO: Abstract away timestamps with isUpToDate() and markUpToDate().
+  // TODO: Consider exposing a "validation lock" API to prevent multiple clients
+  // concurrently noticing an out-of-date module file and validating its inputs.
+
+  /// Returns the timestamp denoting the last time inputs of the module file
+  /// were validated.
+  virtual std::time_t getModuleTimestamp(StringRef ModuleFilename) = 0;
+
+  /// Updates the timestamp denoting the last time inputs of the module file
+  /// were validated.
+  virtual void updateModuleTimestamp(StringRef ModuleFilename) = 0;
+
   /// Returns this process's view of the module cache.
   virtual InMemoryModuleCache &getInMemoryModuleCache() = 0;
   virtual const InMemoryModuleCache &getInMemoryModuleCache() const = 0;
 
-  // TODO: Virtualize writing/reading PCM files, timestamping, pruning, etc.
+  // TODO: Virtualize writing/reading PCM files, pruning, etc.
 
   virtual ~ModuleCache() = default;
 };
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
index 5e8b37e791383..4e97c7bc9f36e 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
@@ -12,6 +12,7 @@
 #include "clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h"
 #include "clang/Tooling/DependencyScanning/InProcessModuleCache.h"
 #include "llvm/ADT/BitmaskEnum.h"
+#include "llvm/Support/Chrono.h"
 
 namespace clang {
 namespace tooling {
@@ -84,7 +85,9 @@ class DependencyScanningService {
   DependencyScanningService(
       ScanningMode Mode, ScanningOutputFormat Format,
       ScanningOptimizations OptimizeArgs = ScanningOptimizations::Default,
-      bool EagerLoadModules = false, bool TraceVFS = false);
+      bool EagerLoadModules = false, bool TraceVFS = false,
+      std::time_t BuildSessionTimestamp =
+          llvm::sys::toTimeT(std::chrono::system_clock::now()));
 
   ScanningMode getMode() const { return Mode; }
 
@@ -100,7 +103,9 @@ class DependencyScanningService {
     return SharedCache;
   }
 
-  ModuleCacheMutexes &getModuleCacheMutexes() { return ModCacheMutexes; }
+  ModuleCacheEntries &getModuleCacheEntries() { return ModCacheEntries; }
+
+  std::time_t getBuildSessionTimestamp() const { return BuildSessionTimestamp; }
 
 private:
   const ScanningMode Mode;
@@ -113,8 +118,10 @@ class DependencyScanningService {
   const bool TraceVFS;
   /// The global file system cache.
   DependencyScanningFilesystemSharedCache SharedCache;
-  /// The global module cache mutexes.
-  ModuleCacheMutexes ModCacheMutexes;
+  /// The global module cache entries.
+  ModuleCacheEntries ModCacheEntries;
+  /// The build session timestamp.
+  std::time_t BuildSessionTimestamp;
 };
 
 } // end namespace dependencies
diff --git a/clang/include/clang/Tooling/DependencyScanning/InProcessModuleCache.h b/clang/include/clang/Tooling/DependencyScanning/InProcessModuleCache.h
index ba0454380b665..213e60b39c199 100644
--- a/clang/include/clang/Tooling/DependencyScanning/InProcessModuleCache.h
+++ b/clang/include/clang/Tooling/DependencyScanning/InProcessModuleCache.h
@@ -18,13 +18,18 @@
 namespace clang {
 namespace tooling {
 namespace dependencies {
-struct ModuleCacheMutexes {
+struct ModuleCacheEntry {
+  std::shared_mutex CompilationMutex;
+  std::atomic<std::time_t> Timestamp = 0;
+};
+
+struct ModuleCacheEntries {
   std::mutex Mutex;
-  llvm::StringMap<std::unique_ptr<std::shared_mutex>> Map;
+  llvm::StringMap<std::unique_ptr<ModuleCacheEntry>> Map;
 };
 
 IntrusiveRefCntPtr<ModuleCache>
-makeInProcessModuleCache(ModuleCacheMutexes &Mutexes);
+makeInProcessModuleCache(ModuleCacheEntries &Entries);
 } // namespace dependencies
 } // namespace tooling
 } // namespace clang
diff --git a/clang/lib/Serialization/ASTCommon.cpp b/clang/lib/Serialization/ASTCommon.cpp
index 320ee0e65dbea..ad277f19711ff 100644
--- a/clang/lib/Serialization/ASTCommon.cpp
+++ b/clang/lib/Serialization/ASTCommon.cpp
@@ -510,15 +510,3 @@ bool serialization::needsAnonymousDeclarationNumber(const NamedDecl *D) {
     return false;
   return isa<TagDecl, FieldDecl>(D);
 }
-
-void serialization::updateModuleTimestamp(StringRef ModuleFilename) {
-  // Overwrite the timestamp file contents so that file's mtime changes.
-  std::error_code EC;
-  llvm::raw_fd_ostream OS(ModuleFile::getTimestampFilename(ModuleFilename), EC,
-                          llvm::sys::fs::OF_TextWithCRLF);
-  if (EC)
-    return;
-  OS << "Timestamp file\n";
-  OS.close();
-  OS.clear_error(); // Avoid triggering a fatal error.
-}
diff --git a/clang/lib/Serialization/ASTCommon.h b/clang/lib/Serialization/ASTCommon.h
index 7c9ec884ea049..2bc8cc26707bf 100644
--- a/clang/lib/Serialization/ASTCommon.h
+++ b/clang/lib/Serialization/ASTCommon.h
@@ -100,8 +100,6 @@ inline bool isPartOfPerModuleInitializer(const Decl *D) {
   return false;
 }
 
-void updateModuleTimestamp(StringRef ModuleFilename);
-
 } // namespace serialization
 
 } // namespace clang
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index f13a173ec933e..d3711160412d0 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4934,7 +4934,8 @@ ASTReader::ASTReadResult ASTReader::ReadAST(StringRef FileName, ModuleKind Type,
       ImportedModule &M = Loaded[I];
       if (M.Mod->Kind == MK_ImplicitModule &&
           M.Mod->InputFilesValidationTimestamp < HSOpts.BuildSessionTimestamp)
-        updateModuleTimestamp(M.Mod->FileName);
+        getModuleManager().getModuleCache().updateModuleTimestamp(
+            M.Mod->FileName);
     }
   }
 
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index bea8fd5055358..5707e2b685868 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -5394,7 +5394,7 @@ ASTWriter::WriteAST(llvm::PointerUnion<Sema *, Preprocessor *> Subject,
   if (WritingModule && PPRef.getHeaderSearchInfo()
                            .getHeaderSearchOpts()
                            .ModulesValidateOncePerBuildSession)
-    updateModuleTimestamp(OutputFile);
+    ModCache.updateModuleTimestamp(OutputFile);
 
   if (ShouldCacheASTInMemory) {
     // Construct MemoryBuffer and update buffer manager.
diff --git a/clang/lib/Serialization/ModuleCache.cpp b/clang/lib/Serialization/ModuleCache.cpp
index 955e5f322bcc3..3ffd16fd3f7d0 100644
--- a/clang/lib/Serialization/ModuleCache.cpp
+++ b/clang/lib/Serialization/ModuleCache.cpp
@@ -9,6 +9,7 @@
 #include "clang/Serialization/ModuleCache.h"
 
 #include "clang/Serialization/InMemoryModuleCache.h"
+#include "clang/Serialization/ModuleFile.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/LockFileManager.h"
 #include "llvm/Support/Path.h"
@@ -32,6 +33,28 @@ class CrossProcessModuleCache : public ModuleCache {
     return std::make_unique<llvm::LockFileManager>(ModuleFilename);
   }
 
+  std::time_t getModuleTimestamp(StringRef ModuleFilename) override {
+    std::string TimestampFilename =
+        serialization::ModuleFile::getTimestampFilename(ModuleFilename);
+    llvm::sys::fs::file_status Status;
+    if (llvm::sys::fs::status(ModuleFilename, Status) != std::error_code{})
+      return {};
+    return llvm::sys::toTimeT(Status.getLastModificationTime());
+  }
+
+  void updateModuleTimestamp(StringRef ModuleFilename) override {
+    // Overwrite the timestamp file contents so that file's mtime changes.
+    std::error_code EC;
+    llvm::raw_fd_ostream OS(
+        serialization::ModuleFile::getTimestampFilename(ModuleFilename), EC,
+        llvm::sys::fs::OF_TextWithCRLF);
+    if (EC)
+      return;
+    OS << "Timestamp file\n";
+    OS.close();
+    OS.clear_error(); // Avoid triggering a fatal error.
+  }
+
   InMemoryModuleCache &getInMemoryModuleCache() override { return InMemory; }
   const InMemoryModuleCache &getInMemoryModuleCache() const override {
     return InMemory;
diff --git a/clang/lib/Serialization/ModuleManager.cpp b/clang/lib/Serialization/ModuleManager.cpp
index d466ea06301a6..e3d7ff4fd82a7 100644
--- a/clang/lib/Serialization/ModuleManager.cpp
+++ b/clang/lib/Serialization/ModuleManager.cpp
@@ -174,15 +174,9 @@ ModuleManager::addModule(StringRef FileName, ModuleKind Type,
   NewModule->ImportLoc = ImportLoc;
   NewModule->InputFilesValidationTimestamp = 0;
 
-  if (NewModule->Kind == MK_ImplicitModule) {
-    std::string TimestampFilename =
-        ModuleFile::getTimestampFilename(NewModule->FileName);
-    llvm::vfs::Status Status;
-    // A cached stat value would be fine as well.
-    if (!FileMgr.getNoncachedStatValue(TimestampFilename, Status))
-      NewModule->InputFilesValidationTimestamp =
-          llvm::sys::toTimeT(Status.getLastModificationTime());
-  }
+  if (NewModule->Kind == MK_ImplicitModule)
+    NewModule->InputFilesValidationTimestamp =
+        ModCache->getModuleTimestamp(NewModule->FileName);
 
   // Load the contents of the module
   if (std::unique_ptr<llvm::MemoryBuffer> Buffer = lookupBuffer(FileName)) {
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningService.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningService.cpp
index 96fe40c079c65..7f40c99f07287 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningService.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningService.cpp
@@ -14,6 +14,8 @@ using namespace dependencies;
 
 DependencyScanningService::DependencyScanningService(
     ScanningMode Mode, ScanningOutputFormat Format,
-    ScanningOptimizations OptimizeArgs, bool EagerLoadModules, bool TraceVFS)
+    ScanningOptimizations OptimizeArgs, bool EagerLoadModules, bool TraceVFS,
+    std::time_t BuildSessionTimestamp)
     : Mode(Mode), Format(Format), OptimizeArgs(OptimizeArgs),
-      EagerLoadModules(EagerLoadModules), TraceVFS(TraceVFS) {}
+      EagerLoadModules(EagerLoadModules), TraceVFS(TraceVFS),
+      BuildSessionTimestamp(BuildSessionTimestamp) {}
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
index 8e05a678fcdbc..95bd17a74be25 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
@@ -411,7 +411,7 @@ class DependencyScanningAction : public tooling::ToolAction {
     Scanned = true;
 
     // Create a compiler instance to handle the actual work.
-    auto ModCache = makeInProcessModuleCache(Service.getModuleCacheMutexes());
+    auto ModCache = makeInProcessModuleCache(Service.getModuleCacheEntries());
     ScanInstanceStorage.emplace(std::move(PCHContainerOps), ModCache.get());
     CompilerInstance &ScanInstance = *ScanInstanceStorage;
     ScanInstance.setInvocation(std::move(Invocation));
@@ -428,6 +428,10 @@ class DependencyScanningAction : public tooling::ToolAction {
     ScanInstance.getPreprocessorOpts().AllowPCHWithDifferentModulesCachePath =
         true;
 
+    if (ScanInstance.getHeaderSearchOpts().ModulesValidateOncePerBuildSession)
+      ScanInstance.getHeaderSearchOpts().BuildSessionTimestamp =
+          Service.getBuildSessionTimestamp();
+
     ScanInstance.getFrontendOpts().GenerateGlobalModuleIndex = false;
     ScanInstance.getFrontendOpts().UseGlobalModuleIndex = false;
     // This will prevent us compiling individual modules asynchronously since
diff --git a/clang/lib/Tooling/DependencyScanning/InProcessModuleCache.cpp b/clang/lib/Tooling/DependencyScanning/InProcessModuleCache.cpp
index 71ce4d098932b..f65d0ec6aad68 100644
--- a/clang/lib/Tooling/DependencyScanning/InProcessModuleCache.cpp
+++ b/clang/lib/Tooling/DependencyScanning/InProcessModuleCache.cpp
@@ -10,6 +10,7 @@
 
 #include "clang/Serialization/InMemoryModuleCache.h"
 #include "llvm/Support/AdvisoryLock.h"
+#include "llvm/Support/Chrono.h"
 
 #include <mutex>
 
@@ -50,7 +51,7 @@ class ReaderWriterLock : public llvm::AdvisoryLock {
 };
 
 class InProcessModuleCache : public ModuleCache {
-  ModuleCacheMutexes &Mutexes;
+  ModuleCacheEntries &Entries;
 
   // TODO: If we changed the InMemoryModuleCache API and relied on strict
   // context hash, we could probably create more efficient thread-safe
@@ -59,19 +60,46 @@ class InProcessModuleCache : public ModuleCache {
   InMemoryModuleCache InMemory;
 
 public:
-  InProcessModuleCache(ModuleCacheMutexes &Mutexes) : Mutexes(Mutexes) {}
+  InProcessModuleCache(ModuleCacheEntries &Entries) : Entries(Entries) {}
 
   void prepareForGetLock(StringRef Filename) override {}
 
   std::unique_ptr<llvm::AdvisoryLock> getLock(StringRef Filename) override {
-    auto &Mtx = [&]() -> std::shared_mutex & {
-      std::lock_guard<std::mutex> Lock(Mutexes.Mutex);
-      auto &Mutex = Mutexes.Map[Filename];
-      if (!Mutex)
-        Mutex = std::make_unique<std::shared_mutex>();
-      return *Mutex;
+    auto &CompilationMutex = [&]() -> std::shared_mutex & {
+      std::lock_guard Lock(Entries.Mutex);
+      auto &Entry = Entries.Map[Filename];
+      if (!Entry)
+        Entry = std::make_unique<ModuleCacheEntry>();
+      return Entry->CompilationMutex;
     }();
-    return std::make_unique<ReaderWriterLock>(Mtx);
+    return std::make_unique<ReaderWriterLock>(CompilationMutex);
+  }
+
+  std::time_t getModuleTimestamp(StringRef Filename) override {
+    auto &Timestamp = [&]() -> std::atomic<std::time_t> & {
+      std::lock_guard Lock(Entries.Mutex);
+      auto &Entry = Entries.Map[Filename];
+      if (!Entry)
+        Entry = std::make_unique<ModuleCacheEntry>();
+      return Entry->Timestamp;
+    }();
+
+    return Timestamp.load();
+  }
+
+  void updateModuleTimestamp(StringRef Filename) override {
+    // Note: This essentially replaces FS contention with mutex contention.
+    auto &Timestamp = [&]() -> std::atomic<std::time_t> & {
+      std::lock_guard Lock(Entries.Mutex);
+      auto &Entry = Entries.Map[Filename];
+      if (!Entry)
+        Entry = std::make_unique<ModuleCacheEntry>();
+      return Entry->Timestamp;
+    }();
+
+    std::time_t Expected = 0;
+    std::time_t Now = llvm::sys::toTimeT(std::chrono::system_clock::now());
+    Timestamp.compare_exchange_weak(Expected, Now);
   }
 
   InMemoryModuleCache &getInMemoryModuleCache() override { return InMemory; }
@@ -82,6 +110,6 @@ class InProcessModuleCache : public ModuleCache {
 } // namespace
 
 IntrusiveRefCntPtr<ModuleCache>
-dependencies::makeInProcessModuleCache(ModuleCacheMutexes &Mutexes) {
-  return llvm::makeIntrusiveRefCnt<InProcessModuleCache>(Mutexes);
+dependencies::makeInProcessModuleCache(ModuleCacheEntries &Entries) {
+  return llvm::makeIntrusiveRefCnt<InProcessModuleCache>(Entries);
 }

@jansvoboda11 jansvoboda11 requested a review from Bigcheese April 29, 2025 16:11
Copy link
Contributor

@Bigcheese Bigcheese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants