Skip to content

Reland: "[Exegesis] Add the ability to dry-run the measurement phase (#121991)" #122775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mshockwave
Copy link
Member

@mshockwave mshockwave commented Jan 13, 2025

This relands f8f8598

Follow up on #122371:
The problem here is a little subtle: when we dry-run the measurement phase, we create a LLJIT instance without actually executing the snippets. The key is, LLJIT has its own TargetMachine which uses triple designated by LLVM_TARGET_ARCH (which is default to host). On a machine that does not support Exegesis, the LLJIT would fail to create its TargetMachine because llvm-exegesis don't even register the host's target!

Putting this test into any of the target-specific folder won't help, because it's about the host. And personally I don't really want to use exegesis-can-execute-<arch> for generic tests like this -- it's too strict as we don't actually need to execute the snippet.

My solution here is creating another test feature which is added only when LLVM_TARGET_ARCH is supported by llvm-exegesis. This feature is something in between <arch>-registered-target and exegesis-can-execute-<arch>.

With the new benchmark phase, dry-run-measurement, llvm-exegesis can run
everything except the actual snippet execution. It is useful when we
want to test some parts of the code between the assemble-measured-code
and measure phase without actually running on native platforms.
@llvmbot
Copy link
Member

llvmbot commented Jan 13, 2025

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-tools-llvm-exegesis

Author: Min-Yih Hsu (mshockwave)

Changes

This relands f8f8598

Follow up on #122371:
The problem here is a little subtle: when we dry-run the measurement phase, we create a LLJIT instance without actually executing the snippets. The key is, LLJIT has its own TargetMachine which uses triple designated by LLVM_TARGET_ARCH (which is default to host). On a machine that does not support Exegesis, the LLJIT would fail to create its TargetMachine because llvm-exegesis don't even register the host's target!

Putting this test into any of the target-specific folder won't help, because it's about the host. And personally I don't really want to use exegesis-can-execute-&lt;arch&gt; for generic tests like this -- it's too strict as we don't actually need to execute the snippet.

My solution here is creating another test feature which is added only when LLVM_TARGET_ARCH is supported by llvm-exegesis. This feature is something in between &lt;arch&gt;-registered-target and exegesis-can-execute-&lt;arch&gt;.


Full diff: https://github.com/llvm/llvm-project/pull/122775.diff

7 Files Affected:

  • (modified) llvm/docs/CommandGuide/llvm-exegesis.rst (+1)
  • (added) llvm/test/tools/llvm-exegesis/dry-run-measurement.test (+11)
  • (modified) llvm/test/tools/llvm-exegesis/lit.local.cfg (+6)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkResult.h (+1)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp (+25-8)
  • (modified) llvm/tools/llvm-exegesis/lib/Target.cpp (+2-2)
  • (modified) llvm/tools/llvm-exegesis/llvm-exegesis.cpp (+6-3)
diff --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
index 8266d891a5e6b1..d357c2ceea4189 100644
--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -301,6 +301,7 @@ OPTIONS
   * ``prepare-and-assemble-snippet``: Same as ``prepare-snippet``, but also dumps an excerpt of the sequence (hex encoded).
   * ``assemble-measured-code``: Same as ``prepare-and-assemble-snippet``. but also creates the full sequence that can be dumped to a file using ``--dump-object-to-disk``.
   * ``measure``: Same as ``assemble-measured-code``, but also runs the measurement.
+  * ``dry-run-measurement``: Same as measure, but does not actually execute the snippet.
 
 .. option:: --x86-lbr-sample-period=<nBranches/sample>
 
diff --git a/llvm/test/tools/llvm-exegesis/dry-run-measurement.test b/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
new file mode 100644
index 00000000000000..02e1ec521cf276
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/dry-run-measurement.test
@@ -0,0 +1,11 @@
+# RUN: llvm-exegesis --mtriple=riscv64 --mcpu=sifive-p470 --mode=latency --opcode-name=ADD --use-dummy-perf-counters --benchmark-phase=dry-run-measurement | FileCheck %s
+# REQUIRES: riscv-registered-target && native-registered-exegesis-target
+
+# This test makes sure that llvm-exegesis doesn't execute "cross-compiled" snippets in the presence of
+# --dry-run-measurement. RISC-V was chosen simply because most of the time we run tests on X86 machines.
+
+# Should not contain misleading results.
+# CHECK: measurements:    []
+
+# Should not contain error messages like "snippet crashed while running: Segmentation fault".
+# CHECK: error:           ''
diff --git a/llvm/test/tools/llvm-exegesis/lit.local.cfg b/llvm/test/tools/llvm-exegesis/lit.local.cfg
index a51a2d73442fa0..343f34c58673ec 100644
--- a/llvm/test/tools/llvm-exegesis/lit.local.cfg
+++ b/llvm/test/tools/llvm-exegesis/lit.local.cfg
@@ -30,6 +30,12 @@ def can_use_perf_counters(mode, extra_options=[]):
         print("could not exec llvm-exegesis")
         return False
 
+# LLJIT builds its own TargetMachine using arch designated by LLVM_TARGET_ARCH, which
+# is default to host. We don't want tests that use LLJIT (but not necessarily
+# execute the snippets) to run on machines that are not even supported by
+# exegesis.
+if config.root.native_target in ["AArch64", "Mips", "PowerPC", "RISCV", "X86"]:
+    config.available_features.add("native-registered-exegesis-target")
 
 for arch in ["aarch64", "mips", "powerpc", "x86_64"]:
     if can_execute_generated_snippets(arch):
diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
index 3c09a8380146e5..5480d856168784 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
@@ -38,6 +38,7 @@ enum class BenchmarkPhaseSelectorE {
   PrepareAndAssembleSnippet,
   AssembleMeasuredCode,
   Measure,
+  DryRunMeasure,
 };
 
 enum class BenchmarkFilter { All, RegOnly, WithMem };
diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
index a7771b99e97b1a..cc46f7feb6cf7f 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
@@ -99,7 +99,7 @@ class InProcessFunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
   static Expected<std::unique_ptr<InProcessFunctionExecutorImpl>>
   create(const LLVMState &State, object::OwningBinary<object::ObjectFile> Obj,
          BenchmarkRunner::ScratchSpace *Scratch,
-         std::optional<int> BenchmarkProcessCPU) {
+         std::optional<int> BenchmarkProcessCPU, bool DryRun) {
     Expected<ExecutableFunction> EF =
         ExecutableFunction::create(State.createTargetMachine(), std::move(Obj));
 
@@ -107,14 +107,17 @@ class InProcessFunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
       return EF.takeError();
 
     return std::unique_ptr<InProcessFunctionExecutorImpl>(
-        new InProcessFunctionExecutorImpl(State, std::move(*EF), Scratch));
+        new InProcessFunctionExecutorImpl(State, std::move(*EF), Scratch,
+                                          DryRun));
   }
 
 private:
   InProcessFunctionExecutorImpl(const LLVMState &State,
                                 ExecutableFunction Function,
-                                BenchmarkRunner::ScratchSpace *Scratch)
-      : State(State), Function(std::move(Function)), Scratch(Scratch) {}
+                                BenchmarkRunner::ScratchSpace *Scratch,
+                                bool DryRun)
+      : State(State), Function(std::move(Function)), Scratch(Scratch),
+        DryRun(DryRun) {}
 
   static void accumulateCounterValues(const SmallVector<int64_t, 4> &NewValues,
                                       SmallVector<int64_t, 4> *Result) {
@@ -143,9 +146,14 @@ class InProcessFunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
       CrashRecoveryContext CRC;
       CrashRecoveryContext::Enable();
       const bool Crashed = !CRC.RunSafely([this, Counter, ScratchPtr]() {
-        Counter->start();
-        this->Function(ScratchPtr);
-        Counter->stop();
+        if (DryRun) {
+          Counter->start();
+          Counter->stop();
+        } else {
+          Counter->start();
+          this->Function(ScratchPtr);
+          Counter->stop();
+        }
       });
       CrashRecoveryContext::Disable();
       PS.reset();
@@ -177,6 +185,7 @@ class InProcessFunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
   const LLVMState &State;
   const ExecutableFunction Function;
   BenchmarkRunner::ScratchSpace *const Scratch;
+  bool DryRun = false;
 };
 
 #ifdef __linux__
@@ -664,6 +673,9 @@ Expected<std::unique_ptr<BenchmarkRunner::FunctionExecutor>>
 BenchmarkRunner::createFunctionExecutor(
     object::OwningBinary<object::ObjectFile> ObjectFile,
     const BenchmarkKey &Key, std::optional<int> BenchmarkProcessCPU) const {
+  bool DryRun =
+      BenchmarkPhaseSelector == BenchmarkPhaseSelectorE::DryRunMeasure;
+
   switch (ExecutionMode) {
   case ExecutionModeE::InProcess: {
     if (BenchmarkProcessCPU.has_value())
@@ -671,7 +683,8 @@ BenchmarkRunner::createFunctionExecutor(
                                  "support benchmark core pinning.");
 
     auto InProcessExecutorOrErr = InProcessFunctionExecutorImpl::create(
-        State, std::move(ObjectFile), Scratch.get(), BenchmarkProcessCPU);
+        State, std::move(ObjectFile), Scratch.get(), BenchmarkProcessCPU,
+        DryRun);
     if (!InProcessExecutorOrErr)
       return InProcessExecutorOrErr.takeError();
 
@@ -679,6 +692,10 @@ BenchmarkRunner::createFunctionExecutor(
   }
   case ExecutionModeE::SubProcess: {
 #ifdef __linux__
+    if (DryRun)
+      return make_error<Failure>("The subprocess execution mode cannot "
+                                 "dry-run measurement at this moment.");
+
     auto SubProcessExecutorOrErr = SubProcessFunctionExecutorImpl::create(
         State, std::move(ObjectFile), Key, BenchmarkProcessCPU);
     if (!SubProcessExecutorOrErr)
diff --git a/llvm/tools/llvm-exegesis/lib/Target.cpp b/llvm/tools/llvm-exegesis/lib/Target.cpp
index 29e58692f0e92b..e2251ff978888b 100644
--- a/llvm/tools/llvm-exegesis/lib/Target.cpp
+++ b/llvm/tools/llvm-exegesis/lib/Target.cpp
@@ -98,7 +98,7 @@ ExegesisTarget::createBenchmarkRunner(
     return nullptr;
   case Benchmark::Latency:
   case Benchmark::InverseThroughput:
-    if (BenchmarkPhaseSelector == BenchmarkPhaseSelectorE::Measure &&
+    if (BenchmarkPhaseSelector >= BenchmarkPhaseSelectorE::Measure &&
         !PfmCounters.CycleCounter) {
       const char *ModeName = Mode == Benchmark::Latency
                                  ? "latency"
@@ -116,7 +116,7 @@ ExegesisTarget::createBenchmarkRunner(
         State, Mode, BenchmarkPhaseSelector, ResultAggMode, ExecutionMode,
         ValidationCounters, BenchmarkRepeatCount);
   case Benchmark::Uops:
-    if (BenchmarkPhaseSelector == BenchmarkPhaseSelectorE::Measure &&
+    if (BenchmarkPhaseSelector >= BenchmarkPhaseSelectorE::Measure &&
         !PfmCounters.UopsCounter && !PfmCounters.IssueCounters)
       return make_error<Failure>(
           "can't run 'uops' mode, sched model does not define uops or issue "
diff --git a/llvm/tools/llvm-exegesis/llvm-exegesis.cpp b/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
index fa37e05956be8c..07bd44ee64f1f2 100644
--- a/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
+++ b/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
@@ -132,7 +132,10 @@ static cl::opt<BenchmarkPhaseSelectorE> BenchmarkPhaseSelector(
         clEnumValN(
             BenchmarkPhaseSelectorE::Measure, "measure",
             "Same as prepare-measured-code, but also runs the measurement "
-            "(default)")),
+            "(default)"),
+        clEnumValN(
+            BenchmarkPhaseSelectorE::DryRunMeasure, "dry-run-measurement",
+            "Same as measure, but does not actually execute the snippet")),
     cl::init(BenchmarkPhaseSelectorE::Measure));
 
 static cl::opt<bool>
@@ -476,7 +479,7 @@ static void runBenchmarkConfigurations(
 }
 
 void benchmarkMain() {
-  if (BenchmarkPhaseSelector == BenchmarkPhaseSelectorE::Measure &&
+  if (BenchmarkPhaseSelector >= BenchmarkPhaseSelectorE::Measure &&
       !UseDummyPerfCounters) {
 #ifndef HAVE_LIBPFM
     ExitWithError(
@@ -501,7 +504,7 @@ void benchmarkMain() {
 
   // Preliminary check to ensure features needed for requested
   // benchmark mode are present on target CPU and/or OS.
-  if (BenchmarkPhaseSelector == BenchmarkPhaseSelectorE::Measure)
+  if (BenchmarkPhaseSelector >= BenchmarkPhaseSelectorE::Measure)
     ExitOnErr(State.getExegesisTarget().checkFeatureSupport());
 
   if (ExecutionMode == BenchmarkRunner::ExecutionModeE::SubProcess &&

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@mshockwave mshockwave merged commit a39aaf3 into llvm:main Jan 13, 2025
10 of 11 checks passed
@mshockwave mshockwave deleted the patch/exegesis/fix-dry-run-measurement branch January 13, 2025 21:43
@nico
Copy link
Contributor

nico commented Jan 14, 2025

This seems to break tests on macOS: http://45.33.8.238/macm1/98920/step_10.txt

Please take a look and revert for now if it takes a while to fix.

@nico
Copy link
Contributor

nico commented Jan 14, 2025

@nico
Copy link
Contributor

nico commented Jan 14, 2025

Also on green dragon, which I think are LLVM's official mac bots: https://green.lab.llvm.org/job/llvm.org/view/All/job/clang-stage1-cmake-RA-incremental/lastCompletedBuild/testReport/

@Prabhuk
Copy link
Contributor

Prabhuk commented Jan 14, 2025

We are seeing MAC test failures as well in Fuchsia toolchain builders:

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-host-mac-x64/b8725747943317383073/overview

mshockwave added a commit that referenced this pull request Jan 14, 2025
I'm making this test to run only when exegesis-can-execute-x86_64.
@mshockwave
Copy link
Member Author

We are seeing MAC test failures as well in Fuchsia toolchain builders:

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-host-mac-x64/b8725747943317383073/overview

It's hot fixed by 63d3bd6
I simply moved this test into test/tools/llvm-exegesis/X86 and gated with exegesis-can-execute-x86_64.

@mshockwave
Copy link
Member Author

This seems to break tests on macOS: http://45.33.8.238/macm1/98920/step_10.txt

Please take a look and revert for now if it takes a while to fix.

The root cause of this problem was, again, relatively subtle: The RISC-V snippet object was generated by a MC assembler separated from the LLJIT engine, which was configured using the host triple. Therefore the host -- macosx in these cases -- would expect symbols to be prefixed by underscore, while RISC-V doesn't use that convention in any way at this moment.

@dyung
Copy link
Collaborator

dyung commented Jan 14, 2025

The relanded test seems to be failing on x86-64 MacOS: https://lab.llvm.org/buildbot/#/builders/23/builds/6632

mshockwave added a commit that referenced this pull request Jan 14, 2025
…t phase (#121991)" (#122775)"

This reverts commit a39aaf3 and
63d3bd6.

Due to test failures on MacOSX.
@mshockwave
Copy link
Member Author

The relanded test seems to be failing on x86-64 MacOS: https://lab.llvm.org/buildbot/#/builders/23/builds/6632

Yeah, I just reverted both this PR and the hot fix. I'm gonna dig into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants