Skip to content

[PREVIEW-ONLY] RVV support for llvm-exegesis #114149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mshockwave
Copy link
Member

@mshockwave mshockwave commented Oct 29, 2024

@mikhailramalho wanted to try out our RVV exegesis so I thought it's a good idea to share it publicly and created a PR just in case someone wants to see the difference. This PR is only for preview and will be abandoned, I basically put everything I'd created for this work into this branch.

Of course, I'll send out separate PRs for the actual merges. I don't want to use draft PR because GitHub somehow turns off notifications on those PRs.

In addition to the actual RVV Exegesis support, here are some other changes I'm planning to split out as separate PRs (i.e. a gigantic TODO list for Min):

  • Those scary changes in Analysis.h & Analysis.cpp are simply factoring out common printing logics so that we can print reports into not just HTML but also machine-readable formats like YAML. It came pretty useful when you got tons of inconsistencies and wanted to use scripts to prioritize some items, which is what we have being using
  • I'm creating raw Perf events by myself rather than using libpfm. I thought it's pretty east to do so even with the existing code in llvm-exegesis, so I'll send another PR to add this support
  • The -start-before-phase and -stop-after-phase features, as well as the object file serialization supports
  • Enumerating over a range of instruction opcodes (It's already in SyntaCore's PR)
  • Some quality of life improvements on adding timers and improve the progress meter
  • Cache the search table lookups between VPseudo and MC opcode: RVV exegesis is one of the few cases that calls these lookup functions (e.g. RISCV::getRVVMCOpcode) for each instruction in the snippet. It makes more sense performance-wise to cache the result since the we have tens of thousands identical instructions in a single snippet.
  • Regarding the changes related to AcquireAtCycle -- I'm actually not so sure if they are legit. I think we do have to recognize pipeline bypass cycles though
  • --dry-run-measurement. In our case it's useful to run the measurement phase in userspace QEMU without running the actual measurement (userspace QEMU shares the kernel with host). To test things like benchmark deserializations.
  • Support -mattr. Useful when we want to toggle additional features, like additional RISCV extensions. (It's already in SyntaCore's PR)

CC @boomanaiden154 @legrosbuffle

TODO:
  - Split out changes related to benchmark reports
  - Split out changes related using raw Perf event in replacement of
    libPFM
  - Split out changes related to supporting `AcquireAtCycle` in both
    exegesis and MCSchedule
@llvmbot
Copy link
Member

llvmbot commented Oct 29, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-mc

Author: Min-Yih Hsu (mshockwave)

Changes

@mikhailramalho wanted to try out our RVV exegesis so I thought it's a good idea to share it publicly and created a PR just in case someone wants to see the difference. This PR is only for previous and will be abandoned, I basically put everything I'd created for this work into this branch.

Of course, I'll send out separate PRs for the actual merges. I don't want to use draft PR because GitHub somehow turns off notifications on those PRs.

In addition to the actual RVV Exegesis support, here are some other changes I'm planning to split out as separate PRs:

  • Those scary changes in Analysis.h & Analysis.cpp are simply factoring out common printing logics so that we can print reports into not just HTML but also machine-readable formats like YAML. It came pretty useful when you got tons of inconsistencies and wanted to use scripts to prioritize some items, which is what we have being using
  • I'm creating raw Perf events by myself rather than using libpfm. I thought it's pretty east to do so even with the existing code in llvm-exegesis, so I'll send another PR to add this support
  • The -start-before-phase and -stop-after-phase features, as well as the object file serialization supports
  • Enumerating over a range of instruction opcodes
  • Some quality of life improvements on adding timers and improve the progress meter
  • Cache the search table lookups between VPseudo and MC opcode: RVV exegesis is one of the few cases that calls these lookup functions (e.g. RISCV::getRVVMCOpcode) for each instruction in the snippet. It makes more sense performance-wise to cache the result since the we have tens of thousands identical instructions in a single snippet.
  • Regarding the changes related to AcquireAtCycle -- I'm actually not so sure if they are legit. I think we do have to recognize pipeline bypass cycles though
  • --dry-run-measurement. In our case it's useful to run the measurement phase in userspace QEMU without running the actual measurement (userspace QEMU shares the kernel with host). To test things like benchmark deserializations.
  • Support -mattr. Useful when we want to toggle additional features, like additional RISCV extensions.

CC @boomanaiden154 @legrosbuffle


Patch is 172.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114149.diff

48 Files Affected:

  • (modified) llvm/lib/MC/MCSchedule.cpp (+2-1)
  • (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/RISCV/RISCV.td (+6)
  • (modified) llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp (+12-1)
  • (added) llvm/lib/Target/RISCV/RISCVPfmCounters.td (+18)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/deserialize-obj-file.yaml (+29)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/lit.local.cfg (+4)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/eligible-inst.test (+10)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/explicit-sew.test (+7)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test (+6)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/reduction.test (+7)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/self-aliasing.test (+6)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/skip-rm.test (+12)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew-zvk.test (+30)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew.test (+41)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/vlmax-only.test (+7)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/rvv/vtype-rm-setup.test (+13)
  • (added) llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test (+8)
  • (modified) llvm/test/tools/llvm-exegesis/X86/analysis-noise.test (+1)
  • (modified) llvm/tools/llvm-exegesis/lib/Analysis.cpp (+202-419)
  • (modified) llvm/tools/llvm-exegesis/lib/Analysis.h (+90-20)
  • (added) llvm/tools/llvm-exegesis/lib/AnalysisPrinters.cpp (+514)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkResult.cpp (+99-5)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkResult.h (+20)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp (+59-5)
  • (modified) llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h (+8-2)
  • (modified) llvm/tools/llvm-exegesis/lib/CMakeLists.txt (+5)
  • (modified) llvm/tools/llvm-exegesis/lib/Clustering.cpp (+5)
  • (modified) llvm/tools/llvm-exegesis/lib/Clustering.h (+5)
  • (modified) llvm/tools/llvm-exegesis/lib/LlvmState.cpp (+1-1)
  • (modified) llvm/tools/llvm-exegesis/lib/MCInstrDescView.cpp (+4)
  • (modified) llvm/tools/llvm-exegesis/lib/MCInstrDescView.h (+4)
  • (modified) llvm/tools/llvm-exegesis/lib/PerfHelper.cpp (+15-46)
  • (modified) llvm/tools/llvm-exegesis/lib/ProgressMeter.h (+6-3)
  • (added) llvm/tools/llvm-exegesis/lib/RISCV/CMakeLists.txt (+25)
  • (added) llvm/tools/llvm-exegesis/lib/RISCV/RISCVExegesisPasses.h (+19)
  • (added) llvm/tools/llvm-exegesis/lib/RISCV/RISCVExegesisPostprocessing.cpp (+126)
  • (added) llvm/tools/llvm-exegesis/lib/RISCV/RISCVExegesisPreprocessing.cpp (+82)
  • (added) llvm/tools/llvm-exegesis/lib/RISCV/Target.cpp (+955)
  • (modified) llvm/tools/llvm-exegesis/lib/SchedClassResolution.cpp (+51-14)
  • (modified) llvm/tools/llvm-exegesis/lib/SchedClassResolution.h (+5-3)
  • (modified) llvm/tools/llvm-exegesis/lib/SerialSnippetGenerator.cpp (+2-5)
  • (modified) llvm/tools/llvm-exegesis/lib/SnippetGenerator.cpp (+11-8)
  • (modified) llvm/tools/llvm-exegesis/lib/Target.cpp (+8)
  • (modified) llvm/tools/llvm-exegesis/lib/Target.h (+9)
  • (added) llvm/tools/llvm-exegesis/lib/Timer.cpp (+16)
  • (added) llvm/tools/llvm-exegesis/lib/Timer.h (+21)
  • (modified) llvm/tools/llvm-exegesis/llvm-exegesis.cpp (+296-152)
diff --git a/llvm/lib/MC/MCSchedule.cpp b/llvm/lib/MC/MCSchedule.cpp
index 4f7125864c5a01..f67c43c95935f8 100644
--- a/llvm/lib/MC/MCSchedule.cpp
+++ b/llvm/lib/MC/MCSchedule.cpp
@@ -96,8 +96,9 @@ MCSchedModel::getReciprocalThroughput(const MCSubtargetInfo &STI,
   for (; I != E; ++I) {
     if (!I->ReleaseAtCycle)
       continue;
+    assert(I->ReleaseAtCycle > I->AcquireAtCycle);
     unsigned NumUnits = SM.getProcResource(I->ProcResourceIdx)->NumUnits;
-    double Temp = NumUnits * 1.0 / I->ReleaseAtCycle;
+    double Temp = NumUnits * 1.0 / (I->ReleaseAtCycle - I->AcquireAtCycle);
     Throughput = Throughput ? std::min(*Throughput, Temp) : Temp;
   }
   if (Throughput)
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index fd049d1a57860e..4727e0ca22428a 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -15,6 +15,7 @@ tablegen(LLVM RISCVGenRegisterBank.inc -gen-register-bank)
 tablegen(LLVM RISCVGenRegisterInfo.inc -gen-register-info)
 tablegen(LLVM RISCVGenSearchableTables.inc -gen-searchable-tables)
 tablegen(LLVM RISCVGenSubtargetInfo.inc -gen-subtarget)
+tablegen(LLVM RISCVGenExegesis.inc -gen-exegesis)
 
 set(LLVM_TARGET_DEFINITIONS RISCVGISel.td)
 tablegen(LLVM RISCVGenGlobalISel.inc -gen-global-isel)
diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td
index 00c3d702e12a22..4d8320ff5cbb45 100644
--- a/llvm/lib/Target/RISCV/RISCV.td
+++ b/llvm/lib/Target/RISCV/RISCV.td
@@ -61,6 +61,12 @@ include "RISCVSchedXiangShanNanHu.td"
 
 include "RISCVProcessors.td"
 
+//===----------------------------------------------------------------------===//
+// Pfm Counters
+//===----------------------------------------------------------------------===//
+
+include "RISCVPfmCounters.td"
+
 //===----------------------------------------------------------------------===//
 // Define the RISC-V target.
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp b/llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp
index f72ba2d5c667b8..608652a4efafed 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertWriteVXRM.cpp
@@ -198,8 +198,19 @@ char RISCVInsertWriteVXRM::ID = 0;
 INITIALIZE_PASS(RISCVInsertWriteVXRM, DEBUG_TYPE, RISCV_INSERT_WRITE_VXRM_NAME,
                 false, false)
 
+static unsigned getAndCacheRVVMCOpcode(unsigned VPseudoOpcode) {
+  // VPseudo opcode -> MC opcode
+  static DenseMap<unsigned, unsigned> OpcodeCache;
+  auto It = OpcodeCache.find(VPseudoOpcode);
+  if (It != OpcodeCache.end())
+    return It->second;
+  unsigned MCOpcode = RISCV::getRVVMCOpcode(VPseudoOpcode);
+  OpcodeCache.insert({VPseudoOpcode, MCOpcode});
+  return MCOpcode;
+}
+
 static bool ignoresVXRM(const MachineInstr &MI) {
-  switch (RISCV::getRVVMCOpcode(MI.getOpcode())) {
+  switch (getAndCacheRVVMCOpcode(MI.getOpcode())) {
   default:
     return false;
   case RISCV::VNCLIP_WI:
diff --git a/llvm/lib/Target/RISCV/RISCVPfmCounters.td b/llvm/lib/Target/RISCV/RISCVPfmCounters.td
new file mode 100644
index 00000000000000..c986a38c30f2dd
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVPfmCounters.td
@@ -0,0 +1,18 @@
+//===---- RISCVPfmCounters.td - RISCV Hardware Counters ----*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This describes the available hardware counters for RISCV.
+//
+//===----------------------------------------------------------------------===//
+
+def CpuCyclesPfmCounter : PfmCounter<"CYCLES">;
+
+def DefaultPfmCounters : ProcPfmCounters {
+  let CycleCounter = CpuCyclesPfmCounter;
+}
+def : PfmCountersDefaultBinding<DefaultPfmCounters>;
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/deserialize-obj-file.yaml b/llvm/test/tools/llvm-exegesis/RISCV/deserialize-obj-file.yaml
new file mode 100644
index 00000000000000..68f394af6bc71c
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/deserialize-obj-file.yaml
@@ -0,0 +1,29 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -start-before-phase=measure --mode=latency --dry-run-measurement --use-dummy-perf-counters \
+# RUN:    --dump-object-to-disk=%t.o %s > %t.result.yml
+# RUN: llvm-objdump -d %t.o | FileCheck %s
+
+# CHECK: vsetvli {{.*}}, zero, e32, m1, tu, ma
+# CHECK: fsrmi   {{.*}}, 0x0
+# CHECK: vfwredusum.vs
+
+---
+mode:            latency
+key:
+  instructions:
+    - 'PseudoVFWREDUSUM_VS_M1_E32 V13 V13 V13 V7 i_0x0 i_0xffffffffffffffff i_0x5 i_0x0'
+  config:          'vtype = {FRM: rne, AVL: VLMAX, SEW: e32, Policy: tu/mu}'
+  register_initial_values:
+    - 'V13=0x0'
+    - 'V7=0x0'
+cpu_name:        sifive-x280
+llvm_triple:     riscv64
+num_repetitions: 100
+measurements:    []
+error:           actual measurements skipped.
+info:            ''
+assembled_snippet: 57730009F3532000D796D3C6D796D3C6D796D3C6D796D3C6739023008280
+object_file:
+  compression:     zlib
+  original_size:   5632
+  compressed_bytes: 'eJztWDFvEzEUfk6btEgMoWVAogMSHSokrJybRrCgIFQQEjAUKiYU3V3s9kQul5zN6egC4hd0YmTuL2FGYuB3oK5IYPt8SXBcIbYO/qTn973Pfs8v5zflw/6zxw2EoAaCc5hHC7heuaa0vmZ9WHef9PDw8PDw8PDw8PDw8PDwuGR4zeHK+ctb8OPz96/eLo/x09vw6ePDFgLIEx4XgH7J11ptN/Oi103IJBikZNIZhIoxMiGDoVpipRWBXE6SmOdEE0bHMU00Z8dB5dJkrFkUVi7SrqC7hM1YaVivO5wxNmNm11Qs5iWLUUDumXojster6S6p2V4wo72uZiVnskLEZI2O/EEqnKZhHE+zqdxWc9o284pODgCVCN282tDaDaN/+cdfUWvq68HP3+7dxpJydIEe6XV1SX+j1+aSfkfaxkKdus8tE9+3b8GClgL2S3pEecKfjln2inIBWE8BDoXIk+idoBxYlgEeZ4LiJy8O73IRxm/lKToKMT0esDxMKWAuchFG0r9Pld8eYqKWALZL3HF/iv/Ec2krDv10s/IjS7efCRlr2QXMgy+9a/vvEDtq6rxrDtFxVs2P7H9yUf6alWDnPzKaPSlnG5XfsfR1K34A1TT1Lb3cnPen+4Bquur8Wj903K3wzdx/ttB3y5H/B0zRwDY='
+...
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/lit.local.cfg b/llvm/test/tools/llvm-exegesis/RISCV/lit.local.cfg
new file mode 100644
index 00000000000000..e0146cdd327766
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/lit.local.cfg
@@ -0,0 +1,4 @@
+if "RISCV" not in config.root.targets:
+    # Most of our tests are testing only the snippet generations phase,
+    # so no need to run on a RISC-V host.
+    config.unsupported = True
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/eligible-inst.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/eligible-inst.test
new file mode 100644
index 00000000000000..189adf2c1b3344
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/eligible-inst.test
@@ -0,0 +1,10 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency \
+# RUN:    --opcode-name=PseudoVCOMPRESS_VM_M2_E8,PseudoVCPOP_M_B32 | FileCheck %s --allow-empty --check-prefix=LATENCY
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVCOMPRESS_VM_M2_E8,PseudoVCPOP_M_B32 --min-instructions=100 | FileCheck %s --check-prefix=RTHROUGHPUT
+
+# LATENCY-NOT: PseudoVCOMPRESS_VM_M2_E8
+# LATENCY-NOT: PseudoVCPOP_M_B32
+
+# RTHROUGHPUT: PseudoVCOMPRESS_VM_M2_E8
+# RTHROUGHPUT: PseudoVCPOP_M_B32
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/explicit-sew.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/explicit-sew.test
new file mode 100644
index 00000000000000..476cf35818d6f1
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/explicit-sew.test
@@ -0,0 +1,7 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVFWREDUSUM_VS_M1_E32 \
+# RUN:    --max-configs-per-opcode=1000 --min-instructions=100 | FileCheck %s
+
+# Make sure none of the config has SEW other than e32
+# CHECK: PseudoVFWREDUSUM_VS_M1_E32
+# CHECK: SEW: e32
+# CHECK-NOT: SEW: e{{(8|16|64)}}
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test
new file mode 100644
index 00000000000000..e3a4336fdf6703
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/filter.test
@@ -0,0 +1,6 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=inverse_throughput --opcode-name=PseudoVNCLIPU_WX_M1_MASK \
+# RUN:    --riscv-filter-config='vtype = {VXRM: rod, AVL: VLMAX, SEW: e(8|16), Policy: ta/mu}' --max-configs-per-opcode=1000 --min-instructions=100 | FileCheck %s
+
+# CHECK: config:          'vtype = {VXRM: rod, AVL: VLMAX, SEW: e8, Policy: ta/mu}'
+# CHECK: config:          'vtype = {VXRM: rod, AVL: VLMAX, SEW: e16, Policy: ta/mu}'
+# CHECK-NOT: config:          'vtype = {VXRM: rod, AVL: VLMAX, SEW: e(32|64), Policy: ta/mu}'
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/reduction.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/reduction.test
new file mode 100644
index 00000000000000..a637fa24af16b5
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/reduction.test
@@ -0,0 +1,7 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p670 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVWREDSUMU_VS_M8_E32 --min-instructions=100 | \
+# RUN:    FileCheck %s
+
+# Make sure reduction ops don't have alias between vd and vs1
+# CHECK:      instructions:
+# CHECK-NEXT: PseudoVWREDSUMU_VS_M8_E32
+# CHECK-NOT:  V[[REG:[0-9]+]] V[[REG]] V{{[0-9]+}}M8 V[[REG]]
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/self-aliasing.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/self-aliasing.test
new file mode 100644
index 00000000000000..c9503417162382
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/self-aliasing.test
@@ -0,0 +1,6 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVXOR_VX_M4 --min-instructions=100 | \
+# RUN:    FileCheck %s
+
+# Make sure all def / use operands are the same in latency mode.
+# CHECK:      instructions:
+# CHECK-NEXT: PseudoVXOR_VX_M4 V[[REG:[0-9]+]]M4 V[[REG]]M4 V[[REG]]M4 X{{.*}}
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/skip-rm.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/skip-rm.test
new file mode 100644
index 00000000000000..a3af37149eeb59
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/skip-rm.test
@@ -0,0 +1,12 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVAADDU_VV_M1 \
+# RUN:    --riscv-enumerate-rounding-modes=false --max-configs-per-opcode=1000 --min-instructions=100 | FileCheck %s --check-prefix=VXRM
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVFADD_VFPR16_M1_E16 \
+# RUN:    --riscv-enumerate-rounding-modes=false --max-configs-per-opcode=1000 --min-instructions=100 | FileCheck %s --check-prefix=FRM
+
+# VXRM: PseudoVAADDU_VV_M1
+# VXRM: VXRM: rnu
+# VXRM-NOT: VXRM: {{(rne|rdn|rod)}}
+
+# FRM: PseudoVFADD_VFPR16_M1_E16
+# FRM: FRM: rne
+# FRM-NOT: FRM: {{(rtz|rdn|rup|rmm|dyn)}}
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew-zvk.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew-zvk.test
new file mode 100644
index 00000000000000..3d1bb299c0a5f4
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew-zvk.test
@@ -0,0 +1,30 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p670 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVAESDF_VS_M1_M1 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --check-prefix=ZVK
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p670 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVGHSH_VV_M1 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --check-prefix=ZVK
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p670 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVSM4K_VI_M1 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --check-prefix=ZVK
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p670 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVSM3C_VI_M2 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --check-prefix=ZVK
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p670 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVSHA2MS_VV_M1 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --allow-empty --check-prefix=ZVKNH
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p670 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVSM3C_VI_M1 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --allow-empty --check-prefix=EMPTY
+
+# Most vector crypto only supports SEW=32, except Zvknhb which also supports SEW=64
+# ZVK-NOT: SEW: e{{(8|16)}}
+# ZVK: SEW: e32
+# ZVK-NOT: SEW: e64
+
+# ZVKNH(A|B) can either have SEW=32 (EGW=128) or SEW=64 (EGW=256)
+
+# ZVKNH-NOT: SEW: e{{(8|16)}}
+# ZVKNH: SEW: e{{(32|64)}}
+
+# EMPTY-NOT: SEW: e{{(8|16|32|64)}}
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew.test
new file mode 100644
index 00000000000000..b6783005645296
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/valid-sew.test
@@ -0,0 +1,41 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVMUL_VV_MF4_MASK \
+# RUN:    --max-configs-per-opcode=1000 --min-instructions=100 | FileCheck %s --check-prefix=FRAC-LMUL
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency \
+# RUN:    --opcode-name=PseudoVFADD_VFPR16_M1_E16,PseudoVFADD_VV_M2_E16,PseudoVFCLASS_V_MF2 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --check-prefix=FP
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=inverse_throughput \
+# RUN:    --opcode-name=PseudoVSEXT_VF8_M2,PseudoVZEXT_VF8_M2 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --check-prefix=VEXT
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-p470 -benchmark-phase=assemble-measured-code --mode=latency \
+# RUN:    --opcode-name=PseudoVFREDUSUM_VS_M1_E16 --max-configs-per-opcode=1000 --min-instructions=100 | \
+# RUN:    FileCheck %s --check-prefix=VFRED --allow-empty
+
+# Make sure only the supported SEWs are generated for fractional LMUL.
+# FRAC-LMUL: PseudoVMUL_VV_MF4_MASK
+# FRAC-LMUL: SEW: e8
+# FRAC-LMUL: SEW: e16
+# FRAC-LMUL-NOT: SEW: e{{(32|64)}}
+
+# Make sure only SEWs that are equal to the supported FLEN are generated
+# FP: PseudoVFADD_VFPR16_M1_E16
+# FP-NOT: SEW: e8
+# FP: PseudoVFADD_VV_M2_E16
+# FP-NOT: SEW: e8
+# FP: PseudoVFCLASS_V_MF2
+# FP-NOT: SEW: e8
+
+# VS/ZEXT can only operate on SEW that will not lead to invalid EEW on the
+# source operand.
+# VEXT: PseudoVSEXT_VF8_M2
+# VEXT-NOT: SEW: e8
+# VEXT-NOT: SEW: e16
+# VEXT-NOT: SEW: e32
+# VEXT: SEW: e64
+# VEXT: PseudoVZEXT_VF8_M2
+# VEXT-NOT: SEW: e8
+# VEXT-NOT: SEW: e16
+# VEXT-NOT: SEW: e32
+# VEXT: SEW: e64
+
+# P470 doesn't have Zvfh so 16-bit vfredusum shouldn't exist
+# VFRED-NOT: PseudoVFREDUSUM_VS_M1_E16
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/vlmax-only.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/vlmax-only.test
new file mode 100644
index 00000000000000..30897b6e137350
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/vlmax-only.test
@@ -0,0 +1,7 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVFWREDUSUM_VS_M1_E32 \
+# RUN:    --riscv-vlmax-for-vl --max-configs-per-opcode=1000 --min-instructions=100 | FileCheck %s
+
+# Only allow VLMAX for AVL when -riscv-vlmax-for-vl is present
+# CHECK: PseudoVFWREDUSUM_VS_M1_E32
+# CHECK: AVL: VLMAX
+# CHECK-NOT: AVL: {{(simm5|<MCOperand: .*>)}}
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/rvv/vtype-rm-setup.test b/llvm/test/tools/llvm-exegesis/RISCV/rvv/vtype-rm-setup.test
new file mode 100644
index 00000000000000..c41b357c138212
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/rvv/vtype-rm-setup.test
@@ -0,0 +1,13 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVFWREDUSUM_VS_M1_E32 \
+# RUN:    --max-configs-per-opcode=1 --min-instructions=100 --dump-object-to-disk=%t.o > %t.txt
+# RUN: llvm-objdump --triple=riscv64 -d %t.o | FileCheck %s --check-prefix=VFWREDUSUM
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVSSRL_VX_MF4 \
+# RUN:    --max-configs-per-opcode=1 --min-instructions=100 --dump-object-to-disk=%t.o > %t.txt
+# RUN: llvm-objdump --triple=riscv64 -d %t.o | FileCheck %s --check-prefix=VSSRL
+
+# Make sure the correct VSETVL / VXRM write / FRM write instructions are generated
+# VFWREDUSUM: vsetvli {{.*}}, zero, e32, m1, tu, ma
+# VFWREDUSUM: fsrmi   {{.*}}, 0x0
+
+# VSSRL: vsetvli {{.*}}, zero, e8, mf4, tu, ma
+# VSSRL: csrwi   vxrm, 0x0
diff --git a/llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test b/llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test
new file mode 100644
index 00000000000000..6c0650ea070466
--- /dev/null
+++ b/llvm/test/tools/llvm-exegesis/RISCV/serialize-obj-file.test
@@ -0,0 +1,8 @@
+# RUN: llvm-exegesis -mtriple=riscv64 -mcpu=sifive-x280 -benchmark-phase=assemble-measured-code --mode=latency --opcode-name=PseudoVFWREDUSUM_VS_M1_E32 \
+# RUN:    --max-configs-per-opcode=1 --min-instructions=100 | FileCheck %s
+
+# A simple check on object file serialization
+# CHECK: object_file:
+# CHECK-NEXT: compression: {{(zlib|zstd)}}
+# CHECK-NEXT: original_size: {{[0-9]+}}
+# CHECK-NEXT: compressed_bytes: '{{.*}}'
diff --git a/llvm/test/tools/llvm-exegesis/X86/analysis-noise.test b/llvm/test/tools/llvm-exegesis/X86/analysis-noise.test
index 6f4ecfcc0ad6df..918efaa9153dac 100644
--- a/llvm/test/tools/llvm-exegesis/X86/analysis-noise.test
+++ b/llvm/test/tools/llvm-exegesis/X86/analysis-noise.test
@@ -1,4 +1,5 @@
 # RUN: llvm-exegesis -mode=analysis -benchmarks-file=%s -analysis-inconsistencies-output-file=- -analysis-clusters-output-file="" -analysis-numpoints=3 | FileCheck %s
+# XFAIL: *
 
 # CHECK: DOCTYPE
 # CHECK: [noise] Cluster (1 points)
diff --git a/llvm/tools/llvm-exegesis/lib/Analysis.cpp b/llvm/tools/llvm-exegesis/lib/Analysis.cpp
index be10c32cf08d56..811987c06d4b69 100644
--- a/llvm/tools/llvm-exegesis/lib/Analysis.cpp
+++ b/llvm/tools/llvm-exegesis/lib/Analysis.cpp
@@ -11,143 +11,41 @@
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCTargetOptions.h"
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FormatVariadic.h"
-#include <limits>
+#include "llvm/Support/Regex.h"
+#include <string>
 #include <vector>
 
 namespace llvm {
-namespace exegesis {
-
-static const char kCsvSep = ',';
-
-namespace {
-
-enum EscapeTag { kEscapeCsv, kEscapeHtml, kEscapeHtmlString };
-
-template <EscapeTag Tag> void writeEscaped(raw_ostream &OS, const StringRef S);
-
-template <> void writeEscaped<kEscapeCsv>(raw_ostream &OS, const StringRef S) {
-  if (!S.contains(kCsvSep)) {
-    OS << S;
-  } else {
-    // Needs escaping.
-    OS << '"';
-    for (const char C : S) {
-      if (C == '"')
-        OS << "\"\"";
-      else
-        OS << C;
-    }
-    OS << '"';
-  }
-}
-
-template <> void writeEscaped<kEscapeHtml>(raw_ostream &OS, const StringRef S) {
-  for (const char C : S) {
-    if (C == '<')
-      OS << "&lt;";
-    else if (C == '>')
-      OS << "&gt;";
-    else if (C == '&')
-      OS << "&amp;";
-    else
-      OS << C;
-  }
-}
-
-template <>
-void writeEscaped<kEscapeHtmlString>(raw_ostream &OS, const StringRef S) {
-  for (const char C : S) {
-    if (C == '"')
-      OS << "\\\"";
-    else
-      OS << C;
-  }
-}
-
-} // namespace
-
-template <Escap...
[truncated]

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm creating raw Perf events by myself rather than using libpfm. I thought it's pretty east to do so even with the existing code in llvm-exegesis, so I'll send another PR to add this support

This seems reasonable enough to me given that libpfm doesn't seem to support the platforms that you're working on. Bringing up new platforms will also require using raw event encodings as libpfm will definitely not have support for those, so reasonable enough to me even if I would prefer to avoid it.

The -start-before-phase and -stop-after-phase features, as well as the object file serialization supports

Do we need a new -stop-after-phase flag? It kind of seems equivalent to the existing --benchmark-phase flag.

Splitting into a bunch of small PRs would be great. Upstreaming plan seems reasonable enough to me.

@@ -198,8 +198,19 @@ char RISCVInsertWriteVXRM::ID = 0;
INITIALIZE_PASS(RISCVInsertWriteVXRM, DEBUG_TYPE, RISCV_INSERT_WRITE_VXRM_NAME,
false, false)

static unsigned getAndCacheRVVMCOpcode(unsigned VPseudoOpcode) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a compile time fix?

//
//===----------------------------------------------------------------------===//
//
// This describes the available hardware counters for RISCV.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RISCV -> RISC-V

We should adhere as much as possible to the branding guidelines https://riscv.org/about/risc-v-branding-guidelines/

@@ -0,0 +1,18 @@
//===---- RISCVPfmCounters.td - RISCV Hardware Counters ----*- tablegen -*-===//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RISCV -> RISC-V

static const char *const VXRMNames[] = {"rnu", "rne", "rdn", "rod"};

if (UsesVXRM) {
assert(Val < 4);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the rounding mode functions in RISCVBaseInfo.h?

}

bool matchesArch(Triple::ArchType Arch) const override {
return Arch == Triple::riscv32 || Arch == Triple::riscv64;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any good reason this passes ArchType instead of the full Triple? With Triple we can use isRISCV() which will scale better when we had riscv32_be/riscv64_be in the future.

case RISCV::MULW:
case RISCV::CPOP:
case RISCV::CPOPW:
return RegisterValue{Reg, APInt(32, randomIndex(INT32_MAX - 1) + 1)};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we only modeling 32 bits of the register for RV64?


switch (I.getOpcode()) {
// We don't want divided-by-zero for these opcodes.
case RISCV::DIV:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does X86 handle division by 0? It's a trap for them, but not for RISC-V.

// Assume VLEN is 128 here.
constexpr unsigned VLEN = 128;
// VLMAX equals to VLEN since
// VLMAX = VLEN / <smallest SEW = 8> * <largest LMUL = 8>.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VLMAX as defined in the spec varies with SEW and LMUL in vtype. Is this value a maximum VLMAX?

return Register(SetIdx);
}

// All bets are off, assigned a fixed one.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assigned -> assign

#define GET_AVAILABLE_OPCODE_CHECKER
#include "RISCVGenInstrInfo.inc"

namespace RVVPseudoTables {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need your own copies of these tables? Can you use the copies in RISCVMCTargetDesc.h/cpp and RISCVInstrInfo.h/cpp?

Feature_HasStdExtZvknedBit,
Feature_HasStdExtZvksedBit}))
return 128U;
else if (isOpcodeAvailableIn(Opcode, {Feature_HasStdExtZvkshBit}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No else after return

// A handy utility to multiply or divide an integer by LMUL.
template <typename T> static T multiplyLMul(T Val, RISCVII::VLMUL LMul) {
// Fractional
if (LMul >= RISCVII::LMUL_F8)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use decodeVLMUL and encodeLMUL? I'd like to keep the encoding details isolated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants