Use portable JIT compilation for accelerating RISC-V emulation #81

Open
@jserv

Description

rv8 demonstrates how RISC-V instruction emulation can benefit from JIT compilation and aggressive optimizations. However, it targets only x86-64 and is hard to port to other host architectures, such as Apple M1 (AArch64). SFUZZ is a high-performance fuzzer that uses RISC-V to x86 binary translation together with modern fuzzing techniques. RVVM is another example that implements a tracing JIT.

The goal of this task is to utilize an existing JIT framework as a new abstraction layer while accelerating RISC-V instruction execution. In particular, we would:

  1. Avoid direct machine code generation. Instead, express most operations at the intermediate representation (IR) level.
  2. Perform common optimization techniques such as peephole optimization. ria-jit does excellent work with regard to such optimizations; see src/gen/instr/patterns.c and the MEMZERO and MEMCOPY instructions proposal.
  3. Use a high-level but still efficient IR. MIR is an interesting implementation, which allows using a subset of C11 as its IR. SFUZZ's note Code Generation is worth reading. ETISS (Extendable Translating Instruction Set Simulator) translates binary instructions into C code and appends the translated code to a block, which is compiled and executed at runtime. As its name suggests, it is extendable, supporting many levels of customization through plug-ins.
  4. Ensure a short startup time. This can be achieved with a lightweight JIT framework and AOT compilation.

At a high level, the JIT compiler operates as follows:

  • Look up the current PC in a hash map of code blocks
  • If a block is found:
    • Execute this block
  • If a block is not found:
    • Allocate a new block
    • Invoke the translator for this block
    • Insert it into the hash map
    • Execute this block

Every block ends after a branch instruction has been translated, since translation occurs at the basic-block level. Afterward, further optimization passes can be performed on the generated code.

This technique gains speed for the reasons listed below:

  • No instruction fetch
  • No instruction decode
  • Immediate values are baked into translated instructions
    • Immediate values of 0 can be optimized
  • Register x0 can be optimized
    • No lookup required
    • Writes are discarded
  • Reduced emulation loop overhead
  • Blocks can be chained based on previous branch patterns for faster lookup
