Use portable JIT compilation for accelerating RISC-V emulation #81

Open
@jserv

Description

rv8 demonstrates how RISC-V instruction emulation can benefit from JIT compilation and aggressive optimizations. However, it targets only x86-64 and is hard to port to other host architectures, such as Apple M1 (AArch64). SFUZZ is a high-performance fuzzer that uses RISC-V to x86 binary translation together with modern fuzzing techniques. RVVM is another example that implements a tracing JIT.

The goal of this task is to utilize an existing JIT framework as a new abstraction layer while accelerating RISC-V instruction execution. In particular, we would:

  1. Avoid direct machine code generation. Instead, express most operations at the intermediate representation (IR) level.
  2. Perform common optimization techniques such as peephole optimization. ria-jit does excellent work with regard to such optimizations; see src/gen/instr/patterns.c and the MEMZERO and MEMCOPY instructions proposal.
  3. Use a high-level but still efficient IR. MIR is an interesting implementation, which allows using a subset of C11 as its IR. SFUZZ's note Code Generation is worth reading. ETISS (Extendable Translating Instruction Set Simulator) translates binary instructions into C code and appends the translated code to a block, which is compiled and executed at runtime. As its name suggests, it is extendable, supporting many levels of customization through plug-ins.
  4. Ensure a short startup time. This can be achieved with a lightweight JIT framework and AOT compilation.

At a high level, the JIT compiler operates as follows:

  • Look up the current PC in a hash map of code blocks
  • If a block is found:
    • Execute this block
  • If a block is not found:
    • Allocate a new block
    • Invoke the translator for this block
    • Insert it into the hash map
    • Execute this block

Every block ends after a branch instruction has been translated, since translation occurs at the basic-block level. Afterward, further optimization passes can be performed on the generated code.

This technique gains speed for the reasons listed below:

  • No instruction fetch
  • No instruction decode
  • Immediate values are baked into translated instructions
    • Immediate values of 0 can be optimized
  • Register x0 can be optimized
    • No lookup required
    • Writes are discarded
  • Reduced emulation loop overhead
  • Blocks can be chained based on previous branch patterns for faster lookup
