Commit d02ef30: Update README.md
1 parent: bcf3238

File tree: 1 file changed (+2, -3 lines)


README.md

Lines changed: 2 additions & 3 deletions
````diff
@@ -11,7 +11,7 @@ My goal of `llama2.rs` is to create a rust port for llama2.c,
 primarily targeting at a cross-platform implementation for on-device inference.
 
 ### Highlights:
-- Similar to `llama2.c` with openmp, `llama2.rs` also utilizes model parallelization. (*Benchmark: 27 -> 40, performance gain +46%*)
+- Similar to `llama2.c` with openmp, `llama2.rs` also utilizes model parallelization. (*[Benchmark](https://github.com/lintian06/llama2.rs#performance-comparison): 27.6 -> 40.2 tokens/s on `stories110M.bin`, +46% speedup over llama2.c*)
 - Utilize memory mapping for runtime memory reduction (with a flag `--is_mmap`). (*480MB -> 59MB, save up to 88% memory*)
 
 ### How to build and run inference.
@@ -33,7 +33,7 @@ cargo run --release -- --model_path=./stories15M.bin
 
 See `cargo run --release -- --help` for the full help doc.
 
-You can run unit test with the below command with `stories15M.bin` downloaded in advance.
+You can run unit test with the below command and `stories15M.bin` downloaded in advance.
 
 ```bash
 cargo test
@@ -59,7 +59,6 @@ and calculate the mean of standard deviation. Here is my spec:
 - CC: Apple clang version 14.0.0.
 - Rust: rustc 1.71.1.
 
-
 | Experiments | #Token/s: mean (± std) |
 |-------------------|----------------------------|
 | llama2.rs | 40.228 (±1.691) |
````
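The percentages the commit adds to the README follow directly from the before/after figures it quotes. A minimal Rust sketch checking that arithmetic (numbers taken from the diff above; `percent_change` is an illustrative helper, not part of llama2.rs):

```rust
// Sanity-check the figures quoted in the README diff.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // 27.6 -> 40.2 tokens/s: roughly +46% speedup over llama2.c.
    let speedup = percent_change(27.6, 40.2);
    // 480 MB -> 59 MB resident memory: roughly 88% saved with --is_mmap.
    let saved = -percent_change(480.0, 59.0);
    println!("speedup: {:.1}%, memory saved: {:.1}%", speedup, saved);
}
```

Both values round to the README's claims (+46% and "save up to 88% memory").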
