
Commit e1b7fda (1 parent: c426412)

Create llama2.rs library.

File tree: 7 files changed, +1383 -1 lines

.gitignore (76 additions, 0 deletions)
@@ -0,0 +1,76 @@
################## MAC OS #################
# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

################## Rust #################
# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk

# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb

################## Archives #################
# It's better to unpack these files and commit the raw source because
# git has its own built in compression methods.
*.7z
*.jar
*.rar
*.zip
*.gz
*.gzip
*.tgz
*.bzip
*.bzip2
*.bz2
*.xz
*.lzma
*.cab
*.xar

# Packing-only formats
*.iso
*.tar

# Package management formats
*.dmg
*.xpi
*.gem
*.egg
*.deb
*.rpm
*.msi
*.msm
*.msp
*.txz

# Binary formats.
stories*.bin
target/

Cargo.toml (2 additions, 0 deletions)
@@ -0,0 +1,2 @@
[workspace]
members = ["llama2_rs"]

README.md (83 additions, 1 deletion)
@@ -1,4 +1,86 @@
## llama2.rs

A Rust port of [llama2.c](https://huggingface.co/karpathy/llama2.c).

The goal of `llama2.rs` is to create a Rust port of `llama2.c`,
primarily targeting a cross-platform implementation for on-device inference.

Features to highlight:
- Similar to `llama2.c` built with OpenMP, `llama2.rs` parallelizes inference across threads (see the sketch below).
- Utilizes a memory map to save runtime memory (enabled with the `--is_mmap` flag).
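
As a rough illustration of the thread-level parallelism, here is a minimal sketch of a row-parallel matrix-vector multiply using rayon (a listed dependency), in the spirit of llama2.c's OpenMP loop. The function is illustrative, not the crate's actual source:

```rust
// Illustrative sketch, not the actual llama2_rs source: a row-parallel
// matmul with rayon, analogous to llama2.c's `#pragma omp parallel for`.
use rayon::prelude::*;

/// out[i] = dot(w[i*n .. (i+1)*n], x); rows are computed in parallel.
fn matmul(out: &mut [f32], w: &[f32], x: &[f32], n: usize) {
    out.par_iter_mut().enumerate().for_each(|(i, o)| {
        let row = &w[i * n..(i + 1) * n];
        *o = row.iter().zip(x.iter()).map(|(a, b)| a * b).sum::<f32>();
    });
}

fn main() {
    let w = vec![1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2x3 weights, row-major
    let x = vec![1.0f32, 0.5, 2.0];
    let mut out = vec![0.0f32; 2];
    matmul(&mut out, &w, &x, 3);
    println!("{out:?}"); // [8.0, 18.5]
}
```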
### How to build and run inference.

**Prerequisite**: Download the pretrained tinyllamas models.

```bash
# stories15M is used for tests and stories110M is used for benchmarks.
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
```

You can use `cargo` to build and run inference for the `stories15M` model:

```bash
cargo run --release -- --model_path=./stories15M.bin
```

See `cargo run --release -- --help` for the full help doc.

You can run the unit tests with the command below, with `stories15M.bin` downloaded in advance:

```bash
cargo test
```

The command to run the benchmark with `stories110M.bin` is:

```bash
cargo run --release -- --model_path=./stories110M.bin --is_benchmark
```

### Performance comparison.

We conduct the inference benchmark on `stories110M.bin`,
comparing with llama2.c and Hugging Face's [candle](https://github.com/huggingface/candle) library.

The numbers are based on 10 repeated experiments on my MacBook,
reporting the mean and standard deviation. Here is my spec:

- 2.6 GHz 6-Core Intel Core i7, L2/L3 Cache: 256 KB/12 MB.
- Memory: 16 GB 2667 MHz DDR4. Disk: APPLE SSD.
- OS: Mac OS 13.5.
- CC: Apple clang version 14.0.0.
- Rust: rustc 1.71.1.

| Experiments      | #Token/s: mean (+- std) |
|------------------|-------------------------|
| llama2.rs        | 40.228 (+-1.691)        |
| llama2.rs (mmap) | 37.736 (+-1.864)        |
| llama2.c         | 27.585 (+-2.003)        |
| candle           | 12.534 (+-0.417)        |
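
For reference, here is a tiny sketch (hypothetical, not part of the repo) of how the mean (+- std) entries above can be computed from repeated tokens/s samples:

```rust
// Hypothetical helper, not in the repo: aggregate repeated tokens/s
// samples into the mean (+- std) format used in the table above.
fn mean_std(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance (divide by n - 1), then take the square root.
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

fn main() {
    let runs = [40.1, 41.9, 38.7, 39.8, 40.6]; // example tokens/s samples
    let (m, s) = mean_std(&runs);
    println!("{m:.3} (+-{s:.3})");
}
```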

Notes:

- mmap: Run with the flag `--is_mmap`. Peak memory cost: 480MB -> 9MB (see the sketch after these notes).

- [llama2.c](https://huggingface.co/karpathy/llama2.c) is built and run with OpenMP and `-Ofast`:

```bash
clang -Ofast -fopenmp -march=native run.c -lm -o run
./run stories110M.bin
```

(You may need LLVM and OpenMP installed.)

- [candle](https://github.com/huggingface/candle) is built with the `accelerate` feature:

```bash
cargo run --release --features accelerate --package candle-examples inference --which-model=stories110M.bin
```
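
A minimal sketch of what the `--is_mmap` path could look like with the memmap2 crate (a listed dependency); the function name is illustrative, not the crate's actual source. Mapped pages are faulted in lazily, which is why peak resident memory can drop as reported above:

```rust
// Illustrative sketch, not the actual llama2_rs source: map the model
// checkpoint instead of reading it all into RAM.
use std::fs::File;

use memmap2::Mmap;

fn map_checkpoint(path: &str) -> std::io::Result<Mmap> {
    let file = File::open(path)?;
    // SAFETY: the checkpoint file must not be modified while mapped.
    unsafe { Mmap::map(&file) }
}

fn main() -> std::io::Result<()> {
    let weights = map_checkpoint("stories110M.bin")?;
    println!("mapped {} bytes without reading them into RAM", weights.len());
    Ok(())
}
```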

## README.md of original [llama2.c](https://github.com/karpathy/llama2.c)
<p align="center">
<img src="assets/llama_cute.jpg" width="300" height="300" alt="Cute Llama">

llama2_rs/Cargo.toml (14 additions, 0 deletions)
@@ -0,0 +1,14 @@
[package]
name = "llama2_rs"
version = "0.1.0"
edition = "2021"
authors = ["Tian Lin <lintian06@gmail.com>"]

[dependencies]
memmap2 = "0.7.1"
rand = "0.8.5"
rayon = { version = "1.7.0" }
clap = { version = "4.3.21", features = ["derive"] }

[features]
default = []
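
Given the clap derive dependency above, the CLI flags documented in the README (`--model_path`, `--is_mmap`, `--is_benchmark`) could be declared along these lines. This is a hypothetical sketch, not the crate's actual source:

```rust
// Illustrative sketch, not the actual llama2_rs source: the README's
// flags declared with clap's derive feature. Explicit `long = "..."`
// names preserve the documented underscore style (clap defaults to
// kebab-case otherwise).
use clap::Parser;

#[derive(Parser, Debug)]
struct Args {
    /// Path to the model checkpoint, e.g. ./stories15M.bin.
    #[arg(long = "model_path")]
    model_path: String,

    /// Memory-map the checkpoint instead of reading it all into RAM.
    #[arg(long = "is_mmap")]
    is_mmap: bool,

    /// Run the timed benchmark loop instead of normal inference.
    #[arg(long = "is_benchmark")]
    is_benchmark: bool,
}

fn main() {
    let args = Args::parse();
    println!("{args:?}");
}
```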
