## llama2.rs

A Rust port of [llama2.c](https://huggingface.co/karpathy/llama2.c).

The goal of `llama2.rs` is to provide a Rust port of llama2.c,
primarily targeting a cross-platform implementation for on-device inference.

Features to highlight:
- Similar to `llama2.c` built with OpenMP, `llama2.rs` parallelizes the model computation across threads.
- Memory-maps the model weights to save runtime memory (enabled with the `--is_mmap` flag); see the sketch after this list.
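
Below is a minimal, illustrative sketch of these two ideas, not the actual `llama2.rs` code: it assumes the `rayon` crate for thread-level parallelism and the `memmap2` crate for memory-mapping, and the file name and matrix sizes are placeholders.

```rust
// Illustrative sketch only (not the llama2.rs implementation).
// Assumed crates: rayon = "1", memmap2 = "0.9".
use std::fs::File;

use memmap2::Mmap;
use rayon::prelude::*;

/// out = W * x with W stored row-major (`out.len()` rows, `cols` columns).
/// Each output element is computed by its own Rayon task, roughly what
/// llama2.c gets from an OpenMP `parallel for` over rows.
fn matmul(out: &mut [f32], w: &[f32], x: &[f32], cols: usize) {
    out.par_iter_mut().enumerate().for_each(|(i, o)| {
        let row = &w[i * cols..(i + 1) * cols];
        *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
    });
}

fn main() -> std::io::Result<()> {
    // What `--is_mmap` enables conceptually: map the checkpoint instead of
    // reading it into RAM, so pages are faulted in lazily as weights are
    // touched and the resident set stays small.
    let file = File::open("stories15M.bin")?;
    let mapped = unsafe { Mmap::map(&file)? };
    println!("mapped {} bytes without copying them into memory", mapped.len());

    // Toy data just to exercise the parallel kernel.
    let (rows, cols) = (4usize, 3usize);
    let w: Vec<f32> = (0..rows * cols).map(|i| i as f32).collect();
    let x = vec![1.0f32; cols];
    let mut out = vec![0.0f32; rows];
    matmul(&mut out, &w, &x, cols);
    println!("{out:?}");
    Ok(())
}
```

With a mapped checkpoint, weight pages are only brought into memory as they are used, which is where the `--is_mmap` memory saving comes from.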

### How to build and run inference.

**Prerequisite**: Download the pretrained tinyllamas models.

```bash
# stories15M is used for tests and stories110M is used for the benchmark.
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
```

You can use `cargo` to build and run inference for the `stories15M` model:

```bash
cargo run --release -- --model_path=./stories15M.bin
```

See `cargo run --release -- --help` for the full help doc.

You can run the unit tests with the command below, provided `stories15M.bin` has been downloaded in advance:

```bash
cargo test
```

The command to run the benchmark with `stories110M.bin` is:

```bash
cargo run --release -- --model_path=./stories110M.bin --is_benchmark
```
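
For context, the benchmark figure reported below is the usual tokens-per-second metric (generated tokens divided by wall-clock time). A minimal sketch of such a measurement, where `generate_next_token` is a hypothetical stand-in for one forward pass plus sampling and not a function in this repo:

```rust
use std::time::Instant;

// Hypothetical timing helper, not part of llama2.rs: runs `steps` generation
// steps and returns tokens per second.
fn tokens_per_second(steps: usize, mut generate_next_token: impl FnMut() -> u32) -> f64 {
    let start = Instant::now();
    for _ in 0..steps {
        let _token = generate_next_token();
    }
    steps as f64 / start.elapsed().as_secs_f64()
}
```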

### Performance comparison.

We run the inference benchmark on `stories110M.bin`
and compare against llama2.c and Hugging Face's [candle](https://github.com/huggingface/candle) library.

The numbers are based on 10 repeated runs on my MacBook,
reporting the mean and standard deviation (computed as in the sketch after the spec list). Here is my spec:

- 2.6 GHz 6-Core Intel Core i7, L2/L3 cache: 256 KB/12 MB.
- Memory: 16 GB 2667 MHz DDR4. Disk: Apple SSD.
- OS: macOS 13.5.
- CC: Apple clang version 14.0.0.
- Rust: rustc 1.71.1.
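
Each table entry is the mean of the 10 per-run tokens/s values together with their sample standard deviation; a minimal sketch of that aggregation (the sample numbers are placeholders, not the actual measurements):

```rust
// Mean and sample standard deviation over per-run tokens/s figures.
fn mean_std(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

fn main() {
    // Placeholder per-run values, not the actual benchmark data.
    let runs = [40.1, 41.3, 39.8, 42.0, 38.9, 40.7, 41.9, 37.5, 40.0, 39.6];
    let (mean, std) = mean_std(&runs);
    println!("{mean:.3} (+-{std:.3})");
}
```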

| Experiments      | #Token/s: mean (+- std) |
|------------------|-------------------------|
| llama2.rs        | 40.228 (+-1.691)        |
| llama2.rs (mmap) | 37.736 (+-1.864)        |
| llama2.c         | 27.585 (+-2.003)        |
| candle           | 12.534 (+-0.417)        |

Notes:
- mmap: run with the `--is_mmap` flag. Peak memory cost drops from 480MB to 9MB.

- [llama2.c](https://huggingface.co/karpathy/llama2.c) is built and run with the OpenMP and `-Ofast` options:

```bash
clang -Ofast -fopenmp -march=native run.c -lm -o run
./run stories110M.bin
```

  (You may need LLVM and OpenMP installed.)

- [candle](https://github.com/huggingface/candle) is built with the `accelerate` feature:

```bash
cargo run --release --features accelerate --package candle-examples inference --which-model=stories110M.bin
```

## README.md of the original [llama2.c](https://github.com/karpathy/llama2.c)

<p align="center">
  <img src="assets/llama_cute.jpg" width="300" height="300" alt="Cute Llama">