Add energy efficiency tracking index to llama-bench tests #4297
Replies: 2 comments 1 reply
-
M2 Max Studio, 8+4 CPU, 38 GPU, 96 GB RAM
% ./llama-bench -m models/llama-7b-v2/ggml-model-f16.gguf -m models/llama-7b-v2/ggml-model-q8_0.gguf -m models/llama-7b-v2/ggml-model-q4_0.gguf -p 512 -n 128 -ngl 99 2> /dev/null
build: 8e672ef (1550)
% sudo nice -n 10 powermetrics --samplers gpu_power -f plist -i 1000 > sample.out
-
This is a cool idea. Before going too far with it, though, try looking at system power consumption during inference using https://github.com/exelban/stats. There seems to be a significant, load-dependent draw that isn't accounted for by the cpu_power or gpu_power samplers. I don't know if it's the power draw from RAM, or what. I also saw similar results using iStat Menus, an older commercial utility that Stats kind of mimics.
-
While reporting the Apple M-series metrics that measure perplexity and tokens per second, I thought it would be nice to also track energy consumption, so that we can compare not only performance but also the energy cost of that performance as cores and memory scale.
I have done a small POC, and for a llama-bench run with three models I got the following result:
Of course, that is a very raw result, as it is only a POC. However, much more information can be extracted from the data source (powermetrics). Here is a sample of one datapoint:
Here is the quick and dirty Python code used to create the graph:
And here is the powermetrics command line used to track the test:
sudo nice -n 10 powermetrics --samplers gpu_power -f plist -i 1000 > sample.out
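As a rough illustration of how sample.out could be turned into an energy-efficiency figure, here is a minimal sketch. It is not the script used for the graph. It assumes that powermetrics writes one plist document per sample separated by NUL bytes, and that each sample carries a GPU energy value in millijoules under `gpu` / `gpu_energy`. Both the layout and the key names are assumptions to check against a real capture, which is why they are exposed as parameters. The helper names are hypothetical.

```python
import plistlib


def load_samples(raw: bytes) -> list:
    """Split raw powermetrics plist output into one dict per sample.

    Assumption: samples are separate plist documents delimited by NUL bytes.
    """
    samples = []
    for chunk in raw.split(b"\x00"):
        chunk = chunk.strip()  # plistlib needs the document to start at the header
        if chunk:
            samples.append(plistlib.loads(chunk))
    return samples


def gpu_energy_joules(samples, section="gpu", key="gpu_energy"):
    """Sum the per-sample GPU energy and convert it to joules.

    Assumption: the value lives at sample[section][key] and is in millijoules.
    Samples without that key contribute zero.
    """
    total_mj = sum(s.get(section, {}).get(key, 0) for s in samples)
    return total_mj / 1000.0


def tokens_per_joule(tokens: float, joules: float) -> float:
    """Efficiency index: tokens produced per joule of GPU energy."""
    return tokens / joules if joules > 0 else float("nan")


# Possible usage once a capture exists:
# with open("sample.out", "rb") as f:
#     samples = load_samples(f.read())
# print(tokens_per_joule(128, gpu_energy_joules(samples)))
```

Tokens per joule is only one possible index. Dividing the tokens-per-second figure reported by llama-bench by the average power in watts gives the same quantity.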
llama-bench command used to generate this data:
powermetrics offers many more metrics through its samplers. I only use gpu_power, as it is the juiciest one, but there is thermal energy tracking as well that could be useful for benchmarking.
In order to make this work we would need to:
- Start the powermetrics tool when the test starts and stop it when the test finishes
If there is enough interest in it, I could use some spare time to work on it.
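The start/stop step could be sketched with a small wrapper that launches the sampler in the background and terminates it when the benchmark finishes. This is only a sketch under assumptions: `background_sampler` is a hypothetical helper, and the script is assumed to run with enough privileges to launch powermetrics directly, so the command is given without sudo. The powermetrics flags are the ones from the command above.

```python
import subprocess
from contextlib import contextmanager

# Same flags as the manual command, minus sudo (assumes a privileged script).
POWERMETRICS_CMD = [
    "nice", "-n", "10",
    "powermetrics", "--samplers", "gpu_power", "-f", "plist", "-i", "1000",
]


@contextmanager
def background_sampler(cmd, out_path):
    """Run cmd in the background, writing its stdout to out_path.

    The process is terminated when the with-block exits, even on error.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.DEVNULL)
        try:
            yield proc
        finally:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                # Fall back to a hard kill if the sampler ignores SIGTERM.
                proc.kill()
                proc.wait()


# Possible usage around a benchmark run:
# with background_sampler(POWERMETRICS_CMD, "sample.out"):
#     subprocess.run(
#         ["./llama-bench", "-m", "models/llama-7b-v2/ggml-model-q4_0.gguf",
#          "-p", "512", "-n", "128", "-ngl", "99"],
#         check=True,
#     )
```

Doing this inside llama-bench itself would look different. The wrapper only shows the lifecycle: the sampler starts before the test and stops right after it.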
Related projects: