Commit feb8b70

Nathan unify modelargs (#609)

- Removed override batch size and made it part of the model config.
- Removed env config; the config should be part of the env directly, no need to load it up separately.
- Better loading of models.
- Added a base class for model configs, which allows parsing a model config through the CLI or a config file.
- Unified naming for model args, i.e. `model_name`.
- Removed the OpenAI endpoint; we can just use litellm for this. The same goes for TGI and inference endpoints: we don't really need them, and it's better to have one interface.

1 parent cc95ff2 commit feb8b70
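
As a rough illustration of the unified interface described above, the sketch below shows how a config base class could accept either a CLI `key=value` string or the `model_parameters` section of a YAML file. The class, method, and field names are hypothetical and do not correspond to lighteval's actual implementation; they only mirror the unified `model_name` naming used throughout the diffs below.

```python
# Hypothetical sketch of a unified model-config base class; names and fields
# are illustrative only and do not mirror lighteval's real classes.
from dataclasses import dataclass, fields

import yaml  # requires PyYAML


@dataclass
class ExampleModelConfig:
    model_name: str  # unified argument name, formerly `pretrained` / `model`
    dtype: str = "auto"
    use_chat_template: bool = False

    @classmethod
    def from_cli_args(cls, arg_string: str) -> "ExampleModelConfig":
        # Parse a "model_name=...,dtype=float16" style CLI string.
        # (A real implementation would also coerce value types.)
        kwargs = dict(pair.split("=", 1) for pair in arg_string.split(","))
        return cls(**kwargs)

    @classmethod
    def from_yaml(cls, path: str) -> "ExampleModelConfig":
        # Read the `model_parameters` section of a YAML config file.
        with open(path) as f:
            params = yaml.safe_load(f)["model_parameters"]
        allowed = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in params.items() if k in allowed})


# Both entry points yield the same kind of config object:
cfg = ExampleModelConfig.from_cli_args(
    "model_name=HuggingFaceH4/zephyr-7b-beta,dtype=float16"
)
print(cfg.model_name, cfg.dtype)
```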

46 files changed: +710, -1289 lines

docs/source/_toctree.yml (+2, -2)

````diff
@@ -23,8 +23,8 @@
       title: Use vllm as backend
     - local: use-sglang-as-backend
       title: Use SGLang as backend
-    - local: evaluate-the-model-on-a-server-or-container
-      title: Evaluate on Server
+    - local: use-huggingface-inference-endpoints-or-tgi-as-backend
+      title: Use Hugging Face inference endpoints or TGI as backend
     - local: contributing-to-multilingual-evaluations
       title: Contributing to multilingual evaluations
   title: Guides
````

docs/source/package_reference/models.mdx (-4)

````diff
@@ -31,10 +31,6 @@
 ### Open AI Models
 [[autodoc]] models.endpoints.openai_model.OpenAIClient

-## Nanotron Model
-### NanotronLightevalModel
-[[autodoc]] models.nanotron.nanotron_model.NanotronLightevalModel
-
 ## VLLM Model
 ### VLLMModel
 [[autodoc]] models.vllm.vllm_model.VLLMModelConfig
````

docs/source/evaluate-the-model-on-a-server-or-container.mdx renamed to docs/source/use-huggingface-inference-endpoints-or-tgi-as-backend.mdx (+6, -26)

````diff
@@ -25,15 +25,12 @@ be deleted afterwards).
 __configuration file example:__

 ```yaml
-model:
-  base_params:
-    # Pass either model_name, or endpoint_name and true reuse_existing
-    # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
-    # reuse_existing: true # defaults to false; if true, ignore all params in instance, and don't delete the endpoint after evaluation
+model_parameters:
+  reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
+  # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
   model_name: "meta-llama/Llama-2-7b-hf"
-  # revision: "main" # defaults to "main"
+  revision: "main" # defaults to "main"
   dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit' or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
-  instance:
   accelerator: "gpu"
   region: "eu-west-1"
   vendor: "aws"
@@ -44,7 +41,7 @@ model:
   namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
   image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
   env_vars:
-    null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
+  null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
 ```

 ### Text Generation Inference (TGI)
@@ -55,25 +52,8 @@ serverless inference.
 __configuration file example:__

 ```yaml
-model:
-  instance:
+model_parameters:
   inference_server_address: ""
   inference_server_auth: null
   model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
 ```
-
-### OpenAI API
-
-Lighteval also supports evaluating models on the OpenAI API. To do so you need to set your OpenAI API key in the environment variable.
-
-```bash
-export OPENAI_API_KEY={your_key}
-```
-
-And then run the following command:
-
-```bash
-lighteval endpoint openai \
-  {model-name} \
-  <task parameters>
-```
````

docs/source/use-inference-providers-as-backend.mdx (+3, -3)

````diff
@@ -11,7 +11,7 @@ Lighteval allows to use Hugging Face's Inference Providers to evaluate llms on s

 ```bash
 lighteval endpoint inference-providers \
-  "model=deepseek-ai/DeepSeek-R1,provider=hf-inference" \
+  "model_name=deepseek-ai/DeepSeek-R1,provider=hf-inference" \
   "lighteval|gsm8k|0|0"
 ```

@@ -28,13 +28,13 @@ lighteval endpoint inference-providers \
 with the following config file:

 ```yaml
-model:
+model_parameters:
   model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
   provider: "novita"
   timeout: null
   proxies: null
   parallel_calls_count: 10
-  generation:
+  generation_parameters:
     temperature: 0.8
     top_k: 10
     max_new_tokens: 10000
````

docs/source/use-litellm-as-backend.mdx (+13, -11)

````diff
@@ -10,11 +10,14 @@ Documentation for available APIs and compatible endpoints can be found [here](ht

 ```bash
 lighteval endpoint litellm \
-  "gpt-3.5-turbo" \
+  "provider=openai,model_name=gpt-3.5-turbo" \
   "lighteval|gsm8k|0|0" \
   --use-chat-template
 ```

+> [!WARNING]
+> `--use-chat-template` is required for litellm to work properly.
+
 ## Using a config file

 Litellm allows generation with any OpenAI compatible endpoint, for example you
@@ -23,17 +26,16 @@ can evaluate a model running on a local vllm server.
 To do so you will need to use a config file like so:

 ```yaml
-model:
-  base_params:
+model_parameters:
   model_name: "openai/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
   base_url: "URL OF THE ENDPOINT YOU WANT TO USE"
   api_key: "" # remove or keep empty as needed
-  generation:
-    temperature: 0.5
-    max_new_tokens: 256
-    stop_tokens: [""]
-    top_p: 0.9
-    seed: 0
-    repetition_penalty: 1.0
-    frequency_penalty: 0.0
+  generation_parameters:
+    temperature: 0.5
+    max_new_tokens: 256
+    stop_tokens: [""]
+    top_p: 0.9
+    seed: 0
+    repetition_penalty: 1.0
+    frequency_penalty: 0.0
 ```
````

docs/source/use-sglang-as-backend.mdx (+34, -16)

````diff
@@ -5,7 +5,7 @@ To use, simply change the `model_args` to reflect the arguments you want to pass

 ```bash
 lighteval sglang \
-  "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
+  "model_name=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
   "leaderboard|truthfulqa:mc|0|0"
 ```

@@ -17,15 +17,15 @@ For example if you have 4 GPUs you can split it across using `tp_size`:

 ```bash
 lighteval sglang \
-  "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tp_size=4" \
+  "model_name=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tp_size=4" \
   "leaderboard|truthfulqa:mc|0|0"
 ```

 Or, if your model fits on a single GPU, you can use `dp_size` to speed up the evaluation:

 ```bash
 lighteval sglang \
-  "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,dp_size=4" \
+  "model_name=HuggingFaceH4/zephyr-7b-beta,dtype=float16,dp_size=4" \
   "leaderboard|truthfulqa:mc|0|0"
 ```

@@ -40,20 +40,38 @@ lighteval sglang \
   "leaderboard|truthfulqa:mc|0|0"
 ```

+> [!TIP]
+> Documentation for the config file of sglang can be found [here](https://docs.sglang.ai/backend/server_arguments.html)
+
 ```yaml
-model: # Model specific parameters
-  base_params:
-    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,dtype=float16,chunked_prefill_size=4096,mem_fraction_static=0.9" # Model args that you would pass in the command line
-  generation: # Generation specific parameters
-    temperature: 0.3
-    repetition_penalty: 1.0
-    frequency_penalty: 0.0
-    presence_penalty: 0.0
-    top_k: -1
-    min_p: 0.0
-    top_p: 0.9
-    max_new_tokens: 256
-    stop_tokens: ["<EOS>", "<PAD>"]
+model_parameters:
+  model_name: "HuggingFaceTB/SmolLM-1.7B-Instruct"
+  dtype: "auto"
+  tp_size: 1
+  dp_size: 1
+  context_length: null
+  random_seed: 1
+  trust_remote_code: False
+  use_chat_template: False
+  device: "cuda"
+  skip_tokenizer_init: False
+  kv_cache_dtype: "auto"
+  add_special_tokens: True
+  pairwise_tokenization: False
+  sampling_backend: null
+  attention_backend: null
+  mem_fraction_static: 0.8
+  chunked_prefill_size: 4096
+  generation_parameters:
+    max_new_tokens: 1024
+    min_new_tokens: 0
+    temperature: 1.0
+    top_k: 50
+    min_p: 0.0
+    top_p: 1.0
+    presence_penalty: 0.0
+    repetition_penalty: 1.0
+    frequency_penalty: 0.0
 ```

 > [!WARNING]
````

docs/source/use-vllm-as-backend.mdx (+67, -29)

````diff
@@ -3,9 +3,13 @@
 Lighteval allows you to use `vllm` as backend allowing great speedups.
 To use, simply change the `model_args` to reflect the arguments you want to pass to vllm.

+
+> [!TIP]
+> Documentation for vllm engine args can be found [here](https://docs.vllm.ai/en/latest/serving/engine_args.html)
+
 ```bash
 lighteval vllm \
-  "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
+  "model_name=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
   "leaderboard|truthfulqa:mc|0|0"
 ```

@@ -17,15 +21,15 @@ For example if you have 4 GPUs you can split it across using `tensor_parallelism

 ```bash
 export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval vllm \
-  "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
+  "model_name=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
   "leaderboard|truthfulqa:mc|0|0"
 ```

 Or, if your model fits on a single GPU, you can use `data_parallelism` to speed up the evaluation:

 ```bash
 lighteval vllm \
-  "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
+  "model_name=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
   "leaderboard|truthfulqa:mc|0|0"
 ```

@@ -41,18 +45,35 @@ lighteval vllm \
 ```

 ```yaml
-model: # Model specific parameters
-  base_params:
-    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,revision=main,dtype=bfloat16" # Model args that you would pass in the command line
-  generation: # Generation specific parameters
-    temperature: 0.3
-    repetition_penalty: 1.0
-    frequency_penalty: 0.0
-    presence_penalty: 0.0
-    seed: 42
-    top_k: 0
-    min_p: 0.0
-    top_p: 0.9
+model_parameters:
+  model_name: "HuggingFaceTB/SmolLM-1.7B-Instruct"
+  revision: "main"
+  dtype: "bfloat16"
+  tensor_parallel_size: 1
+  data_parallel_size: 1
+  pipeline_parallel_size: 1
+  gpu_memory_utilization: 0.9
+  max_model_length: 2048
+  swap_space: 4
+  seed: 1
+  trust_remote_code: True
+  use_chat_template: True
+  add_special_tokens: True
+  multichoice_continuations_start_space: True
+  pairwise_tokenization: True
+  subfolder: null
+  generation_parameters:
+    presence_penalty: 0.0
+    repetition_penalty: 1.0
+    frequency_penalty: 0.0
+    temperature: 1.0
+    top_k: 50
+    min_p: 0.0
+    top_p: 1.0
+    seed: 42
+    stop_tokens: null
+    max_new_tokens: 1024
+    min_new_tokens: 0
 ```

 > [!WARNING]
@@ -66,21 +87,38 @@ For special kinds of metrics like `Pass@K` or LiveCodeBench's `codegen` metric,
 generations. This can be done in the `yaml` file in the following way:

 ```yaml
-model: # Model specific parameters
-  base_params:
-    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,revision=main,dtype=bfloat16" # Model args that you would pass in the command line
-  generation: # Generation specific parameters
-    temperature: 0.3
-    repetition_penalty: 1.0
-    frequency_penalty: 0.0
-    presence_penalty: 0.0
-    seed: 42
-    top_k: 0
-    min_p: 0.0
-    top_p: 0.9
-  metric_options: # Optional metric arguments
+model_parameters:
+  model_name: "HuggingFaceTB/SmolLM-1.7B-Instruct"
+  revision: "main"
+  dtype: "bfloat16"
+  tensor_parallel_size: 1
+  data_parallel_size: 1
+  pipeline_parallel_size: 1
+  gpu_memory_utilization: 0.9
+  max_model_length: 2048
+  swap_space: 4
+  seed: 1
+  trust_remote_code: True
+  use_chat_template: True
+  add_special_tokens: True
+  multichoice_continuations_start_space: True
+  pairwise_tokenization: True
+  subfolder: null
+  generation_parameters:
+    presence_penalty: 0.0
+    repetition_penalty: 1.0
+    frequency_penalty: 0.0
+    temperature: 1.0
+    top_k: 50
+    min_p: 0.0
+    top_p: 1.0
+    seed: 42
+    stop_tokens: null
+    max_new_tokens: 1024
+    min_new_tokens: 0
+metric_options: # Optional metric arguments
   codegen_pass@1:16:
-    num_samples: 16
+    num_samples: 16
 ```

 An optional key `metric_options` can be passed in the yaml file,
````

docs/source/using-the-python-api.mdx (+1, -1)

````diff
@@ -40,7 +40,7 @@ def main():
     )

     model_config = VLLMModelConfig(
-        pretrained="HuggingFaceH4/zephyr-7b-beta",
+        model_name="HuggingFaceH4/zephyr-7b-beta",
        dtype="float16",
        use_chat_template=True,
     )
````
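
For reference, the Python API change boils down to the construction below. This is a minimal sketch; the import path is assumed from the `models.vllm.vllm_model.VLLMModelConfig` autodoc entry in the package reference above, and only the argument names shown in the diff are used.

```python
# Minimal sketch of the updated python-api usage after this commit.
# Assumed import path, based on the package reference autodoc entry.
from lighteval.models.vllm.vllm_model import VLLMModelConfig

model_config = VLLMModelConfig(
    model_name="HuggingFaceH4/zephyr-7b-beta",  # was `pretrained=` before this commit
    dtype="float16",
    use_chat_template=True,
)
```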
