- removed the batch-size override and made it part of the model config
- removed the env config; configuration should be part of the env directly, with no need to load it up separately
- better loading of models
- added a base class for model configs, which allows parsing a model config through the CLI or a config file
- Unified naming for model args, i.e. `model_name`
- removed the OpenAI endpoint, since we can just use litellm for this (see the sketch below); the same goes for TGI and inference endpoints: we don't really need dedicated backends, and it's better to have one interface
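As an illustration of that last point, here is a minimal sketch of what an OpenAI-style evaluation could look like through the unified litellm path. The layout mirrors the `model_parameters` schema in the diffs below, but the litellm-specific keys are an assumption, not something this PR confirms:

```yaml
# Hypothetical sketch: OpenAI models served through litellm instead of a
# dedicated OpenAI backend. The layout follows the unified `model_parameters`
# schema from this PR; the litellm-specific keys are assumptions.
model_parameters:
  model_name: "openai/gpt-4o-mini" # litellm-style "provider/model" identifier
  # The API key is expected in the environment (e.g. OPENAI_API_KEY),
  # matching the note in the removed OpenAI API section below.
```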
**docs/source/use-huggingface-inference-endpoints-or-tgi-as-backend.mdx** (+6 −26)
````diff
@@ -25,15 +25,12 @@ be deleted afterwards).
 __configuration file example:__
 
 ```yaml
-model:
-  base_params:
-    # Pass either model_name, or endpoint_name and true reuse_existing
-    # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
-    # reuse_existing: true # defaults to false; if true, ignore all params in instance, and don't delete the endpoint after evaluation
+model_parameters:
+  reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
+  # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
   model_name: "meta-llama/Llama-2-7b-hf"
-  #revision: "main" # defaults to "main"
+  revision: "main" # defaults to "main"
   dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit" or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
-  instance:
   accelerator: "gpu"
   region: "eu-west-1"
   vendor: "aws"
@@ -44,7 +41,7 @@ model:
   namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
   image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
   env_vars:
-    null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
+    null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
 ```
 
 ### Text Generation Inference (TGI)
````
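Assembled from the two hunks above, the endpoint configuration after this change reads as follows (the instance settings that fall between the hunks are untouched by this PR and elided here):

```yaml
model_parameters:
  reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
  # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
  model_name: "meta-llama/Llama-2-7b-hf"
  revision: "main" # defaults to "main"
  dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit" or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
  accelerator: "gpu"
  region: "eu-west-1"
  vendor: "aws"
  # ... instance settings between the two hunks omitted ...
  namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
  image_url: null # Optionally specify the docker image to use when launching the endpoint model.
  env_vars:
    null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
```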
````diff
@@ -55,25 +52,8 @@ serverless inference.
 __configuration file example:__
 
 ```yaml
-model:
-  instance:
+model_parameters:
   inference_server_address: ""
   inference_server_auth: null
   model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
 ```
-
-### OpenAI API
-
-Lighteval also supports evaluating models on the OpenAI API. To do so you need to set your OpenAI API key in the environment variable.
````
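Put together, the TGI example after this change reduces to:

```yaml
model_parameters:
  inference_server_address: ""
  inference_server_auth: null
  model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
```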
````diff
 > Documentation for the config file of sglang can be found [here](https://docs.sglang.ai/backend/server_arguments.html)
+
 ```yaml
-model: # Model specific parameters
-  base_params:
-    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,dtype=float16,chunked_prefill_size=4096,mem_fraction_static=0.9" # Model args that you would pass in the command line
````