
Commit c426412

committed
turn off topp sampling by default because it is a bit too slow to be the default. it is likely that turning it on, e.g. -p 0.9, gives mildly higher quality and safer samples, but this sometimes comes at a double-digit percent performance cost, which is too much for it to be on by default i think...
1 parent 3f69c6c commit c426412

File tree

2 files changed: +6 −4 lines


README.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -56,7 +56,9 @@ You can also prompt the model with a prefix or a number of additional command li
 
 > One day, Lily met a Shoggoth. He was very shy, but was also very generous. Lily said “Hello Shoggy! Can I be your friend?” Shoggy was happy to have a friend and said “Yes, let’s explore the universe together!” So they set off on a journey to explore the universe. As they travelled, Shoggy was happy to explain to Lily about all the wonderful things in the universe. At the end of the day, Lily and Shoggy had gathered lots of wonderful things from the universe, and they both felt very proud. They promised to explore the universe as one big pair and to never stop being generous to each other.
 
-There is also an even better 110M param model available, see [models](#models). Quick note on sampling, the recommendation for good results is to use `-t 1.0 -p 0.9`, i.e. top-p sampling at 0.9 with temperature 1.0 (this is the default). To control the diversity of samples use either the temperature (i.e. vary `-t` between 0 and 1 and keep top-p off with `-p 0`) or the top-p value (i.e. vary `-p` between 0 and 1 and keep `-t 1`), but not both. Nice explainers on LLM sampling strategies include [this](https://peterchng.com/blog/2023/05/02/token-selection-strategies-top-k-top-p-and-temperature/), [this](https://docs.cohere.com/docs/controlling-generation-with-top-k-top-p) or [this](https://huggingface.co/blog/how-to-generate).
+There is also an even better 110M param model available, see [models](#models).
+
+Quick note on sampling, the recommendation for ~best results is to sample with `-t 1.0 -p 0.9`, i.e. temperature 1.0 (default) but also top-p sampling at 0.9 (not default!). The top-p sampling is turned off by default because it can run quite a bit slower. More generally, to control the diversity of samples use either the temperature (i.e. vary `-t` between 0 and 1 and keep top-p off with `-p 0`) or the top-p value (i.e. vary `-p` between 0 and 1 and keep `-t 1`), but not both. Nice explainers on LLM sampling strategies include [this](https://peterchng.com/blog/2023/05/02/token-selection-strategies-top-k-top-p-and-temperature/), [this](https://docs.cohere.com/docs/controlling-generation-with-top-k-top-p) or [this](https://huggingface.co/blog/how-to-generate).
 
 ## Meta's Llama 2 models
 
```

run.c

Lines changed: 3 additions & 3 deletions
```diff
@@ -504,7 +504,7 @@ void error_usage() {
     fprintf(stderr, "Example: run model.bin -n 256 -i \"Once upon a time\"\n");
     fprintf(stderr, "Options:\n");
     fprintf(stderr, "  -t <float>  temperature, default 1.0\n");
-    fprintf(stderr, "  -p <float>  p value in top-p (nucleus) sampling. default 0.9, 0 = off\n");
+    fprintf(stderr, "  -p <float>  p value in top-p (nucleus) sampling. default 1.0 (=off)\n");
     fprintf(stderr, "  -s <int>    random seed, default time(NULL)\n");
     fprintf(stderr, "  -n <int>    number of steps to run for, default 256. 0 = max_seq_len\n");
     fprintf(stderr, "  -i <string> input prompt\n");
@@ -516,7 +516,7 @@ int main(int argc, char *argv[]) {
     // default inits
     char *checkpoint = NULL;  // e.g. out/model.bin
     float temperature = 1.0f; // 0.0 = greedy deterministic. 1.0 = original. don't set higher
-    float topp = 0.9f;        // top-p in nucleus sampling
+    float topp = 1.0f;        // top-p in nucleus sampling. 1.0 = off. 0.9 works well, but slower
     rng_seed = 0; // seed rng with time by default
     int steps = 256;          // number of steps to run for
     char *prompt = NULL;      // prompt string
@@ -623,7 +623,7 @@ int main(int argc, char *argv[]) {
         // apply softmax to the logits to get the probabilities for next token
         softmax(state.logits, config.vocab_size);
         // we sample from this distribution to get the next token
-        if (topp <= 0) {
+        if (topp <= 0 || topp >= 1) {
             // simply sample from the predicted probability distribution
             next = sample(state.logits, config.vocab_size);
         } else {
```
