Update README with instructions on how to prompt with the API

mudler 2023-03-23 19:25:28 +01:00
parent 6394d85ca2
commit 4c9c5ce4ce


@@ -33,7 +33,7 @@ llama-cli --model <model_path> --instruction <instruction> [--input <input>] [--
| model | MODEL_PATH | | The path to the pre-trained GPT-based model. |
| tokens | TOKENS | 128 | The maximum number of tokens to generate. |
| threads | THREADS | NumCPU() | The number of threads to use for text generation. |
| temperature | TEMPERATURE | 0.95 | Sampling temperature for model output (values between `0.1` and `1.0`). |
| top_p | TOP_P | 0.85 | The cumulative probability for top-p sampling. |
| top_k | TOP_K | 20 | The number of top-k tokens to consider for text generation. |
| context-size | CONTEXT_SIZE | 512 | Default token context size. |
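For illustration, here is a minimal sketch of a CLI invocation that overrides a few of these defaults. The flag names are assumed to mirror the parameter names in the table, and the model path and instruction are placeholders; adjust both to your setup.

```
# Sketch: generate up to 256 tokens with a lower temperature, using 4 threads.
# Only --model and --instruction appear in the usage line above; the remaining
# flags are assumed to match the parameter names in the table.
llama-cli --model ./models/7B/ggml-model.bin \
          --instruction "Summarize the plot of Hamlet in two sentences." \
          --tokens 256 --threads 4 --temperature 0.7 --top_p 0.85 --top_k 20
```

Equivalently, each parameter can be set through the corresponding environment variable listed in the table (for example `TEMPERATURE=0.7`).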
@@ -98,6 +98,17 @@ curl --location --request POST 'http://localhost:8080/predict' --header 'Content
}'
```
Note: The API doesn't inject a prompt template for talking to the instance, while the CLI does. You have to use a prompt similar to the one described in the Stanford Alpaca docs (https://github.com/tatsu-lab/stanford_alpaca#data-release), for instance:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
```
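For example, a request that wraps the instruction in the Alpaca-style template before sending it might look like the sketch below. The `"text"` field name and the inline instruction are assumptions here; refer to the full `/predict` request example above for the exact payload schema.

```
# Sketch: the Alpaca template is embedded manually in the payload, since the
# API does not add it for you. The "text" field name is an assumption.
curl --location --request POST 'http://localhost:8080/predict' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nCompose a haiku about autumn.\n\n### Response:"
  }'
```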
## Using other models
You can use the lite images (for example `quay.io/go-skynet/llama-cli:v0.2-lite`) that don't ship any model, and specify a model binary to be used for inference with `--model`.
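A sketch of such an invocation with Docker is shown below. It assumes the model file has been downloaded locally, that it is mounted into the container, and that the image entrypoint forwards the same CLI flags documented above; the mount path and model filename are placeholders.

```
# Sketch: run the lite image with a locally downloaded model mounted into the
# container. Mount path, model filename, and entrypoint behaviour are assumptions.
docker run -v $PWD/models:/models quay.io/go-skynet/llama-cli:v0.2-lite \
  --model /models/7B/ggml-model.bin \
  --instruction "What is an alpaca?"
```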