Update README with instructions on how to prompt with the API

mudler 2023-03-23 19:25:28 +01:00
parent 6394d85ca2
commit 4c9c5ce4ce


@@ -33,7 +33,7 @@ llama-cli --model <model_path> --instruction <instruction> [--input <input>] [--
| model | MODEL_PATH | | The path to the pre-trained GPT-based model. |
| tokens | TOKENS | 128 | The maximum number of tokens to generate. |
| threads | THREADS | NumCPU() | The number of threads to use for text generation. |
| temperature | TEMPERATURE | 0.95 | Sampling temperature for model output (values between `0.1` and `1.0`). |
| top_p | TOP_P | 0.85 | The cumulative probability for top-p sampling. |
| top_k | TOP_K | 20 | The number of top-k tokens to consider for text generation. |
| context-size | CONTEXT_SIZE | 512 | Default token context size. |
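For illustration, here is a minimal sketch of a CLI invocation that overrides a few of these defaults. The flag names are assumed to mirror the parameter names in the table, and the model path and instruction are placeholders; adjust both to your setup.

```
# Sketch: generate up to 256 tokens with a lower temperature, using 4 threads.
# Only --model and --instruction appear in the usage line above; the remaining
# flags are assumed to match the parameter names in the table.
llama-cli --model ./models/7B/ggml-model.bin \
          --instruction "Summarize the plot of Hamlet in two sentences." \
          --tokens 256 --threads 4 --temperature 0.7 --top_p 0.85 --top_k 20
```

Equivalently, each parameter can be set through the corresponding environment variable listed in the table (for example `TEMPERATURE=0.7`).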
@@ -98,6 +98,17 @@ curl --location --request POST 'http://localhost:8080/predict' --header 'Content
}'
```
Note: The API doesn't inject a prompt template for talking to the instance, while the CLI does. You have to use a prompt similar to the one described in the Stanford Alpaca docs (https://github.com/tatsu-lab/stanford_alpaca#data-release), for instance:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
```
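For example, a request that wraps the instruction in the Alpaca-style template before sending it might look like the sketch below. The `"text"` field name and the inline instruction are assumptions here; refer to the full `/predict` request example above for the exact payload schema.

```
# Sketch: the Alpaca template is embedded manually in the payload, since the
# API does not add it for you. The "text" field name is an assumption.
curl --location --request POST 'http://localhost:8080/predict' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nCompose a haiku about autumn.\n\n### Response:"
  }'
```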
## Using other models
You can use the lite images (for example `quay.io/go-skynet/llama-cli:v0.2-lite`) that don't ship any model, and specify a model binary to be used for inference with `--model`.
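A sketch of such an invocation with Docker is shown below. It assumes the model file has been downloaded locally, that it is mounted into the container, and that the image entrypoint forwards the same CLI flags documented above; the mount path and model filename are placeholders.

```
# Sketch: run the lite image with a locally downloaded model mounted into the
# container. Mount path, model filename, and entrypoint behaviour are assumptions.
docker run -v $PWD/models:/models quay.io/go-skynet/llama-cli:v0.2-lite \
  --model /models/7B/ggml-model.bin \
  --instruction "What is an alpaca?"
```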