diff --git a/README.md b/README.md
index a6e03706..37ef803d 100644
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@ llama-cli --model --instruction [--input ] [--
 | model | MODEL_PATH | | The path to the pre-trained GPT-based model. |
 | tokens | TOKENS | 128 | The maximum number of tokens to generate. |
 | threads | THREADS | NumCPU() | The number of threads to use for text generation. |
-| temperature | TEMPERATURE | 0.95 | Sampling temperature for model output. |
+| temperature | TEMPERATURE | 0.95 | Sampling temperature for model output. (values between `0.1` and `1.0`) |
 | top_p | TOP_P | 0.85 | The cumulative probability for top-p sampling. |
 | top_k | TOP_K | 20 | The number of top-k tokens to consider for text generation. |
 | context-size | CONTEXT_SIZE | 512 | Default token context size. |
@@ -98,6 +98,17 @@ curl --location --request POST 'http://localhost:8080/predict' --header 'Content
 }'
 ```
 
+Note: the API does not inject a prompt template when talking to the instance, while the CLI does. You have to use a prompt similar to the one described in the stanford-alpaca docs: https://github.com/tatsu-lab/stanford_alpaca#data-release, for instance:
+
+```
+Below is an instruction that describes a task. Write a response that appropriately completes the request.
+
+### Instruction:
+{instruction}
+
+### Response:
+```
+
 ## Using other models
 
 You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.2-lite`) that don't ship any model, and specify a model binary to be used for inference with `--model`.
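
As a rough sketch of how the note above would be used in practice, the template can be embedded directly in the request sent to the `/predict` endpoint. The original curl example is truncated in the hunk header, so the JSON field names below (`text`, `temperature`, `tokens`) are assumptions and may differ from the actual payload:

```
# Sketch: Alpaca-style prompt embedded in the /predict request body (field names assumed)
curl --location --request POST 'http://localhost:8080/predict' --header 'Content-Type: application/json' --data-raw '{
    "text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is an alpaca?\n\n### Response:",
    "temperature": 0.7,
    "tokens": 128
}'
```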