mirror of
https://github.com/mudler/LocalAI.git
synced 2024-06-07 19:40:48 +00:00
ba5ab26f2e
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
39 lines
998 B
Markdown
+++
disableToc = false
title = "vLLM"
weight = 4
+++

[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference.

LocalAI has a built-in vLLM backend that you can use to run models. For details on vLLM's performance, see the [performance section of the vLLM README](https://github.com/vllm-project/vllm#performance).

## Setup

Create a YAML config file for the model you want to run with `vllm`. To set up a model, you only need to specify the model name in that file:

```yaml
name: vllm          # name used to refer to this model in API requests
backend: vllm
parameters:
    model: "facebook/opt-125m"   # model for vLLM to load

# Uncomment to specify a quantization method (optional)
# quantization: "awq"
```

The backend automatically downloads the files required to run the model.
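
If you create model configs from a script, the following is a minimal sketch that renders the same fields as the YAML example above. The helper names `render_vllm_config` and `write_vllm_config` are hypothetical and not part of LocalAI. The sketch also assumes that your LocalAI instance loads model configs from a local directory, passed here as `models_dir`, which you should adjust to your setup. Values are written without YAML escaping, so keep names and model IDs simple.

```python
from pathlib import Path
from typing import Optional


# Hypothetical helper (not part of LocalAI): builds the YAML text of a
# model config that uses the vLLM backend.
def render_vllm_config(name: str, model: str,
                       quantization: Optional[str] = None) -> str:
    lines = [
        f"name: {name}",            # name clients pass as "model" in requests
        "backend: vllm",
        "parameters:",
        f'    model: "{model}"',    # model for vLLM to load
    ]
    if quantization is not None:
        # Optional top-level key, e.g. "awq"; omitted when not set.
        lines.append(f'quantization: "{quantization}"')
    return "\n".join(lines) + "\n"


# Hypothetical helper: saves the config as <models_dir>/<name>.yaml.
# Assumption: models_dir is the directory your LocalAI instance reads configs from.
def write_vllm_config(models_dir: Path, name: str, model: str,
                      quantization: Optional[str] = None) -> Path:
    models_dir.mkdir(parents=True, exist_ok=True)
    path = models_dir / f"{name}.yaml"
    path.write_text(render_vllm_config(name, model, quantization), encoding="utf-8")
    return path
```

For example, `write_vllm_config(Path("models"), "vllm", "facebook/opt-125m", quantization="awq")` writes `models/vllm.yaml` with the quantization line enabled. Rendering and writing are kept separate so that the YAML text can be inspected without touching the filesystem.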

## Usage

Use the `completions` endpoint, setting `model` to the `name` defined in the YAML config (here, `vllm`):

```bash
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
    "model": "vllm",
    "prompt": "Hello, my name is",
    "temperature": 0.1, "top_p": 0.1
}'
```
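
The same request can be sent from Python. The sketch below uses only the standard library. It assumes that LocalAI is listening on `localhost:8080`, as in the `curl` example, and that the response follows the OpenAI completions format, with the generated text in `choices[0].text`. The function names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumption: LocalAI listens on localhost:8080, as in the curl example.
BASE_URL = "http://localhost:8080"


def build_completion_request(prompt: str, model: str = "vllm",
                             temperature: float = 0.1,
                             top_p: float = 0.1) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,  # must match the `name` field of the YAML config
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response shape, {"choices": [{"text": ...}, ...]}.
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Hello, my name is"))` prints the model's continuation. Building the request separately from sending it makes the payload easy to inspect before it reaches the server.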