* Update path to sentencetransformers backend for local execution Signed-off-by: Marcus Köhler <khler.marcus@gmail.com> * Rename huggingface-embeddings -> sentencetransformers in embeddings.md for consistency with the backend structure The Dockerfile still knows the "huggingface-embeddings" backend (I assume for compatibility reasons) but uses the sentencetransformers backend under the hood anyway. I figured it would be good to update the docs to use the new naming to make it less confusing moving forward. As the docker container knows both the "huggingface-embeddings" and the "sentencetransformers" backend, this should not break anything. Signed-off-by: Marcus Köhler <khler.marcus@gmail.com> --------- Signed-off-by: Marcus Köhler <khler.marcus@gmail.com>
+++
disableToc = false
title = "🧠 Embeddings"
weight = 2
+++
LocalAI supports generating embeddings for text or for a list of tokens.
For the API documentation, refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
Model compatibility
The embedding endpoint is compatible with llama.cpp models, bert.cpp models, and sentence-transformers models available on Hugging Face.
Manual Setup
Create a YAML config file in the models directory. Specify the backend and the model file.
name: text-embedding-ada-002 # The model name used in the API
parameters:
  model: <model_file>
backend: "<backend>"
embeddings: true
# .. other parameters
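The config above can also be generated from a script. The following is a minimal sketch, not part of LocalAI: the helper names `render_embedding_config` and `write_embedding_config` are hypothetical, and naming the file `<name>.yaml` is an assumption made for illustration. It writes plain text without YAML escaping, so it only suits simple values.

```python
from pathlib import Path


def render_embedding_config(name, backend, model_file):
    """Render a model config with the same fields as the example above."""
    lines = [
        f"name: {name}",
        "parameters:",
        f"  model: {model_file}",
        f'backend: "{backend}"',
        "embeddings: true",
    ]
    return "\n".join(lines) + "\n"


def write_embedding_config(models_dir, name, backend, model_file):
    """Write the config into the models directory and return its path.

    The "<name>.yaml" file name is an assumption for this sketch.
    """
    path = Path(models_dir) / f"{name}.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_embedding_config(name, backend, model_file), encoding="utf-8")
    return path
```

For example, `write_embedding_config("models", "text-embedding-ada-002", "bert-embeddings", "bert")` would produce a file matching the bert example in this page.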
Bert embeddings
To use bert.cpp models you can use the bert embedding backend.
An example model config file:
name: text-embedding-ada-002
parameters:
  model: bert
backend: bert-embeddings
embeddings: true
# .. other parameters
The bert backend is built on bert.cpp and loads ggml models.
For instance, you can download the ggml quantized version of all-MiniLM-L6-v2 from https://huggingface.co/skeskinen/ggml:
wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
To test locally (with the LocalAI server running on localhost), you can use curl (and jq at the end to prettify the output):
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "Your text string goes here",
"model": "text-embedding-ada-002"
}' | jq "."
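The same request can be sent from Python using only the standard library. This is a rough sketch under a few assumptions: the function names are hypothetical, the server listens on `http://localhost:8080` as in the curl example, and the response follows the OpenAI embeddings format (a `data` list whose items carry an `embedding` and an `index`). The `input` value may be a string or, for backends that accept tokens, a list of integer token IDs.

```python
import json
import urllib.request
from typing import List, Union


def build_embeddings_request(
    text_or_tokens: Union[str, List[int]],
    model: str,
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /embeddings endpoint."""
    payload = {"input": text_or_tokens, "model": model}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_vectors(response_body: dict) -> List[List[float]]:
    """Return the embedding vectors from an OpenAI-style response, ordered by index."""
    items = sorted(response_body["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def embed(text_or_tokens, model, base_url="http://localhost:8080"):
    """Send the request to a running server and return the embedding vectors."""
    request = build_embeddings_request(text_or_tokens, model, base_url)
    with urllib.request.urlopen(request) as response:
        return extract_vectors(json.load(response))


# Usage (requires a running LocalAI server):
# vectors = embed("Your text string goes here", "text-embedding-ada-002")
# print(len(vectors[0]))  # dimensionality of the embedding
```

Keeping request construction and response parsing separate from the network call makes both easy to check without a server.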
Huggingface embeddings
To use sentence-transformers models from Hugging Face, you can use the sentencetransformers embedding backend.
name: text-embedding-ada-002
backend: sentencetransformers
embeddings: true
parameters:
  model: all-MiniLM-L6-v2
The sentencetransformers backend uses the Python sentence-transformers library. For a list of all available pre-trained models, see: https://github.com/UKPLab/sentence-transformers#pre-trained-models
{{% notice note %}}
- The sentencetransformers backend is an optional backend of LocalAI and uses Python. If you are running LocalAI from the containers, it is already configured and ready to use. If you are running LocalAI manually, you must install the Python dependencies (pip install -r /path/to/LocalAI/extra/requirements) and specify the extra backend in the EXTERNAL_GRPC_BACKENDS environment variable (EXTERNAL_GRPC_BACKENDS="sentencetransformers:/path/to/LocalAI/backend/python/sentencetransformers/sentencetransformers.py").
- The sentencetransformers backend supports only embeddings of text, not of tokens. If you need to embed tokens, you can use the bert backend or llama.cpp.
- No models need to be downloaded before using the sentencetransformers backend. The models are downloaded automatically the first time the API is used.
{{% /notice %}}
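For a manual setup, the environment variable from the note can be assembled in a launcher script. The sketch below is illustrative only: `external_backend_env` is a hypothetical helper, and it simply joins the backend name and script path in the `name:path` form shown in the note.

```python
import os


def external_backend_env(name, script_path, base_env=None):
    """Return a copy of the environment with EXTERNAL_GRPC_BACKENDS set.

    The value uses the "name:path" form shown in the note above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["EXTERNAL_GRPC_BACKENDS"] = f"{name}:{script_path}"
    return env


env = external_backend_env(
    "sentencetransformers",
    "/path/to/LocalAI/backend/python/sentencetransformers/sentencetransformers.py",
)
```

The resulting `env` mapping can be passed as the `env` argument of `subprocess.run` together with whatever command starts your LocalAI server.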
Llama.cpp embeddings
Embeddings with llama.cpp are supported with the llama backend.
name: my-awesome-model
backend: llama
embeddings: true
parameters:
  model: ggml-file.bin
# ...
💡 Examples
- An example that uses LlamaIndex with LocalAI as the embedding backend: here.