+++
disableToc = false
title = "🗣 Text to audio (TTS)"
weight = 2
+++
The `/tts` endpoint can be used to generate speech from text.
Input: `input`, `model`
For example, to generate an audio file, send a POST request to the `/tts` endpoint with the text to speak in the JSON request body:
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"input": "Hello world",
"model": "tts"
}'
Returns an `audio/wav` file.
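The same request can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_tts_request` and `synthesize_to_file` are hypothetical, and the base URL simply mirrors the `localhost:8080` address used in the curl example.

```python
import json
import urllib.request


def build_tts_request(text, model, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /tts endpoint."""
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, model, out_path, base_url="http://localhost:8080"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, model, base_url)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)  # number of bytes written
```

With a LocalAI instance running, `synthesize_to_file("Hello world", "tts", "hello.wav")` would be the equivalent of the curl call above, saving the response instead of printing it.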
Text-To-Speech Setup
LocalAI supports [bark]({{%relref "model-compatibility/bark" %}}), `piper` and `vall-e-x`:
{{% notice note %}}
The `piper` backend is used for `onnx` models and requires the modules to be downloaded first.

To install the `piper` audio models manually:

- Download Voices from https://github.com/rhasspy/piper/releases/tag/v0.0.2
- Extract the `.tar.tgz` files (`.onnx`, `.json`) inside `models`
- Run the following command to test that the model is working
{{% /notice %}}
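The extraction step above can also be scripted. This is a rough sketch, assuming a voice archive has already been downloaded from the release page; the function name `extract_voice` and the archive path passed to it are hypothetical, and only the `.onnx` and `.json` members are copied into the `models` directory.

```python
import tarfile
from pathlib import Path


def extract_voice(archive_path, models_dir="models"):
    """Copy the .onnx and .json members of a voice archive into models_dir."""
    target = Path(models_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            name = Path(member.name).name  # drop any directory prefix
            if not member.isfile() or not name.endswith((".onnx", ".json")):
                continue
            source = archive.extractfile(member)
            (target / name).write_bytes(source.read())
            extracted.append(name)
    return sorted(extracted)
```

Writing each member under its base name keeps the files directly inside `models`, which matches the note that the model name is the filename with its extension.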
To use the `/tts` endpoint, run the following command. You can specify a backend with the `backend` parameter. For example, to use the `piper` backend:
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"model":"it-riccardo_fasol-x-low.onnx",
"backend": "piper",
"input": "Ciao, sono Ettore"
}' | aplay
Note:

- `aplay` is a Linux command. You can use other tools to play the audio file.
- The model name is the filename with the extension.
- The model name is case sensitive.
- LocalAI must be compiled with the `GO_TAGS=tts` flag.
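Where `aplay` is not available, the response can be saved to a file (for example with curl's `--output` option) and checked before handing it to another player. The sketch below assumes the endpoint returns a standard PCM WAV container that Python's `wave` module can parse; `describe_wav` is a hypothetical helper name.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Return (channels, sample_rate_hz, duration_seconds) for WAV data."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate
```

If the bytes are not valid WAV data, `wave.open` raises `wave.Error`, which is a quick way to notice that the server returned an error message rather than audio.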
Music
LocalAI also has experimental support for `transformers-musicgen` for the generation of short musical compositions. Currently, this is implemented via the same requests used for text to speech:
curl --request POST \
--url http://localhost:8080/tts \
--header 'Content-Type: application/json' \
--data '{
"backend": "transformers-musicgen",
"model": "facebook/musicgen-medium",
"input": "Cello Rave"
}' | aplay
Future versions of LocalAI will expose additional control over audio generation beyond the text prompt.
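Because speech and music share one request shape, a single payload builder can cover both. The following is an illustrative sketch; `tts_payload` is a hypothetical helper, and the field names are taken from the curl examples above.

```python
import json


def tts_payload(text, model, backend=None):
    """Return the JSON body for /tts; backend is included only when given."""
    payload = {"model": model, "input": text}
    if backend is not None:
        payload["backend"] = backend
    return json.dumps(payload)


# Speech with the piper backend and music with transformers-musicgen
# differ only in the backend and model values.
speech = tts_payload("Ciao, sono Ettore", "it-riccardo_fasol-x-low.onnx", backend="piper")
music = tts_payload("Cello Rave", "facebook/musicgen-medium", backend="transformers-musicgen")
```

Either string can be sent as the body of a POST request to `/tts` with a `Content-Type: application/json` header.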
Configuration
Audio models can be configured via `YAML` files. This allows you to configure specific settings for each backend. For instance, a backend might let you select a voice, or might support voice cloning, and these options must be specified in the configuration file.
name: tts
backend: vall-e-x
parameters: ...