LocalAI/text-to-audio.md at b5c93f176aacde79536c1b2e469ada3062d6a17f

2.3 KiB

Raw Blame History

+++ disableToc = false title = "🗣 Text to audio (TTS)" weight = 2 +++

The /tts endpoint can be used to generate speech from text.

Input: input, model

For example, to generate an audio file, you can send a POST request to the /tts endpoint with the instruction as the request body:

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "input": "Hello world",
  "model": "tts"
}'

Returns an audio/wav file.

Setup

LocalAI supports [bark]({{%relref "model-compatibility/bark" %}}) , piper and vall-e-x:

The piper backend is used for onnx models and requires the modules to be downloaded first.

To install the piper audio models manually:

Download Voices from https://github.com/rhasspy/piper/releases/tag/v0.0.2
Extract the .tar.tgz files (.onnx,.json) inside models
Run the following command to test the model is working

To use the tts endpoint, run the following command. You can specify a backend with the backend parameter. For example, to use the piper backend:

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "model":"it-riccardo_fasol-x-low.onnx",
  "backend": "piper",
  "input": "Ciao, sono Ettore"
}' | aplay

Note:

aplay is a Linux command. You can use other tools to play the audio file.
The model name is the filename with the extension.
The model name is case sensitive.
LocalAI must be compiled with the GO_TAGS=tts flag.

LocalAI also has experimental support for transformers-musicgen for the generation of short musical compositions. Currently, this is implemented via the same requests used for text to speech:

curl --request POST \
  --url http://localhost:8080/tts \
  --header 'Content-Type: application/json' \
  --data '{
    "backend": "transformers-musicgen",
    "model": "facebook/musicgen-medium",
    "input": "Cello Rave"
}' | aplay```

Future versions of LocalAI will expose additional control over audio generation beyond the text prompt.

#### Configuration

Audio models can be configured via `YAML` files. This allows to configure specific setting for each backend. For instance, backends might be specifying a voice or supports voice cloning which must be specified in the configuration file.

```yaml
name: tts
backend: vall-e-x
parameters: ...

2.3 KiB Raw Blame History

Setup

2.3 KiB

Raw Blame History