LocalAI/text-to-audio.md at ab7b4d5ee9448e533a342bd1771393acd2967191

mirror of https://github.com/mudler/LocalAI.git synced 2024-06-07 19:40:48 +00:00

Refactors api folder to core, creates firm split between backend code and api frontend.

2024-01-05 15:34:56 +01:00

2.3 KiB


+++
disableToc = false
title = "🗣 Text to audio (TTS)"
weight = 2
+++

The /tts endpoint can be used to generate speech from text.

Input: input (the text to convert to speech), model (the name of the model to use)

For example, to generate an audio file, send a POST request to the /tts endpoint with a JSON body containing the input text and the model name:

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "input": "Hello world",
  "model": "tts"
}'

Returns an audio/wav file.
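The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming LocalAI listens on localhost:8080 as in the curl example. build_tts_request and synthesize are hypothetical helper names, and the optional backend argument mirrors the backend field used in the piper example on this page.

```python
import json
import urllib.request

# Assumption: LocalAI is reachable at the address used in the curl examples.
BASE_URL = "http://localhost:8080"


def build_tts_request(text, model, backend=None, base_url=BASE_URL):
    """Build a POST request for the /tts endpoint with a JSON body."""
    body = {"input": text, "model": model}
    if backend is not None:
        # Optional: select a specific backend (e.g. "piper").
        body["backend"] = backend
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, model, out_path, backend=None, base_url=BASE_URL):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, model, backend, base_url)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With a server running, synthesize("Hello world", "tts", "hello.wav") should store the returned audio in hello.wav. Separating request construction from sending keeps the payload easy to inspect without a live server.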

Text-To-Speech Setup

LocalAI supports [bark]({{%relref "model-compatibility/bark" %}}), piper and vall-e-x.

The piper backend is used for ONNX models and requires the voice models to be downloaded first.

To install the piper audio models manually:

1. Download voices from https://github.com/rhasspy/piper/releases/tag/v0.0.2
2. Extract the .tar.gz files (.onnx, .json) into the models directory.
3. Run the following command to test that the model is working.

To use the tts endpoint, run the following command. You can specify a backend with the backend parameter. For example, to use the piper backend:

curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
  "model":"it-riccardo_fasol-x-low.onnx",
  "backend": "piper",
  "input": "Ciao, sono Ettore"
}' | aplay
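The manual extraction step can be scripted with Python's standard tarfile module. This is a minimal sketch, assuming the voice archive has already been downloaded from the piper release page. extract_piper_voice is a hypothetical helper name, and flattening any directories inside the archive is a choice made here so that each model can be referenced by its bare filename.

```python
import tarfile
from pathlib import Path


def extract_piper_voice(archive_path, models_dir="models"):
    """Copy the .onnx and .json files from a piper voice archive into models_dir.

    Files are written flat (archive directories are dropped). Reading member
    contents instead of calling extractall avoids writing outside models_dir.
    """
    models = Path(models_dir)
    models.mkdir(parents=True, exist_ok=True)
    extracted = []
    # "r:*" lets tarfile detect gzip/bz2/xz compression automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            name = Path(member.name).name
            if not member.isfile() or not name.endswith((".onnx", ".json")):
                continue
            source = archive.extractfile(member)
            if source is None:
                continue
            (models / name).write_bytes(source.read())
            extracted.append(name)
    return sorted(extracted)
```

The returned list shows which model files are now in the models directory. Use one of the .onnx names from that list as the "model" value in the request.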

Note:

- aplay is a Linux command; you can use other tools to play the audio file.
- The model name is the filename, including the extension.
- The model name is case sensitive.
- LocalAI must be compiled with the GO_TAGS=tts flag.
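You can check what a backend returned without playing it. Python's standard wave module can read the response body. This is a sketch, and describe_wav is a hypothetical helper. The wave module only reads integer PCM WAV data. If a backend writes floating-point samples, wave.open raises wave.Error, and a third-party audio reader would be needed instead.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Return basic properties of WAV data, such as a /tts response body."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration in seconds = number of frames / frames per second.
            "duration_s": frames / rate,
        }
```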

Music

LocalAI also has experimental support for transformers-musicgen to generate short musical compositions. Currently, this uses the same requests as text to speech:

curl --request POST \
  --url http://localhost:8080/tts \
  --header 'Content-Type: application/json' \
  --data '{
    "backend": "transformers-musicgen",
    "model": "facebook/musicgen-medium",
    "input": "Cello Rave"
}' | aplay

Future versions of LocalAI will expose additional control over audio generation beyond the text prompt.

Configuration

Audio models can be configured via YAML files. This allows configuring backend-specific settings: for instance, some backends expect a voice to be specified, or support voice cloning, and those settings must be provided in the configuration file.

name: tts
backend: vall-e-x
parameters: ...
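The name field appears to be the value that clients pass as "model": the first example on this page sends "model": "tts". Given that, a config file can also be generated from a script. The sketch below uses a hypothetical helper that is not part of LocalAI. It writes <name>.yaml into a models directory, assuming LocalAI loads YAML files placed there. It also assumes a parameters.model key for the model file, which this page does not show. Values are written unquoted, so names containing YAML special characters would need quoting.

```python
from pathlib import Path


def write_tts_config(models_dir, name, backend, model_file):
    """Write a minimal model config to <models_dir>/<name>.yaml (hypothetical helper).

    Assumption: the model file is set under parameters.model.
    """
    lines = [
        f"name: {name}",
        f"backend: {backend}",
        "parameters:",
        f"  model: {model_file}",
    ]
    path = Path(models_dir) / f"{name}.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Under these assumptions, a request would then use "model": "it-voice" after calling write_tts_config("models", "it-voice", "piper", "it-riccardo_fasol-x-low.onnx").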
