* docs(mac): Improve documentation for mac build
- added documentation for building from the current master branch
- added troubleshooting information
Signed-off-by: Sebastian <tauven@gmail.com>
* docs(mac): fix typo
* test with gguf instead of ggml; update testPrompt to match(?); add a debugging line to the Dockerfile that has proven helpful recently.
* fix testPrompt slightly
* Sad experiment: test the GH runner without Metal?
* break apart CGO_LDFLAGS
* switch runner
* upstream llama.cpp disables Metal on GitHub CI!
* add a directory that clean-tests missed
* CGO_LDFLAGS
* tmate failure + NO_ACCELERATE
* whisper.cpp has a Metal fix
* do the exact opposite of the name of this branch, but keep it around for unrelated fixes?
* add back newlines
* add tmate to linux for testing
* update fixtures
* timeout for tmate
* fix(go-llama): use llama-cpp as default
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix(backends): drop obsoleted lines
* fix: clean up Makefile dependencies to allow for parallel builds
* refactor: remove old unused backend from Makefile
* fix: finish removing legacy backend, update piper
* fix: I broke llama... I fixed llama
* feat: give the tests and builds a few threads
* fix: ensure libraries are replaced before build, add dropreplace target
* Fix image build workflows
Certain engines need to know at model-loading time whether the
embedding feature should be enabled; however, it is impractical to have
to set it for ALL the backends that support embeddings.
Backends such as transformers and sentencetransformers seamlessly handle
both cases without requiring this setting to be enabled explicitly.
The requirement exists only for ggml-based models, which need to enable
feature sets during model loading (and thus require setting `embedding`);
most of the other engines do not.
This change disables the code-side check, making embeddings easier to
use because `embeddings: true` no longer has to be specified explicitly.
Part of: https://github.com/mudler/LocalAI/issues/1373
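A minimal before/after sketch of the gate this change removes. The
`ModelConfig` type and both check functions are hypothetical, not
LocalAI's actual code: the strict version rejects embedding requests
unless the flag is set, while the relaxed version forwards every request
and leaves it to the backend to decide whether it can embed.

```go
package main

import (
	"errors"
	"fmt"
)

// ModelConfig is a hypothetical, trimmed-down model configuration.
type ModelConfig struct {
	Name       string
	Embeddings bool // corresponds to `embeddings: true` in the model YAML
}

var errEmbeddingsDisabled = errors.New("embeddings not enabled for this model")

// checkEmbeddingsStrict models the old behaviour: refuse embedding
// requests unless the flag was set explicitly.
func checkEmbeddingsStrict(cfg ModelConfig) error {
	if !cfg.Embeddings {
		return errEmbeddingsDisabled
	}
	return nil
}

// checkEmbeddingsRelaxed models the new behaviour: no code-side check;
// the backend itself decides whether it can produce embeddings.
func checkEmbeddingsRelaxed(_ ModelConfig) error {
	return nil
}

func main() {
	cfg := ModelConfig{Name: "some-embedding-model"} // flag left unset
	fmt.Println("strict:", checkEmbeddingsStrict(cfg))
	fmt.Println("relaxed:", checkEmbeddingsRelaxed(cfg))
}
```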
* feat(elevenlabs): map elevenlabs API support to TTS
This allows ElevenLabs clients to work with LocalAI out of the box by
supporting the ElevenLabs API.
The ElevenLabs server endpoint is implemented so that it is wired to the
existing TTS endpoints.
Fixes: https://github.com/mudler/LocalAI/issues/1809
* feat(openai/tts): compat layer with openai tts
Fixes: #1276
* fix: adapt tts CLI
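The OpenAI compatibility layer can follow the same pattern. The field
names `model`, `input`, and `voice` are those of the OpenAI
`/v1/audio/speech` request body; the internal `ttsRequest` type and the
`fromOpenAI` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// openAISpeechRequest holds the OpenAI TTS body fields used here.
type openAISpeechRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
	Voice string `json:"voice"`
}

// ttsRequest is a hypothetical internal request shared by all TTS routes.
type ttsRequest struct {
	Model string
	Voice string
	Input string
}

// fromOpenAI decodes an OpenAI-style body into the internal request.
func fromOpenAI(raw []byte) (ttsRequest, error) {
	var req openAISpeechRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return ttsRequest{}, err
	}
	return ttsRequest{Model: req.Model, Voice: req.Voice, Input: req.Input}, nil
}

func main() {
	r, err := fromOpenAI([]byte(`{"model":"tts-1","input":"Hello!","voice":"alloy"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", r)
}
```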
* fixes #1775 and #1774
Add BitsAndBytes quantization and fix embeddings on CUDA devices
* Manage 4-bit and 8-bit quantization
Manage different BitsAndBytes options with the `quantization:` parameter in the model YAML
* fix compilation errors in non-CUDA environments
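A sketch of how a single `quantization:` value could be mapped to
BitsAndBytes load options. The accepted strings (`bnb_4bit`, `bnb_8bit`)
and the `bnbOptions` type are assumptions for illustration; check the
backend for the values it actually accepts.

```go
package main

import "fmt"

// bnbOptions is a hypothetical set of BitsAndBytes load flags.
type bnbOptions struct {
	LoadIn4Bit bool
	LoadIn8Bit bool
}

// parseQuantization maps an (assumed) YAML value to load flags.
func parseQuantization(q string) (bnbOptions, error) {
	switch q {
	case "":
		return bnbOptions{}, nil // no quantization requested
	case "bnb_4bit":
		return bnbOptions{LoadIn4Bit: true}, nil
	case "bnb_8bit":
		return bnbOptions{LoadIn8Bit: true}, nil
	default:
		return bnbOptions{}, fmt.Errorf("unsupported quantization %q", q)
	}
}

func main() {
	for _, q := range []string{"", "bnb_4bit", "bnb_8bit", "bogus"} {
		opts, err := parseQuantization(q)
		fmt.Printf("%q -> %+v err=%v\n", q, opts, err)
	}
}
```

Rejecting unknown values, rather than silently loading unquantized
weights, makes a typo in the YAML visible immediately.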
* fix(defaults): set better defaults for inferencing
This changeset aims to provide better defaults and to properly detect
when no inference settings are provided with the model.
If none are specified, we default to mirostat sampling and offload all
layers to the GPU (if a GPU is detected).
Related to https://github.com/mudler/LocalAI/issues/1373 and https://github.com/mudler/LocalAI/issues/1723
* Adapt tests
* Also pre-initialize default seed
The default sampler on some models doesn't return enough candidates,
which leads to a false sense of randomness. Tracing back through the
code, it appears that with the temperature sampler there might not be
enough candidates to pick from; since the seed and "randomness" only
take effect while picking a candidate, this yields the same results over
and over.
Fixes https://github.com/mudler/LocalAI/issues/1723 by updating the
examples and documentation to use mirostat instead.