* readme: update quickstart
* aio(gpu): fix dreamshaper
* tests(aio): allow to run tests also against an endpoint
* docs: split content
* tests: less verbosity
---------
Co-authored-by: Dave <dave@gray101.com>
This change makes the assumption that "Microsoft Corporation Device 008e"
is an NVIDIA CUDA device. If this is not the case, please update the
hardware detection script here.
Signed-off-by: Enrico Ros <enrico.ros@gmail.com>
Co-authored-by: Dave <dave@gray101.com>
* docs(aio): Add AIO images docs
* add image generation link to quickstart
* while reviewing I noticed this one link was missing, so quickly adding it.
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* docs(mac): Improve documentation for mac build
- added documentation to build from current master
- added troubleshooting information
Signed-off-by: Sebastian <tauven@gmail.com>
* docs(max): fix typo
Signed-off-by: Sebastian <tauven@gmail.com>
---------
Signed-off-by: Sebastian <tauven@gmail.com>
* test with gguf instead of ggml. Updates testPrompt to match? Adds debugging line to Dockerfile that I've found helpful recently.
* fix testPrompt slightly
* Sad Experiment: Test GH runner without metal?
* break apart CGO_LDFLAGS
* switch runner
* upstream llama.cpp disables Metal on Github CI!
* missed a dir from clean-tests
* CGO_LDFLAGS
* tmate failure + NO_ACCELERATE
* whisper.cpp has a metal fix
* do the exact opposite of the name of this branch, but keep it around for unrelated fixes?
* add back newlines
* add tmate to linux for testing
* update fixtures
* timeout for tmate
* fix(go-llama): use llama-cpp as default
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix(backends): drop obsoleted lines
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix: clean up Makefile dependencies to allow for parallel builds
* refactor: remove old unused backend from Makefile
* fix: finish removing legacy backend, update piper
* fix: I broke llama... I fixed llama
* feat: give the tests and builds a few threads
* fix: ensure libraries are replaced before build, add dropreplace target
* Fix image build workflows
Certain engines requires to know during model loading
if the embedding feature has to be enabled, however, it is impractical
to have to set it to ALL the backends that supports embeddings.
There are transformers and sentencentransformers that seamelessly handle
both cases, without having this settings to be explicitly enabled.
The case sussist only for ggml-based models that needs to enable
featuresets during model loading (and thus settings `embedding` is
required), however most of the other engines does not require this.
This change disables the check done at code side, making easier to use
embeddings by not having to specify explicitly `embeddings: true`.
Part of: https://github.com/mudler/LocalAI/issues/1373
* feat(elevenlabs): map elevenlabs API support to TTS
This allows elevenlabs Clients to work automatically with LocalAI by
supporting the elevenlabs API.
The elevenlabs server endpoint is implemented such as it is wired to the
TTS endpoints.
Fixes: https://github.com/mudler/LocalAI/issues/1809
* feat(openai/tts): compat layer with openai tts
Fixes: #1276
* fix: adapt tts CLI
* fixes#1775 and #1774
Add BitsAndBytes Quantization and fixes embedding on CUDA devices
* Manage 4bit and 8 bit quantization
Manage different BitsAndBytes options with the quantization: parameter in yaml
* fix compilation errors on non CUDA environment