Commit Graph

78 Commits

Author SHA1 Message Date
fakezeta
6ef78ef7f6
bugfix: CUDA acceleration not working (#2475)
* bugfix: CUDA acceleration not working

CUDA not working after #2286.
Refactored the code to be more polish

* Update requirements.txt

Missing imports

Signed-off-by: fakezeta <fakezeta@gmail.com>

* Update requirements.txt

Signed-off-by: fakezeta <fakezeta@gmail.com>

---------

Signed-off-by: fakezeta <fakezeta@gmail.com>
2024-06-03 22:41:42 +02:00
fakezeta
4a239a4bff
feat(transformers): various enhancements to the transformers backend (#2468)
update transformers

*Handle Temperature = 0 as greedy search
*Handle custom works as stop words
*Implement KV cache
*Phi 3 no more requires trust_remote_code: true
2024-06-03 08:52:55 +02:00
Chakib Benziane
b99182c8d4
TTS API improvements (#2308)
* update doc on COQUI_LANGUAGE env variable

Signed-off-by: blob42 <contact@blob42.xyz>

* return errors from tts gRPC backend

Signed-off-by: blob42 <contact@blob42.xyz>

* handle speaker_id and language in coqui TTS backend

Signed-off-by: blob42 <contact@blob42.xyz>

* TTS endpoint: add optional language paramter

Signed-off-by: blob42 <contact@blob42.xyz>

* tts fix: empty language string breaks non-multilingual models

Signed-off-by: blob42 <contact@blob42.xyz>

* allow tts param definition in config file

- consolidate TTS options under `tts` config entry

Signed-off-by: blob42 <contact@blob42.xyz>

* tts: update doc

Signed-off-by: blob42 <contact@blob42.xyz>

---------

Signed-off-by: blob42 <contact@blob42.xyz>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-06-01 18:26:27 +00:00
cryptk
ba984c7097
fix: pin version of setuptools for intel builds to work around #2406 (#2414)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-26 18:27:07 +00:00
cryptk
16433d2e8e
fix: install pytorch from proper index for hipblas builds (#2413)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-26 18:05:52 +00:00
Ettore Di Giacinto
1a3dedece0
dependencies(grpcio): bump to fix CI issues (#2362)
feat(grpcio): bump to fix CI issues

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-21 14:33:47 +02:00
Ettore Di Giacinto
8ad669339e
add openvoice backend (#2334)
Wip openvoice
2024-05-19 16:27:08 +02:00
cryptk
86627b27f7
fix: add setuptools to all requirements-intel.txt files for python backends (#2333)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-16 19:15:46 +02:00
fakezeta
5b79bd04a7
add setuptools for openvino (#2301) 2024-05-12 19:31:43 +00:00
cryptk
88942e4761
fix: add missing openvino/optimum/etc libraries for Intel, fixes #2289 (#2292)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-12 09:01:45 +02:00
cryptk
e2de8a88f7
feat: create bash library to handle install/run/test of python backends (#2286)
* feat: create bash library to handle install/run/test of python backends

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* chore: minor cleanup

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: remove incorrect LIMIT_TARGETS from parler-tts

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: update runUnitests to handle running tests from a custom test file

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* chore: document runUnittests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-11 18:32:46 +02:00
cryptk
28a421cb1d
feat: migrate python backends from conda to uv (#2215)
* feat: migrate diffusers backend from conda to uv

  - replace conda with UV for diffusers install (prototype for all
    extras backends)
  - add ability to build docker with one/some/all extras backends
    instead of all or nothing

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate autogtpq bark coqui from conda to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: convert exllama over to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate exllama2 to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate mamba to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate parler to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate petals to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: fix tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate rerankers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate sentencetransformers to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: install uv for tests-linux

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: make sure file exists before installing on intel images

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers backend to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate transformers-musicgen to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vall-e-x to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: migrate vllm to uv

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add uv install to the rest of test-extra.yml

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust file perms on all install/run/test scripts

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add missing acclerate dependencies

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add some more missing dependencies to python backends

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: parler tests venv py dir fix

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: correct filename for transformers-musicgen tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: adjust the pwd for valle tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: cleanup and optimization work for uv migration

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add setuptools to requirements-install for mamba

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: more size optimization work

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: make installs and tests more consistent, cleanup some deps

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: cleanup

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: mamba backend is cublas only

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: uncomment lines in makefile

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-10 15:08:08 +02:00
fakezeta
fea9522982
fix: OpenVINO winograd always disabled (#2252)
Winograd convolutions were always disabled giving error when inference device was CPU.
This commit implement logic to disable Winograd convolutions only if CPU or NPU are declared.
2024-05-07 08:38:58 +02:00
fakezeta
4690b534e0
feat: user defined inference device for CUDA and OpenVINO (#2212)
user defined inference device

configuration via main_gpu parameter
2024-05-02 09:54:29 +02:00
cryptk
f7aabf1b50
fix: bring everything onto the same GRPC version to fix tests (#2199)
fix: more places where we are installing grpc that need a version specified
fix: attempt to fix metal tests
fix: metal/brew is forcing an update, they don't have 1.58 available anymore

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-30 19:12:15 +00:00
fakezeta
e38610e521
feat: OpenVINO acceleration for embeddings in transformer backend (#2190)
OpenVINO acceleration for embeddings

New argument type: OVModelForFeatureExtraction
2024-04-30 10:13:04 +02:00
fakezeta
b7ea9602f5
fix: undefined symbol: iJIT_NotifyEvent in import torch ##2153 (#2179)
* add  extra index to Intel repository

* Update install.sh
2024-04-29 15:11:09 +02:00
fakezeta
c9451cb604
Bump oneapi-basekit, optimum and openvino (#2139)
* Bump oneapi-basekit, optimum and openvino

* Changed PERFORMANCE HINT to CUMULATIVE_THROUGHPUT

Minor latency change for first token but about 10-15% speedup on token generation.
2024-04-26 16:20:43 +02:00
Ettore Di Giacinto
b664edde29
feat(rerankers): Add new backend, support jina rerankers API (#2121)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-04-25 00:19:02 +02:00
jtwolfe
2fb34b00b5
Incl ocv pkg for diffsusers utils (#2115)
* Update diffusers.yml

Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>

* Update diffusers-rocm.yml

Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>

---------

Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>
2024-04-24 09:17:49 +02:00
fakezeta
f718a391c0
fix missing TrustRemoteCode in OpenVINO model load (#2114) 2024-04-24 00:45:37 +00:00
fakezeta
8e36fe9b6f
Transformers Backend: max_tokens adherence to OpenAI API (#2108)
max token adherence to OpenAI API

improve adherence to OpenAI API when max tokens is omitted or equal to 0 in the request
2024-04-23 18:42:17 +02:00
fakezeta
66b002458d
Transformer Backend: Implementing use_tokenizer_template and stop_prompts options (#2090)
* fix regression #1971

fixes regression #1971 introduced by intel_extension_for_transformers==1.4

* UseTokenizerTemplate and StopPrompt

Implementation of use_tokenizer_template and stopwords options
2024-04-21 16:20:25 +00:00
Taikono-Himazin
03adc1f60d
Add tensor_parallel_size setting to vllm setting items (#2085)
Signed-off-by: Taikono-Himazin <kazu@po.harenet.ne.jp>
2024-04-20 14:37:02 +00:00
Ettore Di Giacinto
0fdff26924
feat(parler-tts): Add new backend (#2027)
* feat(parler-tts): Add new backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parler-tts): try downgrade protobuf

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parler-tts): add parler conda env

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Revert "feat(parler-tts): try downgrade protobuf"

This reverts commit bd5941d5cfc00676b45a99f71debf3c34249cf3c.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* deps: add grpc

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: try to gen proto with same environment

* workaround

* Revert "fix: try to gen proto with same environment"

This reverts commit 998c745e2f.

* Workaround fixup

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
2024-04-13 18:59:21 +02:00
cryptk
1981154f49
fix: dont commit generated files to git (#1993)
* fix: initial work towards not committing generated files to the repository

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* feat: improve build docs

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: remove unused folder from .dockerignore and .gitignore

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: attempt to fix extra backend tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: attempt to fix other tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: more test fixes

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: fix apple tests

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: more extras tests fixes

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add GOBIN to PATH in docker build

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: extra tests and Dockerfile corrections

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: remove build dependency checks

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add golang protobuf compilers to tests-linux action

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: ensure protogen is run for extra backend installs

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: use newer protobuf

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: more missing protoc binaries

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: missing dependencies during docker build

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: don't install grpc compilers in the final stage if they aren't needed

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: python-grpc-tools in 22.04 repos is too old

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: add a couple of extra build dependencies to Makefile

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

* fix: unbreak container rebuild functionality

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>

---------

Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-04-13 09:37:32 +02:00
Ludovic Leroux
12c0d9443e
feat: use tokenizer.apply_chat_template() in vLLM (#1990)
Use tokenizer.apply_chat_template() in vLLM

Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>
2024-04-11 19:20:22 +02:00
Ludovic Leroux
b4548ad72d
feat: add flash-attn in nvidia and rocm envs (#1995)
Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>
2024-04-11 09:44:39 +02:00
Koen Farell
36da11a0ee
deps: Update version of vLLM to add support of Cohere Command_R model in vLLM inference (#1975)
* Update vLLM version to add support of Command_R

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

* fix: Fixed vllm version from requirements

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

* chore: Update transformers-rocm.yml

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

* chore: Update transformers.yml version of vllm

Signed-off-by: Koen Farell <hellios.dt@gmail.com>

---------

Signed-off-by: Koen Farell <hellios.dt@gmail.com>
2024-04-10 11:25:26 +00:00
Sebastian.W
d23e73b118
fix(autogptq): do not use_triton with qwen-vl (#1985)
* Enhance autogptq backend to support VL models

* update dependencies for autogptq

* remove redundant auto-gptq dependency

* Convert base64 to image_url for Qwen-VL model

* implemented model inference for qwen-vl

* remove user prompt from generated answer

* fixed write image error

* fixed use_triton issue when loading Qwen-VL model

---------

Co-authored-by: Binghua Wu <bingwu@estee.com>
2024-04-10 10:36:10 +00:00
fakezeta
a38618db02
fix regression #1971 (#1972)
fixes regression #1971 introduced by intel_extension_for_transformers==1.4
2024-04-08 22:33:51 +02:00
fakezeta
8210ffcb6c
feat: Token Stream support for Transformer, fix: missing package for OpenVINO (#1908)
* Streaming working

* Small fix for regression on CUDA and XPU

* use pip version of optimum[openvino]

* Update backend/python/transformers/transformers_server.py

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Token streaming support

fix optimum[openvino] package in install.sh

* Token Streaming support

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-27 17:50:35 +01:00
fakezeta
e7cbe32601
feat: Openvino runtime for transformer backend and streaming support for Openvino and CUDA (#1892)
* fixes #1775 and #1774

Add BitsAndBytes Quantization and fixes embedding on CUDA devices

* Manage 4bit and 8 bit quantization

Manage different BitsAndBytes options with the quantization: parameter in yaml

* fix compilation errors on non CUDA environment

* OpenVINO draft

First draft of OpenVINO integration in transformer backend

* first working implementation

* Streaming working

* Small fix for regression on CUDA and XPU

* use pip version of optimum[openvino]

* Update backend/python/transformers/transformers_server.py

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-26 23:31:43 +00:00
Ettore Di Giacinto
607586e0b7
fix: downgrade torch (#1902)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-03-26 22:56:02 +01:00
Sebastian.W
b7ffe66219
Enhance autogptq backend to support VL models (#1860)
* Enhance autogptq backend to support VL models

* update dependencies for autogptq

* remove redundant auto-gptq dependency

* Convert base64 to image_url for Qwen-VL model

* implemented model inference for qwen-vl

* remove user prompt from generated answer

* fixed write image error

---------

Co-authored-by: Binghua Wu <bingwu@estee.com>
2024-03-26 18:48:14 +01:00
Ettore Di Giacinto
20136ca8b7
feat(tts): add Elevenlabs and OpenAI TTS compatibility layer (#1834)
* feat(elevenlabs): map elevenlabs API support to TTS

This allows elevenlabs Clients to work automatically with LocalAI by
supporting the elevenlabs API.

The elevenlabs server endpoint is implemented such as it is wired to the
TTS endpoints.

Fixes: https://github.com/mudler/LocalAI/issues/1809

* feat(openai/tts): compat layer with openai tts

Fixes: #1276

* fix: adapt tts CLI
2024-03-14 23:08:34 +01:00
fakezeta
3882130911
feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823)
* fixes #1775 and #1774

Add BitsAndBytes Quantization and fixes embedding on CUDA devices

* Manage 4bit and 8 bit quantization

Manage different BitsAndBytes options with the quantization: parameter in yaml

* fix compilation errors on non CUDA environment
2024-03-14 23:06:30 +01:00
Ettore Di Giacinto
5d1018495f
feat(intel): add diffusers/transformers support (#1746)
* feat(intel): add diffusers support

* try to consume upstream container image

* Debug

* Manually install deps

* Map transformers/hf cache dir to modelpath if not specified

* fix(compel): update initialization, pass by all gRPC options

* fix: add dependencies, implement transformers for xpu

* base it from the oneapi image

* Add pillow

* set threads if specified when launching the API

* Skip conda install if intel

* defaults to non-intel

* ci: add to pipelines

* prepare compel only if enabled

* Skip conda install if intel

* fix cleanup

* Disable compel by default

* Install torch 2.1.0 with Intel

* Skip conda on some setups

* Detect python

* Quiet output

* Do not override system python with conda

* Prefer python3

* Fixups

* exllama2: do not install without conda (overrides pytorch version)

* exllama/exllama2: do not install if not using cuda

* Add missing dataset dependency

* Small fixups, symlink to python, add requirements

* Add neural_speed to the deps

* correctly handle model offloading

* fix: device_map == xpu

* go back at calling python, fixed at dockerfile level

* Exllama2 restricted to only nvidia gpus

* Tokenizer to xpu
2024-03-07 14:37:45 +01:00
Dave
5c69dd155f
feat(autogpt/transformers): consume trust_remote_code (#1799)
trusting remote code by default is a danger to our users
2024-03-05 19:47:15 +01:00
TwinFin
504f2e8bf4
Update Backend Dependancies (#1797)
* Update transformers.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

* Update transformers-rocm.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

* Update transformers-nvidia.yml

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>

---------

Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>
2024-03-05 10:10:00 +00:00
Ludovic Leroux
939411300a
Bump vLLM version + more options when loading models in vLLM (#1782)
* Bump vLLM version to 0.3.2

* Add vLLM model loading options

* Remove transformers-exllama

* Fix install exllama
2024-03-01 22:48:53 +01:00
Ludovic Leroux
0135e1e3b9
fix: vllm - use AsyncLLMEngine to allow true streaming mode (#1749)
* fix: use vllm AsyncLLMEngine to bring true stream

Current vLLM implementation uses the LLMEngine, which was designed for offline batch inference, which results in the streaming mode outputing all blobs at once at the end of the inference.

This PR reworks the gRPC server to use asyncio and gRPC.aio, in combination with vLLM's AsyncLLMEngine to bring true stream mode.

This PR also passes more parameters to vLLM during inference (presence_penalty, frequency_penalty, stop, ignore_eos, seed, ...).

* Remove unused import
2024-02-24 11:48:45 +01:00
Chakib Benziane
594eb468df
Add TTS dependency for cuda based builds fixes #1727 (#1730)
Signed-off-by: Chakib Benziane <contact@blob42.xyz>
2024-02-20 21:59:43 +01:00
fenfir
fb0a4c5d9a
Build docker container for ROCm (#1595)
* Dockerfile changes to build for ROCm

* Adjust linker flags for ROCm

* Update conda env for diffusers and transformers to use ROCm pytorch

* Update transformers conda env for ROCm

* ci: build hipblas images

* fixup rebase

* use self-hosted

Signed-off-by: mudler <mudler@localai.io>

* specify LD_LIBRARY_PATH only when BUILD_TYPE=hipblas

---------

Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: mudler <mudler@localai.io>
2024-02-16 15:08:50 +01:00
Ettore Di Giacinto
5e155fb081
fix(python): pin exllama2 (#1711)
fix(python): pin python deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-02-14 21:44:12 +01:00
Ettore Di Giacinto
fd68bf7084
fix(vall-e-x): Fix voice cloning (#1696) 2024-02-11 11:20:00 +01:00
Ettore Di Giacinto
53dbe36f32
feat(tts): respect YAMLs config file, add sycl docs/examples (#1692)
* feat(refactor): refactor config and input reading

* feat(tts): read config file for TTS

* examples(kubernetes): Add simple deployment example

* examples(kubernetes): Add simple deployment for intel arc

* docs(sycl): add sycl example

* feat(tts): do not always pick a first model

* fixups to run vall-e-x on container

* Correctly resolve backend
2024-02-10 21:37:03 +01:00
Ettore Di Giacinto
cb7512734d
transformers: correctly load automodels (#1643)
* backends(transformers): use AutoModel with LLM types

* examples: animagine-xl

* Add codellama examples
2024-01-26 00:13:21 +01:00
Ettore Di Giacinto
5e335eaead
feat(transformers): support also text generation (#1630)
* feat(transformers): support also text generation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* embedded: set seed -1

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-01-23 23:07:31 +01:00
Ettore Di Giacinto
06cd9ef98d
feat(extra-backends): Improvements, adding mamba example (#1618)
* feat(extra-backends): Improvements

vllm: add max_tokens, wire up stream event
mamba: fixups, adding examples for mamba-chat

* examples(mamba-chat): add

* docs: update
2024-01-20 17:56:08 +01:00