LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2024-06-07 19:40:48 +00:00

Author	SHA1	Message	Date
fakezeta	c9451cb604	Bump oneapi-basekit, optimum and openvino (#2139) * Bump oneapi-basekit, optimum and openvino * Changed PERFORMANCE HINT to CUMULATIVE_THROUGHPUT Minor latency change for first token but about 10-15% speedup on token generation.	2024-04-26 16:20:43 +02:00
Ettore Di Giacinto	b664edde29	feat(rerankers): Add new backend, support jina rerankers API (#2121) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-04-25 00:19:02 +02:00
Ludovic Leroux	b4548ad72d	feat: add flash-attn in nvidia and rocm envs (#1995) Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>	2024-04-11 09:44:39 +02:00
Koen Farell	36da11a0ee	deps: Update version of vLLM to add support of Cohere Command_R model in vLLM inference (#1975) * Update vLLM version to add support of Command_R Signed-off-by: Koen Farell <hellios.dt@gmail.com> * fix: Fixed vllm version from requirements Signed-off-by: Koen Farell <hellios.dt@gmail.com> * chore: Update transformers-rocm.yml Signed-off-by: Koen Farell <hellios.dt@gmail.com> * chore: Update transformers.yml version of vllm Signed-off-by: Koen Farell <hellios.dt@gmail.com> --------- Signed-off-by: Koen Farell <hellios.dt@gmail.com>	2024-04-10 11:25:26 +00:00
fakezeta	8210ffcb6c	feat: Token Stream support for Transformer, fix: missing package for OpenVINO (#1908) * Streaming working * Small fix for regression on CUDA and XPU * use pip version of optimum[openvino] * Update backend/python/transformers/transformers_server.py Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * Token streaming support fix optimum[openvino] package in install.sh * Token Streaming support --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-03-27 17:50:35 +01:00
fakezeta	e7cbe32601	feat: Openvino runtime for transformer backend and streaming support for Openvino and CUDA (#1892) * fixes #1775 and #1774 Add BitsAndBytes Quantization and fixes embedding on CUDA devices * Manage 4bit and 8 bit quantization Manage different BitsAndBytes options with the quantization: parameter in yaml * fix compilation errors on non CUDA environment * OpenVINO draft First draft of OpenVINO integration in transformer backend * first working implementation * Streaming working * Small fix for regression on CUDA and XPU * use pip version of optimum[openvino] * Update backend/python/transformers/transformers_server.py Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-03-26 23:31:43 +00:00
Ettore Di Giacinto	607586e0b7	fix: downgrade torch (#1902) Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-03-26 22:56:02 +01:00
Sebastian.W	b7ffe66219	Enhance autogptq backend to support VL models (#1860) * Enhance autogptq backend to support VL models * update dependencies for autogptq * remove redundant auto-gptq dependency * Convert base64 to image_url for Qwen-VL model * implemented model inference for qwen-vl * remove user prompt from generated answer * fixed write image error --------- Co-authored-by: Binghua Wu <bingwu@estee.com>	2024-03-26 18:48:14 +01:00
fakezeta	3882130911	feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823) * fixes #1775 and #1774 Add BitsAndBytes Quantization and fixes embedding on CUDA devices * Manage 4bit and 8 bit quantization Manage different BitsAndBytes options with the quantization: parameter in yaml * fix compilation errors on non CUDA environment	2024-03-14 23:06:30 +01:00
Ettore Di Giacinto	5d1018495f	feat(intel): add diffusers/transformers support (#1746) * feat(intel): add diffusers support * try to consume upstream container image * Debug * Manually install deps * Map transformers/hf cache dir to modelpath if not specified * fix(compel): update initialization, pass by all gRPC options * fix: add dependencies, implement transformers for xpu * base it from the oneapi image * Add pillow * set threads if specified when launching the API * Skip conda install if intel * defaults to non-intel * ci: add to pipelines * prepare compel only if enabled * Skip conda install if intel * fix cleanup * Disable compel by default * Install torch 2.1.0 with Intel * Skip conda on some setups * Detect python * Quiet output * Do not override system python with conda * Prefer python3 * Fixups * exllama2: do not install without conda (overrides pytorch version) * exllama/exllama2: do not install if not using cuda * Add missing dataset dependency * Small fixups, symlink to python, add requirements * Add neural_speed to the deps * correctly handle model offloading * fix: device_map == xpu * go back at calling python, fixed at dockerfile level * Exllama2 restricted to only nvidia gpus * Tokenizer to xpu	2024-03-07 14:37:45 +01:00
TwinFin	504f2e8bf4	Update Backend Dependancies (#1797) * Update transformers.yml Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com> * Update transformers-rocm.yml Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com> * Update transformers-nvidia.yml Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com> --------- Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>	2024-03-05 10:10:00 +00:00
Ludovic Leroux	939411300a	Bump vLLM version + more options when loading models in vLLM (#1782) * Bump vLLM version to 0.3.2 * Add vLLM model loading options * Remove transformers-exllama * Fix install exllama	2024-03-01 22:48:53 +01:00
Chakib Benziane	594eb468df	Add TTS dependency for cuda based builds fixes #1727 (#1730) Signed-off-by: Chakib Benziane <contact@blob42.xyz>	2024-02-20 21:59:43 +01:00
fenfir	fb0a4c5d9a	Build docker container for ROCm (#1595) * Dockerfile changes to build for ROCm * Adjust linker flags for ROCm * Update conda env for diffusers and transformers to use ROCm pytorch * Update transformers conda env for ROCm * ci: build hipblas images * fixup rebase * use self-hosted Signed-off-by: mudler <mudler@localai.io> * specify LD_LIBRARY_PATH only when BUILD_TYPE=hipblas --------- Signed-off-by: mudler <mudler@localai.io> Co-authored-by: mudler <mudler@localai.io>	2024-02-16 15:08:50 +01:00
Ettore Di Giacinto	e19d7226f8	feat: more embedded models, coqui fixes, add model usage and description (#1556) * feat: add model descriptions and usage * remove default model gallery * models: add embeddings and tts * docs: update table * docs: updates * images: cleanup pip cache after install * images: always run apt-get clean * ux: improve gRPC connection errors * ux: improve some messages * fix: fix coqui when no AudioPath is passed by * embedded: add more models * Add usage * Reorder table	2024-01-08 00:37:02 +01:00
Ettore Di Giacinto	62a02cd1fe	deps(conda): use transformers environment with autogptq (#1555)	2024-01-06 15:30:53 +01:00
Ettore Di Giacinto	949da7792d	deps(conda): use transformers-env with vllm,exllama(2) (#1554) * deps(conda): use transformers with vllm * join vllm, exllama, exllama2, split petals	2024-01-06 13:32:28 +01:00
Ettore Di Giacinto	95eb72bfd3	feat: add 🐸 coqui (#1489) * feat: add coqui * docs: update news	2023-12-24 19:38:54 +01:00
Ettore Di Giacinto	939187a129	env(conda): use transformers for vall-e-x (#1481)	2023-12-23 14:31:34 -05:00
Ettore Di Giacinto	b4b21a446b	feat(conda): share envs with transformer-based backends (#1465) * feat(conda): share env between diffusers and bark * Detect if env already exists * share diffusers and petals * tests: add petals * Use smaller model for tests with petals * test only model load on petals * tests(petals): run only load model tests * Revert "test only model load on petals" This reverts commit `111cfa97f1`. * move transformers and sentencetransformers to common env * Share also transformers-musicgen	2023-12-21 08:35:15 +01:00

20 Commits