LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2024-06-07 19:40:48 +00:00

Author	SHA1	Message	Date
fakezeta	fea9522982	fix: OpenVINO winograd always disabled (#2252 ) Winograd convolutions were always disabled, giving an error when the inference device was CPU. This commit implements logic to disable Winograd convolutions only if CPU or NPU are declared.	2024-05-07 08:38:58 +02:00
fakezeta	4690b534e0	feat: user defined inference device for CUDA and OpenVINO (#2212 ) user defined inference device configuration via main_gpu parameter	2024-05-02 09:54:29 +02:00
cryptk	f7aabf1b50	fix: bring everything onto the same GRPC version to fix tests (#2199 ) fix: more places where we are installing grpc that need a version specified fix: attempt to fix metal tests fix: metal/brew is forcing an update, they don't have 1.58 available anymore Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>	2024-04-30 19:12:15 +00:00
fakezeta	e38610e521	feat: OpenVINO acceleration for embeddings in transformer backend (#2190 ) OpenVINO acceleration for embeddings New argument type: OVModelForFeatureExtraction	2024-04-30 10:13:04 +02:00
fakezeta	b7ea9602f5	fix: undefined symbol: iJIT_NotifyEvent in import torch ##2153 (#2179 ) * add extra index to Intel repository * Update install.sh	2024-04-29 15:11:09 +02:00
fakezeta	c9451cb604	Bump oneapi-basekit, optimum and openvino (#2139 ) * Bump oneapi-basekit, optimum and openvino * Changed PERFORMANCE HINT to CUMULATIVE_THROUGHPUT Minor latency change for first token but about 10-15% speedup on token generation.	2024-04-26 16:20:43 +02:00
Ettore Di Giacinto	b664edde29	feat(rerankers): Add new backend, support jina rerankers API (#2121 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-04-25 00:19:02 +02:00
jtwolfe	2fb34b00b5	Incl ocv pkg for diffusers utils (#2115 ) * Update diffusers.yml Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com> * Update diffusers-rocm.yml Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com> --------- Signed-off-by: jtwolfe <jamie.t.wolfe@gmail.com>	2024-04-24 09:17:49 +02:00
fakezeta	f718a391c0	fix missing TrustRemoteCode in OpenVINO model load (#2114 )	2024-04-24 00:45:37 +00:00
fakezeta	8e36fe9b6f	Transformers Backend: max_tokens adherence to OpenAI API (#2108 ) max token adherence to OpenAI API improve adherence to OpenAI API when max tokens is omitted or equal to 0 in the request	2024-04-23 18:42:17 +02:00
fakezeta	66b002458d	Transformer Backend: Implementing use_tokenizer_template and stop_prompts options (#2090 ) * fix regression #1971 fixes regression #1971 introduced by intel_extension_for_transformers==1.4 * UseTokenizerTemplate and StopPrompt Implementation of use_tokenizer_template and stopwords options	2024-04-21 16:20:25 +00:00
Taikono-Himazin	03adc1f60d	Add tensor_parallel_size setting to vllm setting items (#2085 ) Signed-off-by: Taikono-Himazin <kazu@po.harenet.ne.jp>	2024-04-20 14:37:02 +00:00
Ettore Di Giacinto	0fdff26924	feat(parler-tts): Add new backend (#2027 ) * feat(parler-tts): Add new backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parler-tts): try downgrade protobuf Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parler-tts): add parler conda env Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Revert "feat(parler-tts): try downgrade protobuf" This reverts commit bd5941d5cfc00676b45a99f71debf3c34249cf3c. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * deps: add grpc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: try to gen proto with same environment * workaround * Revert "fix: try to gen proto with same environment" This reverts commit `998c745e2f`. * Workaround fixup --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Dave <dave@gray101.com>	2024-04-13 18:59:21 +02:00
cryptk	1981154f49	fix: dont commit generated files to git (#1993 ) * fix: initial work towards not committing generated files to the repository Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * feat: improve build docs Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: remove unused folder from .dockerignore and .gitignore Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: attempt to fix extra backend tests Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: attempt to fix other tests Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: more test fixes Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: fix apple tests Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: more extras tests fixes Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add GOBIN to PATH in docker build Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: extra tests and Dockerfile corrections Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: remove build dependency checks Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add golang protobuf compilers to tests-linux action Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: ensure protogen is run for extra backend installs Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: use newer protobuf Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: more missing protoc binaries Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: missing dependencies during docker build Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: don't install grpc compilers in the final stage if they aren't needed Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: python-grpc-tools in 22.04 repos is too old Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add a couple of extra build dependencies to Makefile Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: unbreak container rebuild functionality Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> --------- Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>	2024-04-13 09:37:32 +02:00
Ludovic Leroux	12c0d9443e	feat: use tokenizer.apply_chat_template() in vLLM (#1990 ) Use tokenizer.apply_chat_template() in vLLM Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>	2024-04-11 19:20:22 +02:00
Ludovic Leroux	b4548ad72d	feat: add flash-attn in nvidia and rocm envs (#1995 ) Signed-off-by: Ludovic LEROUX <ludovic@inpher.io>	2024-04-11 09:44:39 +02:00
Koen Farell	36da11a0ee	deps: Update version of vLLM to add support of Cohere Command_R model in vLLM inference (#1975 ) * Update vLLM version to add support of Command_R Signed-off-by: Koen Farell <hellios.dt@gmail.com> * fix: Fixed vllm version from requirements Signed-off-by: Koen Farell <hellios.dt@gmail.com> * chore: Update transformers-rocm.yml Signed-off-by: Koen Farell <hellios.dt@gmail.com> * chore: Update transformers.yml version of vllm Signed-off-by: Koen Farell <hellios.dt@gmail.com> --------- Signed-off-by: Koen Farell <hellios.dt@gmail.com>	2024-04-10 11:25:26 +00:00
Sebastian.W	d23e73b118	fix(autogptq): do not use_triton with qwen-vl (#1985 ) * Enhance autogptq backend to support VL models * update dependencies for autogptq * remove redundant auto-gptq dependency * Convert base64 to image_url for Qwen-VL model * implemented model inference for qwen-vl * remove user prompt from generated answer * fixed write image error * fixed use_triton issue when loading Qwen-VL model --------- Co-authored-by: Binghua Wu <bingwu@estee.com>	2024-04-10 10:36:10 +00:00
fakezeta	a38618db02	fix regression #1971 (#1972 ) fixes regression #1971 introduced by intel_extension_for_transformers==1.4	2024-04-08 22:33:51 +02:00
fakezeta	8210ffcb6c	feat: Token Stream support for Transformer, fix: missing package for OpenVINO (#1908 ) * Streaming working * Small fix for regression on CUDA and XPU * use pip version of optimum[openvino] * Update backend/python/transformers/transformers_server.py Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * Token streaming support fix optimum[openvino] package in install.sh * Token Streaming support --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-03-27 17:50:35 +01:00
fakezeta	e7cbe32601	feat: Openvino runtime for transformer backend and streaming support for Openvino and CUDA (#1892 ) * fixes #1775 and #1774 Add BitsAndBytes Quantization and fixes embedding on CUDA devices * Manage 4bit and 8 bit quantization Manage different BitsAndBytes options with the quantization: parameter in yaml * fix compilation errors on non CUDA environment * OpenVINO draft First draft of OpenVINO integration in transformer backend * first working implementation * Streaming working * Small fix for regression on CUDA and XPU * use pip version of optimum[openvino] * Update backend/python/transformers/transformers_server.py Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-03-26 23:31:43 +00:00
Ettore Di Giacinto	607586e0b7	fix: downgrade torch (#1902 ) Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-03-26 22:56:02 +01:00
Sebastian.W	b7ffe66219	Enhance autogptq backend to support VL models (#1860 ) * Enhance autogptq backend to support VL models * update dependencies for autogptq * remove redundant auto-gptq dependency * Convert base64 to image_url for Qwen-VL model * implemented model inference for qwen-vl * remove user prompt from generated answer * fixed write image error --------- Co-authored-by: Binghua Wu <bingwu@estee.com>	2024-03-26 18:48:14 +01:00
Ettore Di Giacinto	20136ca8b7	feat(tts): add Elevenlabs and OpenAI TTS compatibility layer (#1834 ) * feat(elevenlabs): map elevenlabs API support to TTS This allows elevenlabs clients to work automatically with LocalAI by supporting the elevenlabs API. The elevenlabs server endpoint is implemented so that it is wired to the TTS endpoints. Fixes: https://github.com/mudler/LocalAI/issues/1809 * feat(openai/tts): compat layer with openai tts Fixes: #1276 * fix: adapt tts CLI	2024-03-14 23:08:34 +01:00
fakezeta	3882130911	feat: Add Bitsandbytes quantization for transformer backend enhancement #1775 and fix: Transformer backend error on CUDA #1774 (#1823 ) * fixes #1775 and #1774 Add BitsAndBytes Quantization and fixes embedding on CUDA devices * Manage 4bit and 8 bit quantization Manage different BitsAndBytes options with the quantization: parameter in yaml * fix compilation errors on non CUDA environment	2024-03-14 23:06:30 +01:00
Ettore Di Giacinto	5d1018495f	feat(intel): add diffusers/transformers support (#1746 ) * feat(intel): add diffusers support * try to consume upstream container image * Debug * Manually install deps * Map transformers/hf cache dir to modelpath if not specified * fix(compel): update initialization, pass by all gRPC options * fix: add dependencies, implement transformers for xpu * base it from the oneapi image * Add pillow * set threads if specified when launching the API * Skip conda install if intel * defaults to non-intel * ci: add to pipelines * prepare compel only if enabled * Skip conda install if intel * fix cleanup * Disable compel by default * Install torch 2.1.0 with Intel * Skip conda on some setups * Detect python * Quiet output * Do not override system python with conda * Prefer python3 * Fixups * exllama2: do not install without conda (overrides pytorch version) * exllama/exllama2: do not install if not using cuda * Add missing dataset dependency * Small fixups, symlink to python, add requirements * Add neural_speed to the deps * correctly handle model offloading * fix: device_map == xpu * go back at calling python, fixed at dockerfile level * Exllama2 restricted to only nvidia gpus * Tokenizer to xpu	2024-03-07 14:37:45 +01:00
Dave	5c69dd155f	feat(autogpt/transformers): consume `trust_remote_code` (#1799 ) trusting remote code by default is a danger to our users	2024-03-05 19:47:15 +01:00
TwinFin	504f2e8bf4	Update Backend Dependencies (#1797 ) * Update transformers.yml Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com> * Update transformers-rocm.yml Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com> * Update transformers-nvidia.yml Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com> --------- Signed-off-by: TwinFin <57421631+TwinFinz@users.noreply.github.com>	2024-03-05 10:10:00 +00:00
Ludovic Leroux	939411300a	Bump vLLM version + more options when loading models in vLLM (#1782 ) * Bump vLLM version to 0.3.2 * Add vLLM model loading options * Remove transformers-exllama * Fix install exllama	2024-03-01 22:48:53 +01:00
Ludovic Leroux	0135e1e3b9	fix: vllm - use AsyncLLMEngine to allow true streaming mode (#1749 ) * fix: use vllm AsyncLLMEngine to bring true stream The current vLLM implementation uses the LLMEngine, which was designed for offline batch inference, so streaming mode outputs all blobs at once at the end of the inference. This PR reworks the gRPC server to use asyncio and gRPC.aio, in combination with vLLM's AsyncLLMEngine, to bring true stream mode. This PR also passes more parameters to vLLM during inference (presence_penalty, frequency_penalty, stop, ignore_eos, seed, ...). * Remove unused import	2024-02-24 11:48:45 +01:00
Chakib Benziane	594eb468df	Add TTS dependency for cuda based builds fixes #1727 (#1730 ) Signed-off-by: Chakib Benziane <contact@blob42.xyz>	2024-02-20 21:59:43 +01:00
fenfir	fb0a4c5d9a	Build docker container for ROCm (#1595 ) * Dockerfile changes to build for ROCm * Adjust linker flags for ROCm * Update conda env for diffusers and transformers to use ROCm pytorch * Update transformers conda env for ROCm * ci: build hipblas images * fixup rebase * use self-hosted Signed-off-by: mudler <mudler@localai.io> * specify LD_LIBRARY_PATH only when BUILD_TYPE=hipblas --------- Signed-off-by: mudler <mudler@localai.io> Co-authored-by: mudler <mudler@localai.io>	2024-02-16 15:08:50 +01:00
Ettore Di Giacinto	5e155fb081	fix(python): pin exllama2 (#1711 ) fix(python): pin python deps Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-02-14 21:44:12 +01:00
Ettore Di Giacinto	fd68bf7084	fix(vall-e-x): Fix voice cloning (#1696 )	2024-02-11 11:20:00 +01:00
Ettore Di Giacinto	53dbe36f32	feat(tts): respect YAMLs config file, add sycl docs/examples (#1692 ) * feat(refactor): refactor config and input reading * feat(tts): read config file for TTS * examples(kubernetes): Add simple deployment example * examples(kubernetes): Add simple deployment for intel arc * docs(sycl): add sycl example * feat(tts): do not always pick a first model * fixups to run vall-e-x on container * Correctly resolve backend	2024-02-10 21:37:03 +01:00
Ettore Di Giacinto	cb7512734d	transformers: correctly load automodels (#1643 ) * backends(transformers): use AutoModel with LLM types * examples: animagine-xl * Add codellama examples	2024-01-26 00:13:21 +01:00
Ettore Di Giacinto	5e335eaead	feat(transformers): support also text generation (#1630 ) * feat(transformers): support also text generation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * embedded: set seed -1 --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-01-23 23:07:31 +01:00
Ettore Di Giacinto	06cd9ef98d	feat(extra-backends): Improvements, adding mamba example (#1618 ) * feat(extra-backends): Improvements vllm: add max_tokens, wire up stream event mamba: fixups, adding examples for mamba-chat * examples(mamba-chat): add * docs: update	2024-01-20 17:56:08 +01:00
Ettore Di Giacinto	9e653d6abe	feat: 🐍 add mamba support (#1589 ) feat(mamba): Initial import This is a first iteration of the mamba backend, loosely based on mamba-chat(https://github.com/havenhq/mamba-chat).	2024-01-19 23:42:50 +01:00
Ettore Di Giacinto	e19d7226f8	feat: more embedded models, coqui fixes, add model usage and description (#1556 ) * feat: add model descriptions and usage * remove default model gallery * models: add embeddings and tts * docs: update table * docs: updates * images: cleanup pip cache after install * images: always run apt-get clean * ux: improve gRPC connection errors * ux: improve some messages * fix: fix coqui when no AudioPath is passed by * embedded: add more models * Add usage * Reorder table	2024-01-08 00:37:02 +01:00
Ettore Di Giacinto	62a02cd1fe	deps(conda): use transformers environment with autogptq (#1555 )	2024-01-06 15:30:53 +01:00
Ettore Di Giacinto	949da7792d	deps(conda): use transformers-env with vllm,exllama(2) (#1554 ) * deps(conda): use transformers with vllm * join vllm, exllama, exllama2, split petals	2024-01-06 13:32:28 +01:00
Ettore Di Giacinto	583bd28a5c	fix(diffusers): add omegaconf dependency (#1540 ) Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-01-04 00:06:41 +01:00
Ettore Di Giacinto	a1aa6cb7c2	fix(entrypoint): cd to backend dir before start (#1530 ) Certain backends, such as vall-e-x, are not meant to be used as a library, so we want to start the process in the same folder where the backend and all the assets are. Fixes #1394	2024-01-01 22:02:48 +01:00
Ettore Di Giacinto	95eb72bfd3	feat: add 🐸 coqui (#1489 ) * feat: add coqui * docs: update news	2023-12-24 19:38:54 +01:00
BobMaster	7e2d101a46	fix: guidance_scale not work in sd (#1488 ) Signed-off-by: hibobmaster <32976627+hibobmaster@users.noreply.github.com>	2023-12-24 19:24:52 +01:00
Sertaç Özercan	6597881854	fix: exllama2 backend (#1484 ) Signed-off-by: Sertac Ozercan <sozercan@gmail.com>	2023-12-24 08:32:12 +00:00
Ettore Di Giacinto	939187a129	env(conda): use transformers for vall-e-x (#1481 )	2023-12-23 14:31:34 -05:00
Ettore Di Giacinto	b4b21a446b	feat(conda): share envs with transformer-based backends (#1465 ) * feat(conda): share env between diffusers and bark * Detect if env already exists * share diffusers and petals * tests: add petals * Use smaller model for tests with petals * test only model load on petals * tests(petals): run only load model tests * Revert "test only model load on petals" This reverts commit `111cfa97f1`. * move transformers and sentencetransformers to common env * Share also transformers-musicgen	2023-12-21 08:35:15 +01:00
Ettore Di Giacinto	dd982acf2c	feat(img2vid,txt2vid): Initial support for img2vid,txt2vid (#1442 ) * feat(img2vid): Initial support for img2vid * doc(SD): fix SDXL Example * Minor fixups for img2vid * docs(img2img): fix example curl call * feat(txt2vid): initial support Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * diffusers: be retro-compatible with CUDA settings * docs(img2vid, txt2vid): examples * Add notice on docs --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2023-12-15 18:06:20 -05:00


66 Commits