LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2024-06-07 19:40:48 +00:00

Author	SHA1	Message	Date
Ettore Di Giacinto	b2772509b4	models(llama3): add llama3 to embedded models (#2074 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-04-19 18:23:44 +02:00
Ettore Di Giacinto	27ec84827c	refactor(template): isolate and add tests (#2069 ) * refactor(template): isolate and add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Dave <dave@gray101.com> Co-authored-by: Dave <dave@gray101.com>	2024-04-19 02:40:18 +00:00
Ettore Di Giacinto	bbea62b907	feat(functions): support models with no grammar, add tests (#2068 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-04-18 22:43:12 +02:00
Ettore Di Giacinto	f9c75d4878	tests: add template tests (#2063 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-04-18 10:57:24 +02:00
Ettore Di Giacinto	af9e5a2d05	Revert #1963 (#2056 ) * Revert "fix(fncall): fix regression introduced in #1963 (#2048)" This reverts commit `6b06d4e0af`. * Revert "fix: action-tmate back to upstream, dead code removal (#2038)" This reverts commit `fdec8a9d00`. * Revert "feat(grpc): return consumed token count and update response accordingly (#2035)" This reverts commit `e843d7df0e`. * Revert "refactor: backend/service split, channel-based llm flow (#1963)" This reverts commit `eed5706994`. * feat(grpc): return consumed token count and update response accordingly Fixes: #1920 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-04-17 23:33:49 +02:00
Dave	eed5706994	refactor: backend/service split, channel-based llm flow (#1963 ) Refactor: channel based llm flow and services split --------- Signed-off-by: Dave Lee <dave@gray101.com>	2024-04-13 09:45:34 +02:00
cryptk	b85dad0286	feat: first pass at improving logging (#1956 ) Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>	2024-04-04 09:24:22 +02:00
Ettore Di Giacinto	bd25d8049c	fix(watchdog): use ShutdownModel instead of StopModel (#1882 ) Fixes #1760	2024-03-23 16:19:57 +01:00
Richard Palethorpe	643d85d2cc	feat(stores): Vector store backend (#1795 ) Add simple vector store backend Signed-off-by: Richard Palethorpe <io@richiejp.com>	2024-03-22 21:14:04 +01:00
Ettore Di Giacinto	e533dcf506	feat(functions/aio): all-in-one images, function template enhancements (#1862 ) * feat(startup): allow to specify models from local files * feat(aio): add Dockerfile, make targets, aio profiles * feat(template): add Function and LastMessage * add hermes2-pro-mistral * update hermes2 definition * feat(template): add sprig * feat(template): expose FunctionCall * feat(aio): switch llm for text	2024-03-21 01:12:20 +01:00
Ettore Di Giacinto	88b65f63d0	fix(go-llama): use llama-cpp as default (#1849 ) * fix(go-llama): use llama-cpp as default Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * fix(backends): drop obsoleted lines --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-03-17 23:08:22 +01:00
Ettore Di Giacinto	5d1018495f	feat(intel): add diffusers/transformers support (#1746 ) * feat(intel): add diffusers support * try to consume upstream container image * Debug * Manually install deps * Map transformers/hf cache dir to modelpath if not specified * fix(compel): update initialization, pass by all gRPC options * fix: add dependencies, implement transformers for xpu * base it from the oneapi image * Add pillow * set threads if specified when launching the API * Skip conda install if intel * defaults to non-intel * ci: add to pipelines * prepare compel only if enabled * Skip conda install if intel * fix cleanup * Disable compel by default * Install torch 2.1.0 with Intel * Skip conda on some setups * Detect python * Quiet output * Do not override system python with conda * Prefer python3 * Fixups * exllama2: do not install without conda (overrides pytorch version) * exllama/exllama2: do not install if not using cuda * Add missing dataset dependency * Small fixups, symlink to python, add requirements * Add neural_speed to the deps * correctly handle model offloading * fix: device_map == xpu * go back at calling python, fixed at dockerfile level * Exllama2 restricted to only nvidia gpus * Tokenizer to xpu	2024-03-07 14:37:45 +01:00
Ettore Di Giacinto	c72808f18b	feat(tools): support Tool calls in the API (#1715 ) * feat(tools): support Tools in the API Co-authored-by: Stephan Aßmus <stephan.assmus@sap.com> * feat(tools): support function streaming * Adhere to new return types when using tools instead of functions * Keep backward compatibility with function calling * Evaluate function names in chat templates * Disable recovery with --debug * Correctly stream out the entire result * Detect when llm chooses to reply and to not perform any action in SSE * Feedback from code review --------- Co-authored-by: Stephan Aßmus <stephan.assmus@sap.com>	2024-02-17 10:00:34 +01:00
Ettore Di Giacinto	ddd21f1644	feat: Use ubuntu as base for container images, drop deprecated ggml-transformers backends (#1689 ) * cleanup backends * switch image to ubuntu 22.04 * adapt commands for ubuntu * transformers cleanup * no contrib on ubuntu * Change test model to gguf * ci: disable bark tests (too cpu-intensive) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * cleanup * refinements * use intel base image * Makefile: Add docker targets * Change test model --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-02-08 20:12:51 +01:00
Ettore Di Giacinto	98ad93d53e	Drop ggml-based gpt2 and starcoder (supported by llama.cpp) (#1679 ) * Drop ggml-based gpt2 and starcoder (supported by llama.cpp) * Update compatibility table	2024-02-04 13:15:51 +01:00
Ettore Di Giacinto	df13ba655c	Drop old falcon backend (deprecated) (#1675 ) Drop old falcon backend	2024-02-03 13:01:13 +01:00
coyzeng	d5d82ba344	feat(grpc): backend SPI pluggable in embedding mode (#1621 ) * run server * grpc backend embedded support * backend providable	2024-01-23 08:56:36 +01:00
Ettore Di Giacinto	e19d7226f8	feat: more embedded models, coqui fixes, add model usage and description (#1556 ) * feat: add model descriptions and usage * remove default model gallery * models: add embeddings and tts * docs: update table * docs: updates * images: cleanup pip cache after install * images: always run apt-get clean * ux: improve gRPC connection errors * ux: improve some messages * fix: fix coqui when no AudioPath is passed by * embedded: add more models * Add usage * Reorder table	2024-01-08 00:37:02 +01:00
Ettore Di Giacinto	db926896bd	Revert "[Refactor]: Core/API Split" (#1550 ) Revert "[Refactor]: Core/API Split (#1506)" This reverts commit `ab7b4d5ee9`.	2024-01-05 18:04:46 +01:00
Dave	ab7b4d5ee9	[Refactor]: Core/API Split (#1506 ) Refactors api folder to core, creates firm split between backend code and api frontend.	2024-01-05 15:34:56 +01:00
Ettore Di Giacinto	66fa4f1767	feat: share models by url (#1522 ) * feat: allow to pass by models via args * expose it also as an env/arg * docs: enhancements to build/requirements * do not display status always * print download status * not all mesages are debug	2024-01-01 10:31:03 +01:00
Gianluca Boiano	cae7b197ec	feat: add tiny dream stable diffusion support (#1283 ) Signed-off-by: Gianluca Boiano <morf3089@gmail.com>	2023-12-24 19:27:24 +00:00
Ettore Di Giacinto	1fc3a375df	feat: inline templates and accept URLs in models (#1452 ) * feat: Allow inline templates * feat: Allow to specify url in model config files Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * feat: support 'huggingface://' format * style: reuse-code from gallery --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2023-12-18 18:58:44 +01:00
Ettore Di Giacinto	3d83128f16	feat(alias): alias llama to llama-cpp, update docs (#1448 ) Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2023-12-16 18:22:45 +01:00
Dave	8b6e601405	Feat: new backend: transformers-musicgen (#1387 ) Transformers-MusicGen --------- Signed-off-by: Dave <dave@gray101.com>	2023-12-08 10:01:02 +01:00
Ettore Di Giacinto	824612f1b4	feat: initial watchdog implementation (#1341 ) * feat: initial watchdog implementation Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * fiuxups * Add more output * wip: idletime checker * wire idle watchdog checks * enlarge watchdog time window * small fixes * Use stopmodel * Always delete process Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-26 18:36:23 +01:00
Ettore Di Giacinto	3c9544b023	refactor: rename llama-stable to llama-ggml (#1287 ) * refactor: rename llama-stable to llama-ggml * Makefile: get sources in sources/ Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixup path Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixup sources Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups sd Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * update SD * fixup * fixup: create piper libdir also when not built Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix make target on linux test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-18 08:18:43 +01:00
Ettore Di Giacinto	548959b50f	feat: queue up requests if not running parallel requests (#1296 ) Return a GRPC which handles a lock in case it is not meant to be parallel. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-16 22:20:16 +01:00
Ettore Di Giacinto	fdd95d1d86	feat: allow to run parallel requests (#1290 ) * feat: allow to run parallel requests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixup Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-16 08:20:05 +01:00
Ettore Di Giacinto	0eae727366	🔥 add LaVA support and GPT vision API, Multiple requests for llama.cpp, return JSON types (#1254 ) * wip * wip * Make it functional Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * wip * Small fixups * do not inject space on role encoding, encode img at beginning of messages Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add examples/config defaults * Add include dir of current source dir * cleanup * fixes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups * Revert "fixups" This reverts commit `f1a4731cca`. * fixes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-11-11 13:14:59 +01:00
Ettore Di Giacinto	c62504ac92	cleanup: drop bloomz and ggllm as now supported by llama.cpp (#1217 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-10-26 07:43:31 +02:00
Ettore Di Giacinto	128694213f	feat: llama.cpp gRPC C++ backend (#1170 ) * wip: llama.cpp c++ gRPC server Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make it work, attach it to the build process Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * update deps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: add protobuf dep Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * try fix protobuf on cmake * cmake: workarounds Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * add packages * cmake: use fixed version of grpc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * cmake(grpc): install locally * install grpc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * install required deps for grpc on debian bullseye Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * debug * debug * Fixups * no need to install cmake manually Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: fixup macOS * use brew whenever possible Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * macOS fixups * debug * fix container build Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * workaround * try mac https://stackoverflow.com/questions/23905661/on-mac-g-clang-fails-to-search-usr-local-include-and-usr-local-lib-by-def * Disable temp. arm64 docker image builds --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-10-16 21:46:29 +02:00
Dave	10b0e13882	feat: backend monitor shutdown endpoint, process based (#938 ) This PR adds a new endpoint to the backend monitor section `/backend/shutdown` which terminates the grpc process for the related model.	2023-08-23 18:38:37 +02:00
Ettore Di Giacinto	ab5b75eb01	feat: add llama-stable backend (#932 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-08-20 16:35:42 +02:00
Ettore Di Giacinto	afdc0ebfd7	feat: add --single-active-backend to allow only one backend active at the time (#925 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-08-19 01:49:33 +02:00
Dave	8cb1061c11	Usage Features (#863 )	2023-08-18 21:23:14 +02:00
Ettore Di Giacinto	0ec695f9e4	feat: make initializer accept gRPC delay times (#900 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-08-16 01:11:32 +02:00
Ettore Di Giacinto	8c781a6a44	feat: Add Diffusers (#874 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-08-09 08:38:51 +02:00
Ettore Di Giacinto	39805b09e5	fix: pass by env in managed services Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-08-08 00:58:38 +02:00
Ettore Di Giacinto	a843e64fc2	feat: add initial AutoGPTQ backend implementation	2023-08-07 22:53:28 +02:00
Dave	7fb8b4191f	feat: "simple" chat/edit/completion template system prompt from config (#856 )	2023-08-03 00:19:55 +02:00
Dave	ce8e9dc690	feature: model list :: filter query string parameter (#830 )	2023-07-31 19:14:32 +02:00
Ettore Di Giacinto	569c1d1163	feat: add rope settings and negative prompt, drop grammar backend (#797 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-07-25 19:05:27 +02:00
Aman Gupta Karmani	12fe0932c4	feat: cancel stream generation if client disappears (#792 )	2023-07-24 23:10:54 +02:00
Dave	c6bf67f446	feat(llama2): add template for chat messages (#782 ) Co-authored-by: Aman Karmani <aman@tmm1.net> Lays some of the groundwork for LLAMA2 compatibility as well as other future models with complex prompting schemes. Started small refactoring in pkg/model/loader.go regarding template loading. Currently still a part of ModelLoader, but should be easy to add template loading for situations other than overall prompt templates and the new chat-specific per-message templates Adds support for new chat-endpoint-specific, per-message templates as an alternative to the existing Role: XYZ sprintf method. Includes a temporary prompt template as an example, since I have a few questions before we merge in the model-gallery side changes (see ) Minor debug logging changes.	2023-07-22 11:31:39 -04:00
Ettore Di Giacinto	c71c729bc2	debug	2023-07-21 10:53:26 +02:00
Ettore Di Giacinto	94916749c5	feat: add external grpc and model autoloading	2023-07-20 22:10:12 +02:00
Ettore Di Giacinto	47cc95fc9f	feat: add all backends to autoload Now since gRPCs are not crashing the main thread we can just greedly attempt all the backends we have available. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-07-20 00:40:28 +02:00
Ettore Di Giacinto	3feb632eb4	refactor: rename "llama-master" and "llama" (#776 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-07-20 00:36:16 +02:00
Ettore Di Giacinto	6352448b72	feat: add llama-master backend (#752 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2023-07-17 23:58:15 +02:00

84 Commits