Ettore Di Giacinto
acd829a7a0
fix: do not break on newlines on function returns ( #864 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-04 21:46:36 +02:00
Ettore Di Giacinto
5ca21ee398
feat: add ngqa and RMSNormEps parameters ( #860 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-03 00:51:08 +02:00
Dave
7fb8b4191f
feat: "simple" chat/edit/completion template system prompt from config ( #856 )
2023-08-03 00:19:55 +02:00
Ettore Di Giacinto
c309aac8f5
fix(gallery): use inline YAML ( #851 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-08-01 19:09:32 +02:00
Ettore Di Giacinto
d603a9cbb5
fix(gallery): preload from file should be in YAML format ( #846 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-31 21:13:16 +02:00
Dave
ce8e9dc690
feature: model list :: filter query string parameter ( #830 )
2023-07-31 19:14:32 +02:00
Dave
8e8d474ae8
refactor: Remove remaining uses of deprecated package io/ioutil ( #837 )
2023-07-30 11:23:43 +00:00
Ettore Di Giacinto
e70b91aaef
tests: set a small context_size
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-29 10:29:47 +02:00
Ettore Di Giacinto
f085baa77d
fix: set default rope if not specified
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-29 01:07:16 +02:00
Ettore Di Giacinto
dde12b492b
fix: select function calls if 'name' is set in the request ( #827 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-28 01:17:11 +02:00
Ettore Di Giacinto
096d98c3d9
fix: add rope settings during model load, fix CUDA ( #821 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-27 21:56:05 +02:00
Ettore Di Giacinto
b96e30e66c
fix: use bytes in gRPC proto instead of strings ( #813 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-27 18:41:04 +02:00
Ettore Di Giacinto
569c1d1163
feat: add rope settings and negative prompt, drop grammar backend ( #797 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-25 19:05:27 +02:00
Aman Gupta Karmani
12fe0932c4
feat: cancel stream generation if client disappears ( #792 )
2023-07-24 23:10:54 +02:00
Dave
c6bf67f446
feat(llama2): add template for chat messages ( #782 )
Co-authored-by: Aman Karmani <aman@tmm1.net>
Lays some of the groundwork for LLAMA2 compatibility as well as other future models with complex prompting schemes.
Started a small refactoring in pkg/model/loader.go around template loading. It is currently still part of ModelLoader, but it should be easy to add template loading for cases other than overall prompt templates and the new chat-specific per-message templates.
Adds support for new chat-endpoint-specific, per-message templates as an alternative to the existing Role: XYZ sprintf method.
Includes a temporary prompt template as an example, since I have a few questions before we merge in the model-gallery side changes (see )
Minor debug logging changes.
2023-07-22 11:31:39 -04:00
Ettore Di Giacinto
94817b557c
fix: make completions endpoint more close to OpenAI specification ( #790 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-22 00:53:52 +02:00
Ettore Di Giacinto
c71c729bc2
debug
2023-07-21 10:53:26 +02:00
Ettore Di Giacinto
e459f114cd
fix: fix tests, small refactors
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 23:52:04 +02:00
Ettore Di Giacinto
982a7e86a8
feat: add huggingface embeddings backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 22:10:42 +02:00
Ettore Di Giacinto
94916749c5
feat: add external grpc and model autoloading
2023-07-20 22:10:12 +02:00
Ettore Di Giacinto
1d2ae46ddc
tests: clean up logs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 01:36:34 +02:00
Ettore Di Giacinto
3feb632eb4
refactor: rename "llama-master" and "llama" ( #776 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-20 00:36:16 +02:00
Ettore Di Giacinto
6352448b72
feat: add llama-master backend ( #752 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-17 23:58:15 +02:00
Ettore Di Giacinto
d0e67cce75
fix: make last stream message to send empty content
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-16 00:09:28 +02:00
Ettore Di Giacinto
17294ae5e5
fix: make first stream message to send empty content ( #751 )
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 22:50:52 +02:00
Ettore Di Giacinto
1d0ed95a54
feat: move other backends to grpc
This finally makes everything more consistent
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
5dcfdbe51d
feat: various refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
f2f1d7fe72
feat: use gRPC for transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
ae533cadef
feat: move gpt4all to a grpc service
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
58f6aab637
feat: move llama to a grpc
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
Ettore Di Giacinto
b816009db0
feat: add falcon ggllm via grpc client
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2023-07-15 01:19:43 +02:00
mudler
dcf35dd25f
Fixup custom role encoding
Signed-off-by: mudler <mudler@localai.io>
2023-07-09 11:13:19 +02:00
mudler
e70322676c
Allow to customize no action behavior
Signed-off-by: mudler <mudler@localai.io>
2023-07-09 10:53:46 +02:00
mudler
b3f43ab938
Add a way to disable default action
2023-07-09 10:02:21 +02:00
mudler
bbc4468908
Make functions more compatible with OpenAI specs
2023-07-09 10:02:09 +02:00
mudler
55befe396a
Add grammar_json to the request parameters to facilitate JSON generation
2023-07-06 19:08:04 +02:00
mudler
483fddccf9
minor fixups
2023-07-06 11:55:19 +02:00
mudler
05aed255db
Customize function call in templates
2023-07-05 18:24:44 +02:00
mudler
0f1326b2bd
fixups
2023-07-04 23:40:22 +02:00
mudler
b722e7eb7e
feat: cleanups, small enhancements
Signed-off-by: mudler <mudler@localai.io>
2023-07-04 18:58:19 +02:00
mudler
f09ddd2983
feat: add grammar and functions call support
2023-07-04 18:58:19 +02:00
Luis López
a6839fd238
feat: [whisper] Partial support for verbose_json format in transcribe endpoint ( #721 )
2023-07-04 14:31:31 +02:00
Ettore Di Giacinto
3593cb0c87
feat: update llama, enable NUMA ( #684 )
2023-06-27 09:00:10 +02:00
Ettore Di Giacinto
02136531a3
fix: return index and delta in stream token ( #680 )
Signed-off-by: mudler <mudler@localai.io>
2023-06-26 18:49:36 +02:00
Ettore Di Giacinto
d3a486a4f8
feat: Add '/version' endpoint and display it in the CLI ( #679 )
2023-06-26 15:12:43 +02:00
Ettore Di Giacinto
2b957df56c
fix: rename /models/list to /models/available ( #678 )
2023-06-26 15:12:26 +02:00
Ettore Di Giacinto
78f3c3da48
refactor: consolidate usage of GetURI ( #674 )
Signed-off-by: mudler <mudler@localai.io>
2023-06-26 12:25:38 +02:00
Ettore Di Giacinto
60db5957d3
Gallery repository ( #663 )
Signed-off-by: mudler <mudler@localai.io>
2023-06-24 08:18:17 +02:00
Ettore Di Giacinto
a7bb029d23
feat: add tts with go-piper ( #649 )
Signed-off-by: mudler <mudler@localai.io>
2023-06-22 17:53:10 +02:00
Ettore Di Giacinto
2f5feb4841
Add LowVRAM option parameter ( #642 )
2023-06-20 20:33:47 +02:00
Ettore Di Giacinto
295f3030a9
feat: add typical_p to model parameters ( #598 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-14 19:33:20 +02:00
Ettore Di Giacinto
10ddd72b58
fix: set default batch size ( #597 )
2023-06-14 19:09:27 +02:00
Ettore Di Giacinto
e37361985c
deps: update gpt4all bindings, fix search path on new versions ( #592 )
2023-06-14 13:24:53 +02:00
Ettore Di Giacinto
84946e9275
feat: display download progress when installing models ( #543 )
2023-06-08 21:33:18 +02:00
Ettore Di Giacinto
c9bbba4872
tests: add llama tests with openllama ( #538 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-08 00:36:11 +02:00
Ettore Di Giacinto
5abbb134d9
feat: extend model configuration for llama.cpp ( #536 )
2023-06-07 21:46:19 +02:00
Ettore Di Giacinto
d62aef2016
feat: add experimental support for falcon-7b ( #516 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-06 17:23:19 +02:00
Ettore Di Giacinto
b503725dc7
fix: downgrade gpt4all ( #503 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-05 09:42:50 +02:00
Samuel Maynard
96794851b3
feat: add support for Stream: true to completionEndpoint ( #465 )
2023-06-03 00:27:03 +02:00
Ettore Di Giacinto
78ad4813df
feat: Update gpt4all, support multiple implementations in runtime ( #472 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-06-01 23:38:52 +02:00
Aisuko
c8a4a4f4e9
feat: Add new test cases for LoadConfigs ( #447 )
Signed-off-by: Aisuko <urakiny@gmail.com>
2023-06-01 16:20:45 +02:00
Pavel Zloi
3ba07a5928
feat: add LangChainGo Huggingface backend ( #446 )
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-06-01 12:00:06 +02:00
Aisuko
49ce24984c
feat: Add more test-cases and remove dev container ( #433 )
Signed-off-by: Aisuko <urakiny@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2023-05-30 13:01:55 +02:00
Ettore Di Giacinto
f401181cb5
fix: switch back to upstream for rwkv bindings ( #432 )
2023-05-30 12:35:32 +02:00
Ettore Di Giacinto
aacb96df7a
fix: correctly handle errors from App constructor ( #430 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-05-30 12:00:30 +02:00
Ettore Di Giacinto
217dbb448e
feat: allow to set a prompt cache path and enable saving state ( #395 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-05-27 14:29:11 +02:00
Ettore Di Giacinto
76c881043e
feat: allow to preload models before startup via env var or configs ( #391 )
2023-05-27 09:26:33 +02:00
Ettore Di Giacinto
bf54b78270
feat: add /healthz and /readyz endpoints for kubernetes ( #374 )
2023-05-24 22:19:13 +02:00
Ettore Di Giacinto
9decd0813c
feat: update go-gpt2 ( #359 )
Signed-off-by: mudler <mudler@mocaccino.org>
2023-05-23 21:47:47 +02:00
Robert Hambrock
4aa78843c0
fix: spec compliant instantiation and termination of streams ( #341 )
2023-05-21 15:24:04 +02:00
Ettore Di Giacinto
6f54cab3f0
feat: allow to set cors ( #339 )
2023-05-21 14:38:25 +02:00
Ettore Di Giacinto
05a3d569b0
feat: allow to override model config ( #323 )
2023-05-20 17:03:53 +02:00
Ettore Di Giacinto
4e381cbe92
feat: support shorter urls for github repositories ( #314 )
2023-05-20 09:06:30 +02:00
Ettore Di Giacinto
1fade53a61
feat: minor enhancements to /models/apply ( #297 )
2023-05-19 08:31:11 +02:00
Ettore Di Giacinto
cc9aa9eb3f
feat: add /models/apply endpoint to prepare models ( #286 )
2023-05-18 15:59:03 +02:00
Ettore Di Giacinto
3f739575d8
Minor fixes ( #285 )
2023-05-17 21:01:46 +02:00
Ettore Di Giacinto
9d051c5d4f
feat: add image generation with ncnn-stablediffusion ( #272 )
2023-05-16 19:32:53 +02:00
Ettore Di Giacinto
acd03d15f2
feat: add support for cublas/openblas in the llama.cpp backend ( #258 )
2023-05-16 16:26:25 +02:00
Ettore Di Giacinto
a035de2fdd
tests: add rwkv ( #261 )
2023-05-15 08:15:01 +02:00
Ettore Di Giacinto
2488c445b6
feat: bert.cpp token embeddings ( #241 )
2023-05-12 17:16:49 +02:00
Ettore Di Giacinto
b4241d0a0d
tests: enable whisper ( #239 )
2023-05-12 14:10:18 +02:00
Ettore Di Giacinto
8250391e49
Add support for gptneox/replit ( #238 )
2023-05-12 11:36:35 +02:00
Ettore Di Giacinto
fd1df4e971
whisper: add tests and allow to set upload size ( #237 )
2023-05-12 10:04:20 +02:00
Ettore Di Giacinto
4413defca5
feat: add starcoder ( #236 )
2023-05-11 20:20:07 +02:00
Ettore Di Giacinto
85f0f8227d
refactor: drop code dups ( #234 )
2023-05-11 16:34:16 +02:00
Ettore Di Giacinto
59e3c02002
make use of new bindings for gpt4all ( #232 )
2023-05-11 14:31:19 +02:00
Matthew Campbell
032dee256f
Keep whisper models in memory ( #233 )
2023-05-11 14:05:07 +02:00
Matthew Campbell
6b5e2b2bf5
Upload transcription API wasn't reading the data from the post ( #229 )
2023-05-11 10:43:05 +02:00
Ettore Di Giacinto
11675932ac
feat: add dolly/redpajama/bloomz models support ( #214 )
2023-05-11 01:12:58 +02:00
Ettore Di Giacinto
f8ee20991c
feat: add bert.cpp embeddings ( #222 )
2023-05-10 15:20:21 +02:00
Ettore Di Giacinto
9f426578cf
feat: add transcript endpoint ( #211 )
2023-05-09 11:43:50 +02:00
Ettore Di Giacinto
89dfa0f5fc
feat: add experimental support for embeddings as arrays ( #207 )
2023-05-08 19:31:18 +02:00
Dave
07ec2e441d
mini fix - OpenAI documentation url ( #200 )
2023-05-06 00:42:08 +02:00
mudler
8c8cf38d4d
tests: use 1 core
2023-05-05 23:29:34 +02:00
mudler
009ee47fe2
Don't allow 0 as thread count
2023-05-05 22:51:20 +02:00
mudler
ec2adc2c03
tests: use 3 cores
2023-05-05 22:07:01 +02:00
mudler
e62ee2bc06
fix: remove trailing 0s from embeddings
This happens when max_tokens is not set: by default go-llama
allocates more space for the slice than needed, so the result is zero-padded.
2023-05-05 18:35:03 +02:00
mudler
b49721cdd1
fix: respect config from file for backends settings
2023-05-05 18:05:10 +02:00
mudler
64c0a7967f
fix: pass prediction options when using the model
2023-05-05 15:56:02 +02:00
mudler
e96eadab40
feat: support deprecated embeddings API
2023-05-05 15:55:19 +02:00