Ludovic Leroux | 939411300a | 2024-03-01 22:48:53 +01:00
Bump vLLM version + more options when loading models in vLLM (#1782)
* Bump vLLM version to 0.3.2
* Add vLLM model loading options
* Remove transformers-exllama
* Fix install exllama

Ettore Di Giacinto | 5e155fb081 | 2024-02-14 21:44:12 +01:00
fix(python): pin exllama2 (#1711)
fix(python): pin python deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Ettore Di Giacinto | cb7512734d | 2024-01-26 00:13:21 +01:00
transformers: correctly load automodels (#1643)
* backends(transformers): use AutoModel with LLM types
* examples: animagine-xl
* Add codellama examples

Ettore Di Giacinto | 06cd9ef98d | 2024-01-20 17:56:08 +01:00
feat(extra-backends): Improvements, adding mamba example (#1618)
* feat(extra-backends): Improvements
  vllm: add max_tokens, wire up stream event
  mamba: fixups, adding examples for mamba-chat
* examples(mamba-chat): add
* docs: update

Ettore Di Giacinto | 9e653d6abe | 2024-01-19 23:42:50 +01:00
feat: 🐍 add mamba support (#1589)
feat(mamba): Initial import
This is a first iteration of the mamba backend, loosely based on mamba-chat (https://github.com/havenhq/mamba-chat).