* fixes#1775 and #1774
Add BitsAndBytes Quantization and fixes embedding on CUDA devices
* Manage 4bit and 8 bit quantization
Manage different BitsAndBytes options with the quantization: parameter in yaml
* fix compilation errors on non CUDA environment
* feat(intel): add diffusers support
* try to consume upstream container image
* Debug
* Manually install deps
* Map transformers/hf cache dir to modelpath if not specified
* fix(compel): update initialization, pass by all gRPC options
* fix: add dependencies, implement transformers for xpu
* base it from the oneapi image
* Add pillow
* set threads if specified when launching the API
* Skip conda install if intel
* defaults to non-intel
* ci: add to pipelines
* prepare compel only if enabled
* Skip conda install if intel
* fix cleanup
* Disable compel by default
* Install torch 2.1.0 with Intel
* Skip conda on some setups
* Detect python
* Quiet output
* Do not override system python with conda
* Prefer python3
* Fixups
* exllama2: do not install without conda (overrides pytorch version)
* exllama/exllama2: do not install if not using cuda
* Add missing dataset dependency
* Small fixups, symlink to python, add requirements
* Add neural_speed to the deps
* correctly handle model offloading
* fix: device_map == xpu
* go back at calling python, fixed at dockerfile level
* Exllama2 restricted to only nvidia gpus
* Tokenizer to xpu
* feat(transformers): support also text generation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* embedded: set seed -1
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(conda): share env between diffusers and bark
* Detect if env already exists
* share diffusers and petals
* tests: add petals
* Use smaller model for tests with petals
* test only model load on petals
* tests(petals): run only load model tests
* Revert "test only model load on petals"
This reverts commit 111cfa97f1.
* move transformers and sentencetransformers to common env
* Share also transformers-musicgen
* Use cuda in transformers if available
tensorflow probably needs a different check.
Signed-off-by: Erich Schubert <kno10@users.noreply.github.com>
* feat: expose CUDA at top level
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests: add to tests and create workflow for py extra backends
* doc: update note on how to use core images
---------
Signed-off-by: Erich Schubert <kno10@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Erich Schubert <kno10@users.noreply.github.com>
* Update docs for new requirements.txt path
Signed-off-by: Marcus Köhler <khler.marcus@gmail.com>
* Fix typo (.PONY -> .PHONY) in python backend makefiles
Signed-off-by: Marcus Köhler <khler.marcus@gmail.com>
---------
Signed-off-by: Marcus Köhler <khler.marcus@gmail.com>
* Fix python header comments for some extra gRPC backends
When a Python script is to be executed directly via exec(3), either the platform knows how to execute
the file itself (i.e. special configuration is necessary) or the first line
contains a shebang (#!) specifying the interpreter to run it (similar to
shell scripts).
The shebang MUST be on the first line for the script to work on all platforms,
so any header comments need to be in the lines following it. Otherwise
executing these scripts as extra backends will yield an "exec format
error" message.
Changes:
* Move introductory comments below the shebang line
* Change header comment in transformers.py to refer to the correct
python module
Signed-off-by: Marcus Köhler <khler.marcus@gmail.com>
* Make header comment in ttsbark.py more specific
Signed-off-by: Marcus Köhler <khler.marcus@gmail.com>
---------
Signed-off-by: Marcus Köhler <khler.marcus@gmail.com>
* Update huggingface.py
Switch SentenceTransformer for AutoModel in order to set trust_remote_code needed to use the encode method with embeddings models like jinai-v2
Signed-off-by: Lucas Hänke de Cansino <lhc@next-boss.eu>
* feat(transformers): split in separate backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Lucas Hänke de Cansino <lhc@next-boss.eu>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Lucas Hänke de Cansino <lhc@next-boss.eu>