LocalAI/backend/cpp/llama
Ettore Di Giacinto e49ea0123b
feat(llama.cpp): add flash_attention and no_kv_offloading (#2310)
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 19:07:51 +02:00
..
CMakeLists.txt deps(llama.cpp): update, support Gemma models (#1734) 2024-02-21 17:23:38 +01:00
grpc-server.cpp feat(llama.cpp): add flash_attention and no_kv_offloading (#2310) 2024-05-13 19:07:51 +02:00
json.hpp 🔥 add LaVA support and GPT vision API, Multiple requests for llama.cpp, return JSON types (#1254) 2023-11-11 13:14:59 +01:00
Makefile feat: migrate python backends from conda to uv (#2215) 2024-05-10 15:08:08 +02:00
prepare.sh feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232) 2024-05-04 17:56:12 +02:00
utils.hpp feat(sycl): Add support for Intel GPUs with sycl (#1647) (#1660) 2024-02-01 19:21:52 +01:00