Commit Graph

1683 Commits

Author SHA1 Message Date
Ettore Di Giacinto
fdb45153fe
feat(llama.cpp): Totally decentralized, private, distributed, p2p inference (#2343)
* feat(llama.cpp): Enable decentralized, distributed inference

As https://github.com/mudler/LocalAI/pull/2324 introduced distributed inferencing thanks to
@rgerganov's implementation in https://github.com/ggerganov/llama.cpp/pull/6829 in upstream llama.cpp, it is now
possible to distribute the workload to a remote llama.cpp gRPC server.

This changeset now uses mudler/edgevpn to establish a secure, distributed network between the nodes using a shared token.
The token is generated automatically when starting the server with the `--p2p` flag, and can be used by starting the workers
with `local-ai worker p2p-llama-cpp-rpc` by passing the token via environment variable (TOKEN) or with args (--token).

Following how mudler/edgevpn works, a network is established between the server and the workers using the DHT and mDNS discovery protocols.
The llama.cpp rpc server is automatically started and exposed to the underlying p2p network so the API server can connect to it.

When the HTTP server is started, it will discover the workers in the network and automatically create the port-forwards to the service locally.
Then llama.cpp is configured to use the services.

This feature is behind the "p2p" GO_FLAGS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* go mod tidy

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: add p2p tag

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* better message

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 19:17:59 +02:00
Ettore Di Giacinto
16474bfb40
build: add sha (#2356)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 18:02:19 +02:00
Ettore Di Giacinto
5a6d120a56
feat(functions): don't use yaml.MapSlice (#2354)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 08:31:06 +02:00
Ettore Di Giacinto
7a480bb16f
models(gallery): add LocalAI-Llama3-8b-Function-Call-v0.2-GGUF (#2355)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-20 00:59:17 +02:00
LocalAI [bot]
053531e434
⬆️ Update ggerganov/whisper.cpp (#2352)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-19 22:23:02 +00:00
LocalAI [bot]
b7ab4f25d9
⬆️ Update ggerganov/llama.cpp (#2351)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-19 22:22:03 +00:00
Ettore Di Giacinto
73566a2bb2
feat(functions): allow to use JSONRegexMatch unconditionally (#2349)
* feat(functions): allow to use JSONRegexMatch unconditionally

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(functions): make json_regex_match a list

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 18:24:49 +02:00
Ettore Di Giacinto
8ccd5ab040
feat(webui): statically embed js/css assets (#2348)
* feat(webui): statically embed js/css assets

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* update font assets

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 18:24:27 +02:00
Ettore Di Giacinto
5a3db730b9
Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-19 16:37:10 +02:00
Ettore Di Giacinto
8ad669339e
add openvoice backend (#2334)
Wip openvoice
2024-05-19 16:27:08 +02:00
Ettore Di Giacinto
a10a952085
models(gallery): update poppy porpoise mmproj (#2346)
models(gallery): update poppy porpoise mmproj

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 13:26:02 +02:00
Ettore Di Giacinto
b37447cac5
models(gallery): add master-yi (#2345)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 13:25:29 +02:00
Ettore Di Giacinto
f2d182a2eb
models(gallery): add anita (#2344)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-19 13:25:16 +02:00
lenaxia
6b6c8cdd5f
feat(functions): Enable true regex replacement for the regexReplacement option (#2341)
* Adding regex capabilities to ParseFunctionCall replacement

Signed-off-by: Lenaxia <github@47north.lat>

* Adding tests for the regex replace in ParseFunctionCall

Signed-off-by: Lenaxia <github@47north.lat>

* Fixing tests and adding a test case to validate double quote replacement works

Signed-off-by: Lenaxia <github@47north.lat>

* Make Regex replacement stable, drop lookaheads

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Lenaxia <github@47north.lat>
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Lenaxia <github@47north.lat>
Co-authored-by: mudler <mudler@localai.io>
2024-05-19 01:29:10 +02:00
LocalAI [bot]
5f35e85e86
⬆️ Update ggerganov/llama.cpp (#2342)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-18 21:06:29 +00:00
Ettore Di Giacinto
02f1b477df
feat(functions): simplify parsing, read functions as list (#2340)
Signed-off-by: mudler <mudler@localai.io>
2024-05-18 09:35:28 +02:00
LocalAI [bot]
9ab8f8f5e0
⬆️ Update ggerganov/llama.cpp (#2339)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-17 21:13:01 +00:00
LocalAI [bot]
9a255d6453
⬆️ Update ggerganov/llama.cpp (#2337)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-16 21:53:19 +00:00
Ettore Di Giacinto
e0ef9e2bb9
models(gallery): add yi 6/9b, sqlcoder, sfr-iterative-dpo (#2335)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-16 20:05:20 +02:00
cryptk
86627b27f7
fix: add setuptools to all requirements-intel.txt files for python backends (#2333)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-16 19:15:46 +02:00
LocalAI [bot]
4e92569d45
⬆️ Update ggerganov/whisper.cpp (#2329)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-15 22:24:06 +00:00
Ettore Di Giacinto
f7508e3888
models(gallery): add hermes-2-theta-llama-3-8b (#2331)
Signed-off-by: mudler <mudler@localai.io>
2024-05-16 00:22:32 +02:00
Aleksandr Oleinikov
badfc16df1
fix(gallery) Correct llama3-8b-instruct model file (#2330)
Correct llama3-8b-instruct model file

This must be a mistake because the config tries to use a model file that is different from the one actually being downloaded.
I assumed the downloaded file is what should be used so I corrected the specified model file to that

Signed-off-by: Aleksandr Oleinikov <10602045+tannisroot@users.noreply.github.com>
2024-05-16 00:22:05 +02:00
LocalAI [bot]
b584dcf18a
⬆️ Update ggerganov/llama.cpp (#2316)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-15 22:20:37 +00:00
Ettore Di Giacinto
4c845fb47d
Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-15 23:56:52 +02:00
Ettore Di Giacinto
07c0559d06
Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-15 23:56:22 +02:00
Ettore Di Giacinto
beb598e4f9
feat(functions): mixed JSON BNF grammars (#2328)
feat(functions): support mixed JSON BNF grammar

This PR provides new options to control how functions are extracted from
the LLM, and also provides more control on how JSON grammars can be used
(also in conjunction).

New YAML settings introduced:

- `grammar_message`: when enabled, the generated grammar can also decide
  to push strings and not only JSON objects. This allows the LLM to choose
  whether to respond freely or with JSON.
- `grammar_prefix`: allows prefixing a string to the JSON grammar
  definition.
- `replace_results`: a map that allows replacing strings in the LLM
  result.

As an example, consider the following settings for Hermes-2-Pro-Mistral,
which allow extracting both JSON results coming from the model, and the
ones coming from the grammar:

```yaml
function:
  # disable injecting the "answer" tool
  disable_no_action: true
  # This allows the grammar to also return messages
  grammar_message: true
  # Prefix to add to the grammar
  grammar_prefix: '<tool_call>\n'
  return_name_in_function_response: true
  # Without grammar uncomment the lines below
  # Warning: this is relying only on the capability of the
  # LLM model to generate the correct function call.
  # no_grammar: true
  # json_regex_match: "(?s)<tool_call>(.*?)</tool_call>"
  replace_results:
    "<tool_call>": ""
    "\'": "\""
```

Note: To disable grammar usage entirely in the example above, uncomment the
`no_grammar` and `json_regex_match` lines.
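
As a rough illustration of how the `json_regex_match` and `replace_results` settings could interact when grammars are disabled, the Python sketch below extracts a tool call from raw model output. It is not LocalAI's implementation (which is written in Go): the helper name `extract_tool_call`, the sample output, and the order of operations (regex match first, then literal replacements) are assumptions made for this example.

```python
import json
import re

# Values copied from the YAML example above.
JSON_REGEX_MATCH = r"(?s)<tool_call>(.*?)</tool_call>"
REPLACE_RESULTS = {"<tool_call>": "", "'": '"'}


def extract_tool_call(raw: str) -> dict:
    """Hypothetical helper: pull a JSON function call out of raw LLM output."""
    match = re.search(JSON_REGEX_MATCH, raw)
    # Fall back to the whole output when the tags are absent.
    payload = match.group(1) if match else raw
    # Apply each literal replacement in turn (assumed order: dict order).
    for old, new in REPLACE_RESULTS.items():
        payload = payload.replace(old, new)
    return json.loads(payload)


# Made-up model output using single quotes, which JSON does not accept.
sample = "<tool_call>\n{'name': 'get_weather', 'arguments': {'city': 'Rome'}}\n</tool_call>"
call = extract_tool_call(sample)
print(call["name"], call["arguments"])
```

Note that a blanket single-to-double quote replacement like this one would also rewrite apostrophes inside string values, so it only suits models whose output is otherwise well-formed.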

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-15 20:03:18 +02:00
Ettore Di Giacinto
c89271b2e4
feat(llama.cpp): add distributed llama.cpp inferencing (#2324)
* feat(llama.cpp): support distributed llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: allow tweaking how chat messages are merged together

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Makefile: register to ALL_GRPC_BACKENDS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring, allow disable auto-detection of backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

* feat: add cmd to start rpc-server from llama.cpp

Signed-off-by: mudler <mudler@localai.io>

* ci: add ccache

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-15 01:17:02 +02:00
Ettore Di Giacinto
29909666c3
Update README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2024-05-15 00:33:16 +02:00
LocalAI [bot]
566b5cf2ee
⬆️ Update ggerganov/whisper.cpp (#2326)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-14 21:17:46 +00:00
Sertaç Özercan
a670318a9f
feat: auto select llama-cpp cuda runtime (#2306)
* auto select cpu variant

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix metal

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix path

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* cuda

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* auto select cuda

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* update test

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* select CUDA backend only if present

Signed-off-by: mudler <mudler@localai.io>

* ci: keep cuda bin in path

Signed-off-by: mudler <mudler@localai.io>

* Makefile: make dist now builds also cuda

Signed-off-by: mudler <mudler@localai.io>

* Keep pushing fallback in case auto-flagset/nvidia fails

There could be other reasons for which the default binary may fail. For example we might have detected an Nvidia GPU,
however the user might not have the drivers/cuda libraries installed in the system, and so it would fail to start.

We keep the llama.cpp fallback at the end of the list of llama.cpp backends, so that loading can fall back to it in case things go wrong

Signed-off-by: mudler <mudler@localai.io>

* Do not build cuda on MacOS

Signed-off-by: mudler <mudler@localai.io>

* cleanup

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: mudler <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: mudler <mudler@localai.io>
2024-05-14 19:40:18 +02:00
Ettore Di Giacinto
84e2407afa
feat(functions): allow to set JSON matcher (#2319)
Signed-off-by: mudler <mudler@localai.io>
2024-05-14 09:39:20 +02:00
Ettore Di Giacinto
c4186f13c3
feat(functions): support models with no grammar and no regex (#2315)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-14 00:32:32 +02:00
LocalAI [bot]
4ac7956f68
⬆️ Update ggerganov/whisper.cpp (#2317)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-13 22:25:14 +00:00
Ettore Di Giacinto
e49ea0123b
feat(llama.cpp): add flash_attention and no_kv_offloading (#2310)
feat(llama.cpp): add flash_attn and no_kv_offload

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 19:07:51 +02:00
Ettore Di Giacinto
7123d07456
models(gallery): add orthocopter (#2313)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:45:58 +02:00
Ettore Di Giacinto
2db22087ae
models(gallery): add lumimaidv2 (#2312)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:44:44 +02:00
Ettore Di Giacinto
fa7b2aee9c
models(gallery): add Bunny-llama (#2311)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:44:25 +02:00
Ettore Di Giacinto
4d70b6fb2d
models(gallery): add aura-llama-Abliterated (#2309)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-13 18:44:10 +02:00
Sertaç Özercan
e2c3ffb09b
feat: auto select llama-cpp cpu variant (#2305)
* auto select cpu variant

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* remove cuda target for now

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix metal

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

* fix path

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>

---------

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-13 11:37:52 +02:00
LocalAI [bot]
b4cb22f444
⬆️ Update ggerganov/llama.cpp (#2303)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-12 21:18:59 +00:00
LocalAI [bot]
5534b13903
feat(swagger): update swagger (#2302)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2024-05-12 21:00:18 +00:00
fakezeta
5b79bd04a7
add setuptools for openvino (#2301)
2024-05-12 19:31:43 +00:00
Ettore Di Giacinto
9d8c705fd9
feat(ui): display number of available models for installation (#2298)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 14:24:36 +02:00
Ettore Di Giacinto
310b2171be
models(gallery): add llama-3-refueled (#2297)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 09:39:58 +02:00
Ettore Di Giacinto
98af0b5d85
models(gallery): add jsl-medllama-3-8b-v2.0 (#2296)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 09:38:05 +02:00
Ettore Di Giacinto
ca14f95d2c
models(gallery): add l3-chaoticsoliloquy-v1.5-4x8b (#2295)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2024-05-12 09:37:55 +02:00
Ikko Eltociear Ashimine
1b69b338c0
docs: Update semantic-todo/README.md (#2294)
seperate -> separate

Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-05-12 09:02:11 +02:00
cryptk
88942e4761
fix: add missing openvino/optimum/etc libraries for Intel, fixes #2289 (#2292)
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
2024-05-12 09:01:45 +02:00
Ettore Di Giacinto
efa32a2677
feat(grammar): support models with specific construct (#2291)
When enabling grammar with functions, it might be useful to
allow more flexibility to support models that are fine-tuned to return
function calls of the form { "name": "function_name", "arguments": {...} }
rather than { "function": "function_name", "arguments": {...} }.

This might call for a more generic approach later on, but for the time being we can easily support both,
as we just have to specify different types.

If needed we can expand on this later on
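
To make the two shapes concrete, here is a minimal sketch of the parsing side, written in Python for brevity. The `normalize_call` helper is hypothetical and is not taken from LocalAI's Go code, which handles this at the grammar level. It only shows that both constructs carry the same information under a different key.

```python
import json


def normalize_call(raw: str) -> tuple[str, dict]:
    """Return (function_name, arguments) for either supported JSON shape."""
    obj = json.loads(raw)
    # Some models emit the function name under "name", others under "function".
    for key in ("name", "function"):
        if key in obj:
            return obj[key], obj.get("arguments", {})
    raise ValueError("no function name found in model output")


# Both constructs normalize to the same (name, arguments) pair.
print(normalize_call('{"name": "get_weather", "arguments": {"city": "Rome"}}'))
print(normalize_call('{"function": "get_weather", "arguments": {"city": "Rome"}}'))
```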

Signed-off-by: mudler <mudler@localai.io>
2024-05-12 01:13:22 +02:00