Merge 49df11b4e8 into 6559ac11b1

feat(ui): prompt for chat, support vision, enhancements (#2259)

* feat(ui): allow to set system prompt for chat

Make also the models in the index clickable, and display as table

Fixes #2257

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(vision): support also png with base64 input

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): support vision and upload of files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* display the processed image

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make trust remote code stand out

Signed-off-by: mudler <mudler@localai.io>

* feat(ui): track in progress job across index/model gallery

Signed-off-by: mudler <mudler@localai.io>

* minor fixups

Signed-off-by: mudler <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: mudler <mudler@localai.io>
2024-05-08 12:34:22 +02:00 · 2024-05-08 00:42:34 +02:00 · 2024-05-08 00:14:19 +02:00 · 2024-05-07 21:39:12 +00:00 · 2024-05-07 16:34:30 +00:00 · 2024-05-07 08:39:58 +02:00
60 changed files with 2422 additions and 906 deletions
--- a/.github/workflows/image-pr.yml
+++ b/.github/workflows/image-pr.yml
@@ -61,7 +61,7 @@ jobs:
            tag-suffix: '-hipblas'
            ffmpeg: 'false'
            image-type: 'extras'
-            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            base-image: "rocm/dev-ubuntu-22.04:6.1"
            grpc-base-image: "ubuntu:22.04"
            runs-on: 'arc-runner-set'
            makeflags: "--jobs=3 --output-sync=target"
--- a/.github/workflows/image.yml
+++ b/.github/workflows/image.yml
@@ -129,7 +129,7 @@ jobs:
            ffmpeg: 'true'
            image-type: 'extras'
            aio: "-aio-gpu-hipblas"
-            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            base-image: "rocm/dev-ubuntu-22.04:6.1"
            grpc-base-image: "ubuntu:22.04"
            latest-image: 'latest-gpu-hipblas'
            latest-image-aio: 'latest-aio-gpu-hipblas'
@@ -141,7 +141,7 @@ jobs:
            tag-suffix: '-hipblas'
            ffmpeg: 'false'
            image-type: 'extras'
-            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            base-image: "rocm/dev-ubuntu-22.04:6.1"
            grpc-base-image: "ubuntu:22.04"
            runs-on: 'arc-runner-set'
            makeflags: "--jobs=3 --output-sync=target"
@@ -218,7 +218,7 @@ jobs:
            tag-suffix: '-hipblas-ffmpeg-core'
            ffmpeg: 'true'
            image-type: 'core'
-            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            base-image: "rocm/dev-ubuntu-22.04:6.1"
            grpc-base-image: "ubuntu:22.04"
            runs-on: 'arc-runner-set'
            makeflags: "--jobs=3 --output-sync=target"
@@ -228,7 +228,7 @@ jobs:
            tag-suffix: '-hipblas-core'
            ffmpeg: 'false'
            image-type: 'core'
-            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            base-image: "rocm/dev-ubuntu-22.04:6.1"
            grpc-base-image: "ubuntu:22.04"
            runs-on: 'arc-runner-set'
            makeflags: "--jobs=3 --output-sync=target"
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -19,12 +19,8 @@ jobs:
    strategy:
      matrix:
        include:
-          - build: 'avx2'
+          - build: ''
            defines: ''
-          - build: 'avx'
-            defines: '-DLLAMA_AVX2=OFF'
-          - build: 'avx512'
-            defines: '-DLLAMA_AVX512=ON'
          - build: 'cuda12'
            defines: ''
          - build: 'cuda11'
@@ -74,7 +70,6 @@ jobs:
      - name: Build
        id: build
        env:
-          CMAKE_ARGS: "${{ matrix.defines }}"
          BUILD_ID: "${{ matrix.build }}"
        run: |
          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
@@ -124,63 +119,7 @@ jobs:
          name: stablediffusion
          path: release/

-  build-macOS:
-    strategy:
-      matrix:
-        include:
-          - build: 'avx2'
-            defines: ''
-          - build: 'avx'
-            defines: '-DLLAMA_AVX2=OFF'
-          - build: 'avx512'
-            defines: '-DLLAMA_AVX512=ON'
-    runs-on: macOS-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v4
-        with:
-          submodules: true
-      - uses: actions/setup-go@v5
-        with:
-          go-version: '1.21.x'
-          cache: false
-      - name: Dependencies
-        run: |
-          brew install protobuf grpc
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
-      - name: Build
-        id: build
-        env:
-          CMAKE_ARGS: "${{ matrix.defines }}"
-          BUILD_ID: "${{ matrix.build }}"
-        run: |
-          export C_INCLUDE_PATH=/usr/local/include
-          export CPLUS_INCLUDE_PATH=/usr/local/include
-          export PATH=$PATH:$GOPATH/bin
-          make dist
-      - uses: actions/upload-artifact@v4
-        with:
-          name: LocalAI-MacOS-${{ matrix.build }}
-          path: release/
-      - name: Release
-        uses: softprops/action-gh-release@v2
-        if: startsWith(github.ref, 'refs/tags/')
-        with:
-          files: |
-            release/*
-
-
  build-macOS-arm64:
-    strategy:
-      matrix:
-        include:
-          - build: 'avx2'
-            defines: ''
-          - build: 'avx'
-            defines: '-DLLAMA_AVX2=OFF'
-          - build: 'avx512'
-            defines: '-DLLAMA_AVX512=ON'
    runs-on: macos-14
    steps:
      - name: Clone
@@ -198,9 +137,6 @@ jobs:
          go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
      - name: Build
        id: build
-        env:
-          CMAKE_ARGS: "${{ matrix.defines }}"
-          BUILD_ID: "${{ matrix.build }}"
        run: |
          export C_INCLUDE_PATH=/usr/local/include
          export CPLUS_INCLUDE_PATH=/usr/local/include
@@ -208,7 +144,7 @@ jobs:
          make dist
      - uses: actions/upload-artifact@v4
        with:
-          name: LocalAI-MacOS-arm64-${{ matrix.build }}
+          name: LocalAI-MacOS-arm64
          path: release/
      - name: Release
        uses: softprops/action-gh-release@v2
--- a/12
+++ b/12
@@ -140,6 +140,18 @@ RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
        rm -rf /var/lib/apt/lists/* \
    ; fi

+RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            hipblas-dev \
+            rocblas-dev && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* && \
+        # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
+        # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
+        ldconfig \
+    ; fi
+
 ###################################
 ###################################

--- a/50
+++ b/50
@@ -5,7 +5,7 @@ BINARY_NAME=local-ai

 # llama.cpp versions
 GOLLAMA_STABLE_VERSION?=2b57a8ae43e4699d3dc5d1496a1ccd42922993be
-CPPLLAMA_VERSION?=f364eb6fb5d46118a76fa045f487318de4c24961
+CPPLLAMA_VERSION?=b6aa6702030320a3d5fbc2508307af0d7c947e40

 # gpt4all version
 GPT4ALL_REPO?=https://github.com/nomic-ai/gpt4all
@@ -16,7 +16,7 @@ RWKV_REPO?=https://github.com/donomii/go-rwkv.cpp
 RWKV_VERSION?=661e7ae26d442f5cfebd2a0881b44e8c55949ec6

 # whisper.cpp version
-WHISPER_CPP_VERSION?=8fac6455ffeb0a0950a84e790ddb74f7290d33c4
+WHISPER_CPP_VERSION?=58210d6a7634ea1e42e0a2dab611f4a0518731dc

 # bert.cpp version
 BERT_VERSION?=6abe312cded14042f6b7c3cd8edf082713334a4d
@@ -152,9 +152,11 @@ ifeq ($(findstring tts,$(GO_TAGS)),tts)
 	OPTIONAL_GRPC+=backend-assets/grpc/piper
 endif

-ALL_GRPC_BACKENDS=backend-assets/grpc/langchain-huggingface
+ALL_GRPC_BACKENDS=backend-assets/grpc/huggingface
 ALL_GRPC_BACKENDS+=backend-assets/grpc/bert-embeddings
 ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp
+ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-noavx
+ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-cpp-fallback
 ALL_GRPC_BACKENDS+=backend-assets/grpc/llama-ggml
 ALL_GRPC_BACKENDS+=backend-assets/grpc/gpt4all
 ALL_GRPC_BACKENDS+=backend-assets/grpc/rwkv
@@ -293,6 +295,7 @@ clean: ## Remove build related file
 	rm -rf backend-assets/*
 	$(MAKE) -C backend/cpp/grpc clean
 	$(MAKE) -C backend/cpp/llama clean
+	rm -rf backend/cpp/llama-* || true
 	$(MAKE) dropreplace
 	$(MAKE) protogen-clean
 	rmdir pkg/grpc/proto || true
@@ -311,14 +314,19 @@ build: prepare backend-assets grpcs ## Build the project
 	CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o $(BINARY_NAME) ./

 build-minimal:
-	BUILD_GRPC_FOR_BACKEND_LLAMA=true GRPC_BACKENDS=backend-assets/grpc/llama-cpp GO_TAGS=none $(MAKE) build
+	BUILD_GRPC_FOR_BACKEND_LLAMA=true GRPC_BACKENDS="backend-assets/grpc/llama-cpp" GO_TAGS=none $(MAKE) build

 build-api:
 	BUILD_GRPC_FOR_BACKEND_LLAMA=true BUILD_API_ONLY=true GO_TAGS=none $(MAKE) build

 dist: build
 	mkdir -p release
+# if BUILD_ID is empty, then we don't append it to the binary name
+ifeq ($(BUILD_ID),)
+	cp $(BINARY_NAME) release/$(BINARY_NAME)-$(OS)-$(ARCH)
+else
 	cp $(BINARY_NAME) release/$(BINARY_NAME)-$(BUILD_ID)-$(OS)-$(ARCH)
+endif

 osx-signed: build
 	codesign --deep --force --sign "$(OSX_SIGNING_IDENTITY)" --entitlements "./Entitlements.plist" "./$(BINARY_NAME)"
@@ -616,8 +624,8 @@ backend-assets/grpc/gpt4all: sources/gpt4all sources/gpt4all/gpt4all-bindings/go
 	CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/sources/gpt4all/gpt4all-bindings/golang/ LIBRARY_PATH=$(CURDIR)/sources/gpt4all/gpt4all-bindings/golang/ \
 	$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/gpt4all ./backend/go/llm/gpt4all/

-backend-assets/grpc/langchain-huggingface: backend-assets/grpc
-	$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/langchain-huggingface ./backend/go/llm/langchain/
+backend-assets/grpc/huggingface: backend-assets/grpc
+	$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/huggingface ./backend/go/llm/langchain/

 backend/cpp/llama/llama.cpp:
 	LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/llama llama.cpp
@@ -629,7 +637,7 @@ ADDED_CMAKE_ARGS=-Dabsl_DIR=${INSTALLED_LIB_CMAKE}/absl \
 				 -Dutf8_range_DIR=${INSTALLED_LIB_CMAKE}/utf8_range \
 				 -DgRPC_DIR=${INSTALLED_LIB_CMAKE}/grpc \
 				 -DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=${INSTALLED_PACKAGES}/include
-backend/cpp/llama/grpc-server:
+build-llama-cpp-grpc-server:
 # Conditionally build grpc for the llama backend to use if needed
 ifdef BUILD_GRPC_FOR_BACKEND_LLAMA
 	$(MAKE) -C backend/cpp/grpc build
@@ -638,19 +646,37 @@ ifdef BUILD_GRPC_FOR_BACKEND_LLAMA
 	PATH="${INSTALLED_PACKAGES}/bin:${PATH}" \
 	CMAKE_ARGS="${CMAKE_ARGS} ${ADDED_CMAKE_ARGS}" \
 	LLAMA_VERSION=$(CPPLLAMA_VERSION) \
-	$(MAKE) -C backend/cpp/llama grpc-server
+	$(MAKE) -C backend/cpp/${VARIANT} grpc-server
 else
 	echo "BUILD_GRPC_FOR_BACKEND_LLAMA is not defined."
-	LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/llama grpc-server
+	LLAMA_VERSION=$(CPPLLAMA_VERSION) $(MAKE) -C backend/cpp/${VARIANT} grpc-server
 endif

-backend-assets/grpc/llama-cpp: backend-assets/grpc backend/cpp/llama/grpc-server
-	cp -rfv backend/cpp/llama/grpc-server backend-assets/grpc/llama-cpp
+backend-assets/grpc/llama-cpp: backend-assets/grpc
+	$(info ${GREEN}I llama-cpp build info:standard${RESET})
+	cp -rf backend/cpp/llama backend/cpp/llama-default
+	$(MAKE) -C backend/cpp/llama-default purge
+	$(MAKE) VARIANT="llama-default" build-llama-cpp-grpc-server
+	cp -rfv backend/cpp/llama-default/grpc-server backend-assets/grpc/llama-cpp
 # TODO: every binary should have its own folder instead, so can have different metal implementations
 ifeq ($(BUILD_TYPE),metal)
-	cp backend/cpp/llama/llama.cpp/build/bin/default.metallib backend-assets/grpc/
+	cp backend/cpp/llama-default/llama.cpp/build/bin/default.metallib backend-assets/grpc/
 endif

+backend-assets/grpc/llama-cpp-noavx: backend-assets/grpc
+	cp -rf backend/cpp/llama backend/cpp/llama-noavx
+	$(MAKE) -C backend/cpp/llama-noavx purge
+	$(info ${GREEN}I llama-cpp build info:noavx${RESET})
+	CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF" $(MAKE) VARIANT="llama-noavx" build-llama-cpp-grpc-server
+	cp -rfv backend/cpp/llama-noavx/grpc-server backend-assets/grpc/llama-cpp-noavx
+
+backend-assets/grpc/llama-cpp-fallback: backend-assets/grpc
+	cp -rf backend/cpp/llama backend/cpp/llama-fallback
+	$(MAKE) -C backend/cpp/llama-fallback purge
+	$(info ${GREEN}I llama-cpp build info:fallback${RESET})
+	CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" $(MAKE) VARIANT="llama-fallback" build-llama-cpp-grpc-server
+	cp -rfv backend/cpp/llama-fallback/grpc-server backend-assets/grpc/llama-cpp-fallback
+
 backend-assets/grpc/llama-ggml: sources/go-llama.cpp sources/go-llama.cpp/libbinding.a backend-assets/grpc
 	CGO_LDFLAGS="$(CGO_LDFLAGS)" C_INCLUDE_PATH=$(CURDIR)/sources/go-llama.cpp LIBRARY_PATH=$(CURDIR)/sources/go-llama.cpp \
 	$(GOCMD) build -ldflags "$(LD_FLAGS)" -tags "$(GO_TAGS)" -o backend-assets/grpc/llama-ggml ./backend/go/llm/llama-ggml/
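
The Makefile changes above build the llama.cpp gRPC server three times (`llama-cpp`, `llama-cpp-noavx`, `llama-cpp-fallback`), each from its own copy of `backend/cpp/llama` and each appending different flags to `CMAKE_ARGS`. The following is a minimal Go sketch of that flag composition, for illustration only. The `variantFlags` map and the `cmakeArgs` helper are hypothetical names that do not exist in the repository, and the base argument in `main` is an example value; only the flag lists are taken from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// variantFlags lists the extra CMake flags each llama-cpp target in the
// Makefile diff appends to CMAKE_ARGS. The map is an illustrative construct;
// the Makefile spells the flags out per target.
var variantFlags = map[string][]string{
	"llama-cpp":          nil, // standard build, no extra flags
	"llama-cpp-noavx":    {"-DLLAMA_AVX512=OFF", "-DLLAMA_AVX2=OFF"},
	"llama-cpp-fallback": {"-DLLAMA_F16C=OFF", "-DLLAMA_AVX512=OFF", "-DLLAMA_AVX2=OFF", "-DLLAMA_FMA=OFF"},
}

// cmakeArgs mimics CMAKE_ARGS="$(CMAKE_ARGS) <flags>": the caller's
// arguments come first and the variant's flags are appended.
// Hypothetical helper, not part of the repository.
func cmakeArgs(base, variant string) (string, error) {
	flags, ok := variantFlags[variant]
	if !ok {
		return "", fmt.Errorf("unknown variant %q", variant)
	}
	parts := append(strings.Fields(base), flags...)
	return strings.Join(parts, " "), nil
}

func main() {
	// "-DCMAKE_BUILD_TYPE=Release" is an example base value, not taken from the diff.
	for _, v := range []string{"llama-cpp", "llama-cpp-noavx", "llama-cpp-fallback"} {
		args, err := cmakeArgs("-DCMAKE_BUILD_TYPE=Release", v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-20s %s\n", v, args)
	}
}
```

Appending to the existing `CMAKE_ARGS` rather than replacing it keeps caller-supplied options in effect for every variant, which is what the `CMAKE_ARGS="$(CMAKE_ARGS) ..."` form in the Makefile does.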
--- a/README.md
+++ b/README.md
@@ -50,6 +50,7 @@

 [Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)

+- Chat, TTS, and Image generation in the WebUI: https://github.com/mudler/LocalAI/pull/2222
 - Reranker API: https://github.com/mudler/LocalAI/pull/2121
 - Gallery WebUI: https://github.com/mudler/LocalAI/pull/2104
 - llama3: https://github.com/mudler/LocalAI/discussions/2076
@@ -113,6 +114,7 @@ Model galleries
 Other:
 - Helm chart https://github.com/go-skynet/helm-charts
 - VSCode extension https://github.com/badgooooor/localai-vscode-plugin
+- Terminal utility https://github.com/djcopley/ShellOracle
 - Local Smart assistant https://github.com/mudler/LocalAGI
 - Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation
 - Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
@@ -131,7 +133,7 @@ Other:

 ## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)

-- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/ai/answers/tiZMDoZzZV6TLxgDXNBnFE/deploying-helm-charts-on-aws-eks)
+- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/blog/low-code-llm-apps-with-local-ai-flowise-and-pulumi/)
 - [Run LocalAI on AWS](https://staleks.hashnode.dev/installing-localai-on-aws-ec2-instance)
 - [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
 - [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
--- a/aio/cpu/text-to-text.yaml
+++ b/aio/cpu/text-to-text.yaml
@@ -1,7 +1,7 @@
 name: gpt-4
 mmap: true
 parameters:
-  model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q2_K.gguf
+  model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf

 template:
  chat_message: |
--- a/aio/gpu-8g/text-to-text.yaml
+++ b/aio/gpu-8g/text-to-text.yaml
@@ -1,7 +1,7 @@
 name: gpt-4
 mmap: true
 parameters:
-  model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q6_K.gguf
+  model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf

 template:
  chat_message: |
--- a/aio/intel/text-to-text.yaml
+++ b/aio/intel/text-to-text.yaml
@@ -2,7 +2,7 @@ name: gpt-4
 mmap: false
 f16: false
 parameters:
-  model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q6_K.gguf
+  model: huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf

 template:
  chat_message: |
--- a/backend/cpp/llama/Makefile
+++ b/backend/cpp/llama/Makefile
@@ -43,31 +43,23 @@ llama.cpp:

 llama.cpp/examples/grpc-server: llama.cpp
 	mkdir -p llama.cpp/examples/grpc-server
-	cp -r $(abspath ./)/CMakeLists.txt llama.cpp/examples/grpc-server/
-	cp -r $(abspath ./)/grpc-server.cpp llama.cpp/examples/grpc-server/
-	cp -rfv $(abspath ./)/json.hpp llama.cpp/examples/grpc-server/
-	cp -rfv $(abspath ./)/utils.hpp llama.cpp/examples/grpc-server/
-	echo "add_subdirectory(grpc-server)" >> llama.cpp/examples/CMakeLists.txt
-## XXX: In some versions of CMake clip wasn't being built before llama.
-## This is an hack for now, but it should be fixed in the future.
-	cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
-	cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
-	echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
-	cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h
-	cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp
+	bash prepare.sh

 rebuild:
-	cp -rfv $(abspath ./)/CMakeLists.txt llama.cpp/examples/grpc-server/
-	cp -rfv $(abspath ./)/grpc-server.cpp llama.cpp/examples/grpc-server/
-	cp -rfv $(abspath ./)/json.hpp llama.cpp/examples/grpc-server/
+	bash prepare.sh
 	rm -rf grpc-server
 	$(MAKE) grpc-server

-clean:
-	rm -rf llama.cpp
+purge:
+	rm -rf llama.cpp/build
+	rm -rf llama.cpp/examples/grpc-server
 	rm -rf grpc-server

+clean: purge
+	rm -rf llama.cpp
+
 grpc-server: llama.cpp llama.cpp/examples/grpc-server
+	@echo "Building grpc-server with $(BUILD_TYPE) build type and $(CMAKE_ARGS)"
 ifneq (,$(findstring sycl,$(BUILD_TYPE)))
 	bash -c "source $(ONEAPI_VARS); \
 	cd llama.cpp && mkdir -p build && cd build && cmake .. $(CMAKE_ARGS) && cmake --build . --config Release"	
--- a/backend/cpp/llama/prepare.sh
+++ b/backend/cpp/llama/prepare.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+cp -r CMakeLists.txt llama.cpp/examples/grpc-server/
+cp -r grpc-server.cpp llama.cpp/examples/grpc-server/
+cp -rfv json.hpp llama.cpp/examples/grpc-server/
+cp -rfv utils.hpp llama.cpp/examples/grpc-server/
+    
+if grep -q "grpc-server" llama.cpp/examples/CMakeLists.txt; then
+    echo "grpc-server already added"
+else
+    echo "add_subdirectory(grpc-server)" >> llama.cpp/examples/CMakeLists.txt
+fi
+
+## XXX: In some versions of CMake clip wasn't being built before llama.
+## This is an hack for now, but it should be fixed in the future.
+cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
+cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
+echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
+cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h
+cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp
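
`prepare.sh` guards the `add_subdirectory(grpc-server)` append with a `grep -q` check, so running it again does not add the line twice. This matters because the new `purge` target keeps the `llama.cpp` checkout, and with it `examples/CMakeLists.txt`. The same append-once rule is sketched below in Go on an in-memory string; `ensureLine` is a hypothetical helper for illustration and not part of the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// ensureLine returns content unchanged when needle already occurs anywhere
// in it (the `grep -q` branch); otherwise it appends line plus a newline
// (the `echo ... >>` branch). Hypothetical helper, illustration only.
func ensureLine(content, needle, line string) string {
	if strings.Contains(content, needle) {
		return content
	}
	return content + line + "\n"
}

func main() {
	// Example file content, not taken from llama.cpp.
	cm := "add_subdirectory(main)\n"

	once := ensureLine(cm, "grpc-server", "add_subdirectory(grpc-server)")
	twice := ensureLine(once, "grpc-server", "add_subdirectory(grpc-server)")

	fmt.Print(twice)
	fmt.Println("second call was a no-op:", once == twice)
}
```

As in the script, the check is a plain substring match, so any existing mention of `grpc-server` in the file suppresses the append.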
--- a/backend/go/llm/langchain/langchain.go
+++ b/backend/go/llm/langchain/langchain.go
@@ -4,6 +4,7 @@ package main
 // It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
 import (
 	"fmt"
+	"os"

 	"github.com/go-skynet/LocalAI/pkg/grpc/base"
 	pb "github.com/go-skynet/LocalAI/pkg/grpc/proto"
@@ -18,9 +19,14 @@ type LLM struct {
 }

 func (llm *LLM) Load(opts *pb.ModelOptions) error {
-	llm.langchain, _ = langchain.NewHuggingFace(opts.Model)
+	var err error
+	hfToken := os.Getenv("HUGGINGFACEHUB_API_TOKEN")
+	if hfToken == "" {
+		return fmt.Errorf("no huggingface token provided")
+	}
+	llm.langchain, err = langchain.NewHuggingFace(opts.Model, hfToken)
 	llm.model = opts.Model
-	return nil
+	return err
 }

 func (llm *LLM) Predict(opts *pb.PredictOptions) (string, error) {
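
With this change `Load` fails early when `HUGGINGFACEHUB_API_TOKEN` is unset, and it returns the constructor's error instead of discarding it. Below is a minimal sketch of the same fail-fast check, written so it can be exercised without touching the real environment. `tokenFromEnv` and `errNoToken` are hypothetical names for illustration; only the environment variable name and the error text come from the diff.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errNoToken is a hypothetical sentinel error for this sketch; the diff
// builds the same message with fmt.Errorf instead.
var errNoToken = errors.New("no huggingface token provided")

// tokenFromEnv reads the Hugging Face token through the supplied lookup
// function and fails fast when it is missing or empty.
// Passing the lookup as a parameter is a choice made for this sketch only.
func tokenFromEnv(getenv func(string) string) (string, error) {
	tok := getenv("HUGGINGFACEHUB_API_TOKEN")
	if tok == "" {
		return "", errNoToken
	}
	return tok, nil
}

func main() {
	if _, err := tokenFromEnv(os.Getenv); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println("token found")
}
```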
--- a/backend/go/stores/main.go
+++ b/backend/go/stores/main.go
@@ -19,8 +19,12 @@ func main() {
 	log.Logger = log.Output(zerolog.ConsoleWriter{Out: os.Stderr})

 	flag.Parse()
+	s, err := NewStore()
+	if err != nil {
+		panic(err)
+	}

-	if err := grpc.StartServer(*addr, NewStore()); err != nil {
+	if err := grpc.StartServer(*addr, s); err != nil {
 		panic(err)
 	}
 }
--- a/backend/go/stores/store.go
+++ b/backend/go/stores/store.go
@@ -3,505 +3,90 @@ package main
 // This is a wrapper to statisfy the GRPC service interface
 // It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
 import (
-	"container/heap"
+	"context"
 	"fmt"
-	"math"
-	"slices"
+	"strconv"

 	"github.com/go-skynet/LocalAI/pkg/grpc/base"
 	pb "github.com/go-skynet/LocalAI/pkg/grpc/proto"

-	"github.com/rs/zerolog/log"
+	"github.com/philippgille/chromem-go"
 )

 type Store struct {
 	base.SingleThread

-	// The sorted keys
-	keys [][]float32
-	// The sorted values
-	values [][]byte
-
-	// If for every K it holds that ||k||^2 = 1, then we can use the normalized distance functions
-	// TODO: Should we normalize incoming keys if they are not instead?
-	keysAreNormalized bool
-	// The first key decides the length of the keys
-	keyLen int
+	maxId int
+	db *chromem.DB
+	c *chromem.Collection
 }

-// TODO: Only used for sorting using Go's builtin implementation. The interfaces are columnar because
-// that's theoretically best for memory layout and cache locality, but this isn't optimized yet.
-type Pair struct {
-	Key   []float32
-	Value []byte
-}
+func NewStore() (*Store, error) {
+	db := chromem.NewDB()
+	c, err := db.CreateCollection("default", nil, nil)
+	if err != nil {
+		return nil, err
+	}

-func NewStore() *Store {
 	return &Store{
-		keys:              make([][]float32, 0),
-		values:            make([][]byte, 0),
-		keysAreNormalized: true,
-		keyLen:            -1,
-	}
-}
-
-func compareSlices(k1, k2 []float32) int {
-	assert(len(k1) == len(k2), fmt.Sprintf("compareSlices: len(k1) = %d, len(k2) = %d", len(k1), len(k2)))
-
-	return slices.Compare(k1, k2)
-}
-
-func hasKey(unsortedSlice [][]float32, target []float32) bool {
-	return slices.ContainsFunc(unsortedSlice, func(k []float32) bool {
-		return compareSlices(k, target) == 0
-	})
-}
-
-func findInSortedSlice(sortedSlice [][]float32, target []float32) (int, bool) {
-	return slices.BinarySearchFunc(sortedSlice, target, func(k, t []float32) int {
-		return compareSlices(k, t)
-	})
-}
-
-func isSortedPairs(kvs []Pair) bool {
-	for i := 1; i < len(kvs); i++ {
-		if compareSlices(kvs[i-1].Key, kvs[i].Key) > 0 {
-			return false
-		}
-	}
-
-	return true
-}
-
-func isSortedKeys(keys [][]float32) bool {
-	for i := 1; i < len(keys); i++ {
-		if compareSlices(keys[i-1], keys[i]) > 0 {
-			return false
-		}
-	}
-
-	return true
-}
-
-func sortIntoKeySlicese(keys []*pb.StoresKey) [][]float32 {
-	ks := make([][]float32, len(keys))
-
-	for i, k := range keys {
-		ks[i] = k.Floats
-	}
-
-	slices.SortFunc(ks, compareSlices)
-
-	assert(len(ks) == len(keys), fmt.Sprintf("len(ks) = %d, len(keys) = %d", len(ks), len(keys)))
-	assert(isSortedKeys(ks), "keys are not sorted")
-
-	return ks
+		db: db,
+		c:  c,
+	}, nil
 }

 func (s *Store) Load(opts *pb.ModelOptions) error {
 	return nil
 }

-// Sort the incoming kvs and merge them with the existing sorted kvs
 func (s *Store) StoresSet(opts *pb.StoresSetOptions) error {
-	if len(opts.Keys) == 0 {
-		return fmt.Errorf("no keys to add")
+	ids := make([]string, len(opts.Keys))
+
+	for i, _ := range(ids) {
+		ids[i] = strconv.Itoa(i)
 	}

-	if len(opts.Keys) != len(opts.Values) {
-		return fmt.Errorf("len(keys) = %d, len(values) = %d", len(opts.Keys), len(opts.Values))
+	embeddings := make([][]float32, len(opts.Keys))
+
+	for i, key := range opts.Keys {
+		embeddings[i] = key.Floats
 	}

-	if s.keyLen == -1 {
-		s.keyLen = len(opts.Keys[0].Floats)
-	} else {
-		if len(opts.Keys[0].Floats) != s.keyLen {
-			return fmt.Errorf("Try to add key with length %d when existing length is %d", len(opts.Keys[0].Floats), s.keyLen)
-		}
+	contents := make([]string, len(opts.Values))
+
+	for i, value := range opts.Values {
+		contents[i] = string(value.Bytes)
 	}

-	kvs := make([]Pair, len(opts.Keys))
-
-	for i, k := range opts.Keys {
-		if s.keysAreNormalized && !isNormalized(k.Floats) {
-			s.keysAreNormalized = false
-			var sample []float32
-			if len(s.keys) > 5 {
-				sample = k.Floats[:5]
-			} else {
-				sample = k.Floats
-			}
-			log.Debug().Msgf("Key is not normalized: %v", sample)
-		}
-
-		kvs[i] = Pair{
-			Key:   k.Floats,
-			Value: opts.Values[i].Bytes,
-		}
-	}
-
-	slices.SortFunc(kvs, func(a, b Pair) int {
-		return compareSlices(a.Key, b.Key)
-	})
-
-	assert(len(kvs) == len(opts.Keys), fmt.Sprintf("len(kvs) = %d, len(opts.Keys) = %d", len(kvs), len(opts.Keys)))
-	assert(isSortedPairs(kvs), "keys are not sorted")
-
-	l := len(kvs) + len(s.keys)
-	merge_ks := make([][]float32, 0, l)
-	merge_vs := make([][]byte, 0, l)
-
-	i, j := 0, 0
-	for {
-		if i+j >= l {
-			break
-		}
-
-		if i >= len(kvs) {
-			merge_ks = append(merge_ks, s.keys[j])
-			merge_vs = append(merge_vs, s.values[j])
-			j++
-			continue
-		}
-
-		if j >= len(s.keys) {
-			merge_ks = append(merge_ks, kvs[i].Key)
-			merge_vs = append(merge_vs, kvs[i].Value)
-			i++
-			continue
-		}
-
-		c := compareSlices(kvs[i].Key, s.keys[j])
-		if c < 0 {
-			merge_ks = append(merge_ks, kvs[i].Key)
-			merge_vs = append(merge_vs, kvs[i].Value)
-			i++
-		} else if c > 0 {
-			merge_ks = append(merge_ks, s.keys[j])
-			merge_vs = append(merge_vs, s.values[j])
-			j++
-		} else {
-			merge_ks = append(merge_ks, kvs[i].Key)
-			merge_vs = append(merge_vs, kvs[i].Value)
-			i++
-			j++
-		}
-	}
-
-	assert(len(merge_ks) == l, fmt.Sprintf("len(merge_ks) = %d, l = %d", len(merge_ks), l))
-	assert(isSortedKeys(merge_ks), "merge keys are not sorted")
-
-	s.keys = merge_ks
-	s.values = merge_vs
-
-	return nil
+	return s.c.Add(context.Background(), ids, embeddings, nil, contents)
 }

 func (s *Store) StoresDelete(opts *pb.StoresDeleteOptions) error {
-	if len(opts.Keys) == 0 {
-		return fmt.Errorf("no keys to delete")
-	}
-
-	if len(opts.Keys) == 0 {
-		return fmt.Errorf("no keys to add")
-	}
-
-	if s.keyLen == -1 {
-		s.keyLen = len(opts.Keys[0].Floats)
-	} else {
-		if len(opts.Keys[0].Floats) != s.keyLen {
-			return fmt.Errorf("Trying to delete key with length %d when existing length is %d", len(opts.Keys[0].Floats), s.keyLen)
-		}
-	}
-
-	ks := sortIntoKeySlicese(opts.Keys)
-
-	l := len(s.keys) - len(ks)
-	merge_ks := make([][]float32, 0, l)
-	merge_vs := make([][]byte, 0, l)
-
-	tail_ks := s.keys
-	tail_vs := s.values
-	for _, k := range ks {
-		j, found := findInSortedSlice(tail_ks, k)
-
-		if found {
-			merge_ks = append(merge_ks, tail_ks[:j]...)
-			merge_vs = append(merge_vs, tail_vs[:j]...)
-			tail_ks = tail_ks[j+1:]
-			tail_vs = tail_vs[j+1:]
-		} else {
-			assert(!hasKey(s.keys, k), fmt.Sprintf("Key exists, but was not found: t=%d, %v", len(tail_ks), k))
-		}
-
-		log.Debug().Msgf("Delete: found = %v, t = %d, j = %d, len(merge_ks) = %d, len(merge_vs) = %d", found, len(tail_ks), j, len(merge_ks), len(merge_vs))
-	}
-
-	merge_ks = append(merge_ks, tail_ks...)
-	merge_vs = append(merge_vs, tail_vs...)
-
-	assert(len(merge_ks) <= len(s.keys), fmt.Sprintf("len(merge_ks) = %d, len(s.keys) = %d", len(merge_ks), len(s.keys)))
-
-	s.keys = merge_ks
-	s.values = merge_vs
-
-	assert(len(s.keys) >= l, fmt.Sprintf("len(s.keys) = %d, l = %d", len(s.keys), l))
-	assert(isSortedKeys(s.keys), "keys are not sorted")
-	assert(func() bool {
-		for _, k := range ks {
-			if _, found := findInSortedSlice(s.keys, k); found {
-				return false
-			}
-		}
-		return true
-	}(), "Keys to delete still present")
-
-	if len(s.keys) != l {
-		log.Debug().Msgf("Delete: Some keys not found: len(s.keys) = %d, l = %d", len(s.keys), l)
-	}
-
-	return nil
+	return fmt.Errorf("Per document delete not implemented in chromem")
 }

 func (s *Store) StoresGet(opts *pb.StoresGetOptions) (pb.StoresGetResult, error) {
-	pbKeys := make([]*pb.StoresKey, 0, len(opts.Keys))
-	pbValues := make([]*pb.StoresValue, 0, len(opts.Keys))
-	ks := sortIntoKeySlicese(opts.Keys)
-
-	if len(s.keys) == 0 {
-		log.Debug().Msgf("Get: No keys in store")
-	}
-
-	if s.keyLen == -1 {
-		s.keyLen = len(opts.Keys[0].Floats)
-	} else {
-		if len(opts.Keys[0].Floats) != s.keyLen {
-			return pb.StoresGetResult{}, fmt.Errorf("Try to get a key with length %d when existing length is %d", len(opts.Keys[0].Floats), s.keyLen)
-		}
-	}
-
-	tail_k := s.keys
-	tail_v := s.values
-	for i, k := range ks {
-		j, found := findInSortedSlice(tail_k, k)
-
-		if found {
-			pbKeys = append(pbKeys, &pb.StoresKey{
-				Floats: k,
-			})
-			pbValues = append(pbValues, &pb.StoresValue{
-				Bytes: tail_v[j],
-			})
-
-			tail_k = tail_k[j+1:]
-			tail_v = tail_v[j+1:]
-		} else {
-			assert(!hasKey(s.keys, k), fmt.Sprintf("Key exists, but was not found: i=%d, %v", i, k))
-		}
-	}
-
-	if len(pbKeys) != len(opts.Keys) {
-		log.Debug().Msgf("Get: Some keys not found: len(pbKeys) = %d, len(opts.Keys) = %d, len(s.Keys) = %d", len(pbKeys), len(opts.Keys), len(s.keys))
-	}
-
-	return pb.StoresGetResult{
-		Keys:   pbKeys,
-		Values: pbValues,
-	}, nil
-}
-
-func isNormalized(k []float32) bool {
-	var sum float32
-	for _, v := range k {
-		sum += v
-	}
-
-	return sum == 1.0
-}
-
-// TODO: This we could replace with handwritten SIMD code
-func normalizedCosineSimilarity(k1, k2 []float32) float32 {
-	assert(len(k1) == len(k2), fmt.Sprintf("normalizedCosineSimilarity: len(k1) = %d, len(k2) = %d", len(k1), len(k2)))
-
-	var dot float32
-	for i := 0; i < len(k1); i++ {
-		dot += k1[i] * k2[i]
-	}
-
-	assert(dot >= -1 && dot <= 1, fmt.Sprintf("dot = %f", dot))
-
-	// 2.0 * (1.0 - dot) would be the Euclidean distance
-	return dot
-}
-
-type PriorityItem struct {
-	Similarity float32
-	Key        []float32
-	Value      []byte
-}
-
-type PriorityQueue []*PriorityItem
-
-func (pq PriorityQueue) Len() int { return len(pq) }
-
-func (pq PriorityQueue) Less(i, j int) bool {
-	// Inverted because the most similar should be at the top
-	return pq[i].Similarity < pq[j].Similarity
-}
-
-func (pq PriorityQueue) Swap(i, j int) {
-	pq[i], pq[j] = pq[j], pq[i]
-}
-
-func (pq *PriorityQueue) Push(x any) {
-	item := x.(*PriorityItem)
-	*pq = append(*pq, item)
-}
-
-func (pq *PriorityQueue) Pop() any {
-	old := *pq
-	n := len(old)
-	item := old[n-1]
-	*pq = old[0 : n-1]
-	return item
-}
-
-func (s *Store) StoresFindNormalized(opts *pb.StoresFindOptions) (pb.StoresFindResult, error) {
-	tk := opts.Key.Floats
-	top_ks := make(PriorityQueue, 0, int(opts.TopK))
-	heap.Init(&top_ks)
-
-	for i, k := range s.keys {
-		sim := normalizedCosineSimilarity(tk, k)
-		heap.Push(&top_ks, &PriorityItem{
-			Similarity: sim,
-			Key:        k,
-			Value:      s.values[i],
-		})
-
-		if top_ks.Len() > int(opts.TopK) {
-			heap.Pop(&top_ks)
-		}
-	}
-
-	similarities := make([]float32, top_ks.Len())
-	pbKeys := make([]*pb.StoresKey, top_ks.Len())
-	pbValues := make([]*pb.StoresValue, top_ks.Len())
-
-	for i := top_ks.Len() - 1; i >= 0; i-- {
-		item := heap.Pop(&top_ks).(*PriorityItem)
-
-		similarities[i] = item.Similarity
-		pbKeys[i] = &pb.StoresKey{
-			Floats: item.Key,
-		}
-		pbValues[i] = &pb.StoresValue{
-			Bytes: item.Value,
-		}
-	}
-
-	return pb.StoresFindResult{
-		Keys:         pbKeys,
-		Values:       pbValues,
-		Similarities: similarities,
-	}, nil
-}
-
-func cosineSimilarity(k1, k2 []float32, mag1 float64) float32 {
-	assert(len(k1) == len(k2), fmt.Sprintf("cosineSimilarity: len(k1) = %d, len(k2) = %d", len(k1), len(k2)))
-
-	var dot, mag2 float64
-	for i := 0; i < len(k1); i++ {
-		dot += float64(k1[i] * k2[i])
-		mag2 += float64(k2[i] * k2[i])
-	}
-
-	sim := float32(dot / (mag1 * math.Sqrt(mag2)))
-
-	assert(sim >= -1 && sim <= 1, fmt.Sprintf("sim = %f", sim))
-
-	return sim
-}
-
-func (s *Store) StoresFindFallback(opts *pb.StoresFindOptions) (pb.StoresFindResult, error) {
-	tk := opts.Key.Floats
-	top_ks := make(PriorityQueue, 0, int(opts.TopK))
-	heap.Init(&top_ks)
-
-	var mag1 float64
-	for _, v := range tk {
-		mag1 += float64(v * v)
-	}
-	mag1 = math.Sqrt(mag1)
-
-	for i, k := range s.keys {
-		dist := cosineSimilarity(tk, k, mag1)
-		heap.Push(&top_ks, &PriorityItem{
-			Similarity: dist,
-			Key:        k,
-			Value:      s.values[i],
-		})
-
-		if top_ks.Len() > int(opts.TopK) {
-			heap.Pop(&top_ks)
-		}
-	}
-
-	similarities := make([]float32, top_ks.Len())
-	pbKeys := make([]*pb.StoresKey, top_ks.Len())
-	pbValues := make([]*pb.StoresValue, top_ks.Len())
-
-	for i := top_ks.Len() - 1; i >= 0; i-- {
-		item := heap.Pop(&top_ks).(*PriorityItem)
-
-		similarities[i] = item.Similarity
-		pbKeys[i] = &pb.StoresKey{
-			Floats: item.Key,
-		}
-		pbValues[i] = &pb.StoresValue{
-			Bytes: item.Value,
-		}
-	}
-
-	return pb.StoresFindResult{
-		Keys:         pbKeys,
-		Values:       pbValues,
-		Similarities: similarities,
-	}, nil
+	return pb.StoresGetResult{}, fmt.Errorf("get is not implemented in the chromem store, although query may work")
 }

 func (s *Store) StoresFind(opts *pb.StoresFindOptions) (pb.StoresFindResult, error) {
-	tk := opts.Key.Floats
-
-	if len(tk) != s.keyLen {
-		return pb.StoresFindResult{}, fmt.Errorf("Try to find key with length %d when existing length is %d", len(tk), s.keyLen)
+	res, err := s.c.QueryEmbedding(context.Background(), opts.Key.Floats, int(opts.TopK), nil, nil)
+	if err != nil {
+		return pb.StoresFindResult{}, err
 	}

-	if opts.TopK < 1 {
-		return pb.StoresFindResult{}, fmt.Errorf("opts.TopK = %d, must be >= 1", opts.TopK)
+	keys := make([]*pb.StoresKey, len(res))
+	values := make([]*pb.StoresValue, len(res))
+	similarities := make([]float32, len(res))
+
+	for i, r := range res {
+		keys[i] = &pb.StoresKey{Floats: r.Embedding}
+		similarities[i] = r.Similarity
+		values[i] = &pb.StoresValue{Bytes: []byte(r.Content)}
 	}

-	if s.keyLen == -1 {
-		s.keyLen = len(opts.Key.Floats)
-	} else {
-		if len(opts.Key.Floats) != s.keyLen {
-			return pb.StoresFindResult{}, fmt.Errorf("Try to add key with length %d when existing length is %d", len(opts.Key.Floats), s.keyLen)
-		}
-	}
-
-	if s.keysAreNormalized && isNormalized(tk) {
-		return s.StoresFindNormalized(opts)
-	} else {
-		if s.keysAreNormalized {
-			var sample []float32
-			if len(s.keys) > 5 {
-				sample = tk[:5]
-			} else {
-				sample = tk
-			}
-			log.Debug().Msgf("Trying to compare non-normalized key with normalized keys: %v", sample)
-		}
-
-		return s.StoresFindFallback(opts)
-	}
+	return pb.StoresFindResult{
+		Keys:         keys,
+		Values:       values,
+		Similarities: similarities,
+	}, nil
 }
--- a/backend/python/transformers/transformers_server.py
+++ b/backend/python/transformers/transformers_server.py
@ -89,8 +89,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
        quantization = None

        if self.CUDA:
-            if request.Device:
-                device_map=request.Device
+            if request.MainGPU:
+                device_map=request.MainGPU
            else:
                device_map="cuda:0"
            if request.Quantization == "bnb_4bit":
@ -143,28 +143,48 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
                from optimum.intel.openvino import OVModelForCausalLM
                from openvino.runtime import Core

-                if "GPU" in Core().available_devices:
-                    device_map="GPU"
+                if request.MainGPU:
+                    device_map=request.MainGPU
                else:
-                    device_map="CPU"
+                    device_map="AUTO"
+                    devices = Core().available_devices
+                    if "GPU" in " ".join(devices):
+                        device_map="AUTO:GPU"
+                # On fine-tuned models, GPU inference may lose accuracy and performance if Winograd convolutions are selected.
+                # https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html
+                # Test each substring explicitly: `"CPU" or "NPU" in device_map` is always truthy.
+                uses_cpu_or_npu = any(d in device_map and "-" + d not in device_map for d in ("CPU", "NPU"))
+                ovconfig={"PERFORMANCE_HINT": "CUMULATIVE_THROUGHPUT"}
+                if not uses_cpu_or_npu:
+                    ovconfig["GPU_DISABLE_WINOGRAD_CONVOLUTION"] = "YES"
                self.model = OVModelForCausalLM.from_pretrained(model_name, 
                                                                compile=True,
                                                                trust_remote_code=request.TrustRemoteCode,
-                                                                ov_config={"PERFORMANCE_HINT": "CUMULATIVE_THROUGHPUT"}, 
+                                                                ov_config=ovconfig,
                                                                device=device_map)
                self.OV = True
            elif request.Type == "OVModelForFeatureExtraction":
                from optimum.intel.openvino import OVModelForFeatureExtraction
                from openvino.runtime import Core

-                if "GPU" in Core().available_devices:
-                    device_map="GPU"
+                if request.MainGPU:
+                    device_map=request.MainGPU
                else:
-                    device_map="CPU"
+                    device_map="AUTO"
+                    devices = Core().available_devices
+                    if "GPU" in " ".join(devices):
+                        device_map="AUTO:GPU"
+                # On fine-tuned models, GPU inference may lose accuracy and performance if Winograd convolutions are selected.
+                # https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes/gpu-device.html
+                # Test each substring explicitly: `"CPU" or "NPU" in device_map` is always truthy.
+                uses_cpu_or_npu = any(d in device_map and "-" + d not in device_map for d in ("CPU", "NPU"))
+                ovconfig={"PERFORMANCE_HINT": "CUMULATIVE_THROUGHPUT"}
+                if not uses_cpu_or_npu:
+                    ovconfig["GPU_DISABLE_WINOGRAD_CONVOLUTION"] = "YES"
                self.model = OVModelForFeatureExtraction.from_pretrained(model_name, 
                                                                compile=True,
                                                                trust_remote_code=request.TrustRemoteCode,
-                                                                ov_config={"PERFORMANCE_HINT": "CUMULATIVE_THROUGHPUT"}, 
+                                                                ov_config=ovconfig,
                                                                export=True,
                                                                device=device_map)
                self.OV = True
@ -226,8 +246,8 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):

        # Pool to get sentence embeddings; i.e. generate one 1024 vector for the entire sentence
        sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
-        print("Calculated embeddings for: " + request.Embeddings, file=sys.stderr)
-        print("Embeddings:", sentence_embeddings, file=sys.stderr)
+#        print("Calculated embeddings for: " + request.Embeddings, file=sys.stderr)
+#        print("Embeddings:", sentence_embeddings, file=sys.stderr)
        return backend_pb2.EmbeddingResult(embeddings=sentence_embeddings[0])

    async def _predict(self, request, context, streaming=False): 
@ -371,4 +391,4 @@ if __name__ == "__main__":
    )
    args = parser.parse_args()

-    asyncio.run(serve(args.addr))
+    asyncio.run(serve(args.addr))
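The OpenVINO branches in this file choose `ov_config` from substring tests on `device_map`. In Python, `"CPU" or "NPU" in device_map` parses as `"CPU" or ("NPU" in device_map)`, which is always truthy, so each substring has to be tested on its own. Below is a minimal sketch of that selection as a standalone helper. The name `pick_ov_config` is hypothetical, and the rule it encodes (add the Winograd flag unless CPU or NPU is named and not excluded with a `-` prefix) is an assumed reading of the intent, not confirmed behavior of the backend.

```python
def pick_ov_config(device_map: str) -> dict:
    """Hypothetical helper: build an OpenVINO config dict from a device string."""
    ov_config = {"PERFORMANCE_HINT": "CUMULATIVE_THROUGHPUT"}
    # Assumption: a device counts as used when it is named in the string
    # and not excluded with a "-" prefix (e.g. "AUTO:GPU,-CPU").
    uses_cpu_or_npu = any(
        d in device_map and "-" + d not in device_map for d in ("CPU", "NPU")
    )
    if not uses_cpu_or_npu:
        # Key taken from the diff above; only added when CPU/NPU are not in use.
        ov_config["GPU_DISABLE_WINOGRAD_CONVOLUTION"] = "YES"
    return ov_config


if __name__ == "__main__":
    for device in ("CPU", "NPU", "AUTO:GPU", "AUTO:GPU,-CPU"):
        print(device, pick_ov_config(device))
```

Keeping the decision in one small function makes it straightforward to unit-test the device strings without loading OpenVINO.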
--- a/core/cli/run.go
+++ b/core/cli/run.go
@ -42,7 +42,7 @@ type RunCMD struct {
 	CORSAllowOrigins string   `env:"LOCALAI_CORS_ALLOW_ORIGINS,CORS_ALLOW_ORIGINS" group:"api"`
 	UploadLimit      int      `env:"LOCALAI_UPLOAD_LIMIT,UPLOAD_LIMIT" default:"15" help:"Default upload-limit in MB" group:"api"`
 	APIKeys          []string `env:"LOCALAI_API_KEY,API_KEY" help:"List of API Keys to enable API authentication. When this is set, all the requests must be authenticated with one of these API keys" group:"api"`
-	DisableWelcome   bool     `env:"LOCALAI_DISABLE_WELCOME,DISABLE_WELCOME" default:"false" help:"Disable welcome pages" group:"api"`
+	DisableWebUI     bool     `env:"LOCALAI_DISABLE_WEBUI,DISABLE_WEBUI" default:"false" help:"Disable webui" group:"api"`

 	ParallelRequests     bool     `env:"LOCALAI_PARALLEL_REQUESTS,PARALLEL_REQUESTS" help:"Enable backends to handle multiple requests in parallel if they support it (e.g.: llama.cpp or vllm)" group:"backends"`
 	SingleActiveBackend  bool     `env:"LOCALAI_SINGLE_ACTIVE_BACKEND,SINGLE_ACTIVE_BACKEND" help:"Allow only one backend to be run at a time" group:"backends"`
@ -84,8 +84,8 @@ func (r *RunCMD) Run(ctx *Context) error {
 	idleWatchDog := r.EnableWatchdogIdle
 	busyWatchDog := r.EnableWatchdogBusy

-	if r.DisableWelcome {
-		opts = append(opts, config.DisableWelcomePage)
+	if r.DisableWebUI {
+		opts = append(opts, config.DisableWebUI)
 	}

 	if idleWatchDog || busyWatchDog {
--- a/core/config/application_config.go
+++ b/core/config/application_config.go
@ -15,7 +15,7 @@ type ApplicationConfig struct {
 	ConfigFile                          string
 	ModelPath                           string
 	UploadLimitMB, Threads, ContextSize int
-	DisableWelcomePage                  bool
+	DisableWebUI                        bool
 	F16                                 bool
 	Debug                               bool
 	ImageDir                            string
@ -107,8 +107,8 @@ var EnableWatchDogBusyCheck = func(o *ApplicationConfig) {
 	o.WatchDogBusy = true
 }

-var DisableWelcomePage = func(o *ApplicationConfig) {
-	o.DisableWelcomePage = true
+var DisableWebUI = func(o *ApplicationConfig) {
+	o.DisableWebUI = true
 }

 func SetWatchDogBusyTimeout(t time.Duration) AppOption {
--- a/core/config/backend_config_loader.go
+++ b/core/config/backend_config_loader.go
@ -182,6 +182,12 @@ func (cl *BackendConfigLoader) GetAllBackendConfigs() []BackendConfig {
 	return res
 }

+func (cl *BackendConfigLoader) RemoveBackendConfig(m string) {
+	cl.Lock()
+	defer cl.Unlock()
+	delete(cl.configs, m)
+}
+
 func (cl *BackendConfigLoader) ListBackendConfigs() []string {
 	cl.Lock()
 	defer cl.Unlock()
--- a/core/http/app.go
+++ b/core/http/app.go
@ -1,7 +1,9 @@
 package http

 import (
+	"embed"
 	"errors"
+	"net/http"
 	"strings"

 	"github.com/go-skynet/LocalAI/pkg/utils"
@ -18,6 +20,8 @@ import (
 	"github.com/gofiber/contrib/fiberzerolog"
 	"github.com/gofiber/fiber/v2"
 	"github.com/gofiber/fiber/v2/middleware/cors"
+	"github.com/gofiber/fiber/v2/middleware/favicon"
+	"github.com/gofiber/fiber/v2/middleware/filesystem"
 	"github.com/gofiber/fiber/v2/middleware/recover"

 	// swagger handler
@ -42,6 +46,11 @@ func readAuthHeader(c *fiber.Ctx) string {
 	return authHeader
 }

+// Embed a directory
+//
+//go:embed static/*
+var embedDirStatic embed.FS
+
 // @title LocalAI API
 // @version 2.0.0
 // @description The LocalAI Rest API.
@ -169,10 +178,25 @@ func App(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *confi
 	routes.RegisterElevenLabsRoutes(app, cl, ml, appConfig, auth)
 	routes.RegisterLocalAIRoutes(app, cl, ml, appConfig, galleryService, auth)
 	routes.RegisterOpenAIRoutes(app, cl, ml, appConfig, auth)
-	routes.RegisterPagesRoutes(app, cl, ml, appConfig, auth)
-	routes.RegisterUIRoutes(app, cl, ml, appConfig, galleryService, auth)
+	if !appConfig.DisableWebUI {
+		routes.RegisterUIRoutes(app, cl, ml, appConfig, galleryService, auth)
+	}
 	routes.RegisterJINARoutes(app, cl, ml, appConfig, auth)

+	httpFS := http.FS(embedDirStatic)
+
+	app.Use(favicon.New(favicon.Config{
+		URL:        "/favicon.ico",
+		FileSystem: httpFS,
+		File:       "static/favicon.ico",
+	}))
+
+	app.Use("/static", filesystem.New(filesystem.Config{
+		Root:       httpFS,
+		PathPrefix: "static",
+		Browse:     true,
+	}))
+
 	// Define a custom 404 handler
 	// Note: keep this at the bottom!
 	app.Use(notFoundHandler)
--- a/core/http/app_test.go
+++ b/core/http/app_test.go
@ -708,10 +708,26 @@ var _ = Describe("API test", func() {
 			// The response should contain an URL
 			Expect(err).ToNot(HaveOccurred(), fmt.Sprint(resp))
 			dat, err := io.ReadAll(resp.Body)
-			Expect(err).ToNot(HaveOccurred(), string(dat))
-			Expect(string(dat)).To(ContainSubstring("http://127.0.0.1:9090/"), string(dat))
-			Expect(string(dat)).To(ContainSubstring(".png"), string(dat))
+			Expect(err).ToNot(HaveOccurred(), "error reading /image/generations response")

+			imgUrlResp := &schema.OpenAIResponse{}
+			Expect(json.Unmarshal(dat, imgUrlResp)).To(Succeed())
+			Expect(imgUrlResp.Data).ToNot(Or(BeNil(), BeZero()))
+			imgUrl := imgUrlResp.Data[0].URL
+			Expect(imgUrl).To(ContainSubstring("http://127.0.0.1:9090/"), imgUrl)
+			Expect(imgUrl).To(ContainSubstring(".png"), imgUrl)
+
+			imgResp, err := http.Get(imgUrl)
+			Expect(err).To(BeNil())
+			Expect(imgResp).ToNot(BeNil())
+			Expect(imgResp.StatusCode).To(Equal(200))
+			Expect(imgResp.ContentLength).To(BeNumerically(">", 0))
+			imgData := make([]byte, 512)
+			count, err := io.ReadFull(imgResp.Body, imgData)
+			Expect(err).To(Or(BeNil(), MatchError(io.ErrUnexpectedEOF)))
+			Expect(count).To(BeNumerically(">", 0))
+			Expect(count).To(BeNumerically("<=", 512))
+			Expect(http.DetectContentType(imgData[:count])).To(Equal("image/png"))
 		})
 	})

@ -787,11 +803,11 @@ var _ = Describe("API test", func() {
 		})

 		It("returns errors", func() {
-			backends := len(model.AutoLoadBackends) + 1 // +1 for huggingface
 			_, err := client.CreateCompletion(context.TODO(), openai.CompletionRequest{Model: "foomodel", Prompt: testPrompt})
 			Expect(err).To(HaveOccurred())
-			Expect(err.Error()).To(ContainSubstring(fmt.Sprintf("error, status code: 500, message: could not load model - all backends returned error: %d errors occurred:", backends)))
+			Expect(err.Error()).To(ContainSubstring("error, status code: 500, message: could not load model - all backends returned error:"))
 		})
+
 		It("transcribes audio", func() {
 			if runtime.GOOS != "linux" {
 				Skip("test supported only on linux")
--- a/core/http/elements/gallery.go
+++ b/core/http/elements/gallery.go
@ -2,9 +2,11 @@ package elements

 import (
 	"fmt"
+	"strings"

 	"github.com/chasefleming/elem-go"
 	"github.com/chasefleming/elem-go/attrs"
+	"github.com/go-skynet/LocalAI/core/services"
 	"github.com/go-skynet/LocalAI/pkg/gallery"
 	"github.com/go-skynet/LocalAI/pkg/xsync"
 )
@ -13,7 +15,12 @@ const (
 	NoImage = "https://upload.wikimedia.org/wikipedia/commons/6/65/No-Image-Placeholder.svg"
 )

-func DoneProgress(uid, text string) string {
+func DoneProgress(galleryID, text string, showDelete bool) string {
+	// Split by @ and grab the name
+	if strings.Contains(galleryID, "@") {
+		galleryID = strings.Split(galleryID, "@")[1]
+	}
+
 	return elem.Div(
 		attrs.Props{},
 		elem.H3(
@ -25,10 +32,11 @@ func DoneProgress(uid, text string) string {
 			},
 			elem.Text(text),
 		),
+		elem.If(showDelete, deleteButton(galleryID), reInstallButton(galleryID)),
 	).Render()
 }

-func ErrorProgress(err string) string {
+func ErrorProgress(err, galleryName string) string {
 	return elem.Div(
 		attrs.Props{},
 		elem.H3(
@ -38,8 +46,9 @@ func ErrorProgress(err string) string {
 				"tabindex":  "-1",
 				"autofocus": "",
 			},
-			elem.Text("Error"+err),
+			elem.Text("Error "+err),
 		),
+		installButton(galleryName),
 	).Render()
 }

@ -64,12 +73,13 @@ func StartProgressBar(uid, progress, text string) string {
 	if progress == "" {
 		progress = "0"
 	}
-	return elem.Div(attrs.Props{
-		"hx-trigger": "done",
-		"hx-get":     "/browse/job/" + uid,
-		"hx-swap":    "outerHTML",
-		"hx-target":  "this",
-	},
+	return elem.Div(
+		attrs.Props{
+			"hx-trigger": "done",
+			"hx-get":     "/browse/job/" + uid,
+			"hx-swap":    "innerHTML",
+			"hx-target":  "this",
+		},
 		elem.H3(
 			attrs.Props{
 				"role":      "status",
@ -99,11 +109,123 @@ func cardSpan(text, icon string) elem.Node {
 		elem.I(attrs.Props{
 			"class": icon + " pr-2",
 		}),
+
 		elem.Text(text),
+
+		//elem.Text(text),
 	)
 }

-func ListModels(models []*gallery.GalleryModel, installing *xsync.SyncedMap[string, string]) string {
+func searchableElement(text, icon string) elem.Node {
+	return elem.Form(
+		attrs.Props{},
+		elem.Input(
+			attrs.Props{
+				"type":  "hidden",
+				"name":  "search",
+				"value": text,
+			},
+		),
+		elem.Span(
+			attrs.Props{
+				"class": "inline-block bg-gray-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2",
+			},
+
+			elem.A(
+				attrs.Props{
+					//	"name":      "search",
+					//	"value":     text,
+					//"class":     "inline-block bg-gray-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2",
+					"href":      "#!",
+					"hx-post":   "/browse/search/models",
+					"hx-target": "#search-results",
+					// TODO: this doesn't work
+					//	"hx-vals":      `{ \"search\": \"` + text + `\" }`,
+					"hx-indicator": ".htmx-indicator",
+				},
+				elem.I(attrs.Props{
+					"class": icon + " pr-2",
+				}),
+				elem.Text(text),
+			),
+		),
+
+		//elem.Text(text),
+	)
+}
+
+func link(text, url string) elem.Node {
+	return elem.A(
+		attrs.Props{
+			"class":  "inline-block bg-gray-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2",
+			"href":   url,
+			"target": "_blank",
+		},
+		elem.I(attrs.Props{
+			"class": "fas fa-link pr-2",
+		}),
+		elem.Text(text),
+	)
+}
+func installButton(galleryName string) elem.Node {
+	return elem.Button(
+		attrs.Props{
+			"data-twe-ripple-init":  "",
+			"data-twe-ripple-color": "light",
+			"class":                 "float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong",
+			"hx-swap":               "outerHTML",
+			// post the Model ID as param
+			"hx-post": "/browse/install/model/" + galleryName,
+		},
+		elem.I(
+			attrs.Props{
+				"class": "fa-solid fa-download pr-2",
+			},
+		),
+		elem.Text("Install"),
+	)
+}
+
+func reInstallButton(galleryName string) elem.Node {
+	return elem.Button(
+		attrs.Props{
+			"data-twe-ripple-init":  "",
+			"data-twe-ripple-color": "light",
+			"class":                 "float-right inline-block rounded bg-primary ml-2 px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong",
+			"hx-swap":               "outerHTML",
+			// post the Model ID as param
+			"hx-post": "/browse/install/model/" + galleryName,
+		},
+		elem.I(
+			attrs.Props{
+				"class": "fa-solid fa-arrow-rotate-right pr-2",
+			},
+		),
+		elem.Text("Reinstall"),
+	)
+}
+
+func deleteButton(modelName string) elem.Node {
+	return elem.Button(
+		attrs.Props{
+			"data-twe-ripple-init":  "",
+			"data-twe-ripple-color": "light",
+			"hx-confirm":            "Are you sure you wish to delete the model?",
+			"class":                 "float-right inline-block rounded bg-red-800 px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-red-accent-300 hover:shadow-red-2 focus:bg-red-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-red-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong",
+			"hx-swap":               "outerHTML",
+			// post the Model ID as param
+			"hx-post": "/browse/delete/model/" + modelName,
+		},
+		elem.I(
+			attrs.Props{
+				"class": "fa-solid fa-cancel pr-2",
+			},
+		),
+		elem.Text("Delete"),
+	)
+}
+
+func ListModels(models []*gallery.GalleryModel, processing *xsync.SyncedMap[string, string], galleryService *services.GalleryService) string {
 	//StartProgressBar(uid, "0")
 	modelsElements := []elem.Node{}
 	// span := func(s string) elem.Node {
@ -114,43 +236,6 @@ func ListModels(models []*gallery.GalleryModel, installing *xsync.SyncedMap[stri
 	// 		elem.Text(s),
 	// 	)
 	// }
-	deleteButton := func(m *gallery.GalleryModel) elem.Node {
-		return elem.Button(
-			attrs.Props{
-				"data-twe-ripple-init":  "",
-				"data-twe-ripple-color": "light",
-				"class":                 "float-right inline-block rounded bg-red-800 px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-red-accent-300 hover:shadow-red-2 focus:bg-red-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-red-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong",
-				"hx-swap":               "outerHTML",
-				// post the Model ID as param
-				"hx-post": "/browse/delete/model/" + m.Name,
-			},
-			elem.I(
-				attrs.Props{
-					"class": "fa-solid fa-cancel pr-2",
-				},
-			),
-			elem.Text("Delete"),
-		)
-	}
-
-	installButton := func(m *gallery.GalleryModel) elem.Node {
-		return elem.Button(
-			attrs.Props{
-				"data-twe-ripple-init":  "",
-				"data-twe-ripple-color": "light",
-				"class":                 "float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong",
-				"hx-swap":               "outerHTML",
-				// post the Model ID as param
-				"hx-post": "/browse/install/model/" + fmt.Sprintf("%s@%s", m.Gallery.Name, m.Name),
-			},
-			elem.I(
-				attrs.Props{
-					"class": "fa-solid fa-download pr-2",
-				},
-			),
-			elem.Text("Install"),
-		)
-	}

 	descriptionDiv := func(m *gallery.GalleryModel) elem.Node {

@ -175,7 +260,15 @@ func ListModels(models []*gallery.GalleryModel, installing *xsync.SyncedMap[stri

 	actionDiv := func(m *gallery.GalleryModel) elem.Node {
 		galleryID := fmt.Sprintf("%s@%s", m.Gallery.Name, m.Name)
-		currentlyInstalling := installing.Exists(galleryID)
+		currentlyProcessing := processing.Exists(galleryID)
+		isDeletionOp := false
+		if currentlyProcessing {
+			status := galleryService.GetStatus(galleryID)
+			if status != nil && status.Deletion {
+				isDeletionOp = true
+			}
+			// if status == nil : "Waiting"
+		}

 		nodes := []elem.Node{
 			cardSpan("Repository: "+m.Gallery.Name, "fa-brands fa-git-alt"),
@ -187,25 +280,31 @@ func ListModels(models []*gallery.GalleryModel, installing *xsync.SyncedMap[stri
 			)
 		}

+		tagsNodes := []elem.Node{}
 		for _, tag := range m.Tags {
-			nodes = append(nodes,
-				cardSpan(tag, "fas fa-tag"),
+			tagsNodes = append(tagsNodes,
+				searchableElement(tag, "fas fa-tag"),
 			)
 		}

+		nodes = append(nodes,
+			elem.Div(
+				attrs.Props{
+					"class": "flex flex-row flex-wrap content-center",
+				},
+				tagsNodes...,
+			),
+		)
+
 		for i, url := range m.URLs {
 			nodes = append(nodes,
-				elem.A(
-					attrs.Props{
-						"class":  "inline-block bg-gray-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2",
-						"href":   url,
-						"target": "_blank",
-					},
-					elem.I(attrs.Props{
-						"class": "fas fa-link pr-2",
-					}),
-					elem.Text("Link #"+fmt.Sprintf("%d", i+1)),
-				))
+				link("Link #"+fmt.Sprintf("%d", i+1), url),
+			)
+		}
+
+		progressMessage := "Installation"
+		if isDeletionOp {
+			progressMessage = "Deletion"
 		}

 		return elem.Div(
@ -219,17 +318,17 @@ func ListModels(models []*gallery.GalleryModel, installing *xsync.SyncedMap[stri
 				nodes...,
 			),
 			elem.If(
-				currentlyInstalling,
+				currentlyProcessing,
 				elem.Node( // If currently installing, show progress bar
-					elem.Raw(StartProgressBar(installing.Get(galleryID), "0", "Installing")),
+					elem.Raw(StartProgressBar(processing.Get(galleryID), "0", progressMessage)),
 				), // Otherwise, show install button (if not installed) or display "Installed"
 				elem.If(m.Installed,
-					//elem.Node(elem.Div(
-					//		attrs.Props{},
-					//	span("Installed"), deleteButton(m),
-					//	)),
-					deleteButton(m),
-					installButton(m),
+					elem.Node(elem.Div(
+						attrs.Props{},
+						reInstallButton(m.ID()),
+						deleteButton(m.Name),
+					)),
+					installButton(m.ID()),
 				),
 			),
 		)
@ -243,11 +342,13 @@ func ListModels(models []*gallery.GalleryModel, installing *xsync.SyncedMap[stri
 			m.Icon = NoImage
 		}

+		divProperties := attrs.Props{
+			"class": "flex justify-center items-center",
+		}
+
 		elems = append(elems,

-			elem.Div(attrs.Props{
-				"class": "flex justify-center items-center",
-			},
+			elem.Div(divProperties,
 				elem.A(attrs.Props{
 					"href": "#!",
 					//		"class": "justify-center items-center",
@ -260,6 +361,19 @@ func ListModels(models []*gallery.GalleryModel, installing *xsync.SyncedMap[stri
 				),
 			))

+		_, trustRemoteCodeExists := m.Overrides["trust_remote_code"]
+		if trustRemoteCodeExists {
+			elems = append(elems, elem.Div(
+				attrs.Props{
+					"class": "flex justify-center items-center bg-red-500 text-white p-2 rounded-lg mt-2",
+				},
+				elem.I(attrs.Props{
+					"class": "fa-solid fa-circle-exclamation pr-2",
+				}),
+				elem.Text("Attention: Trust Remote Code is required for this model"),
+			))
+		}
+
 		elems = append(elems, descriptionDiv(m), actionDiv(m))
 		modelsElements = append(modelsElements,
 			elem.Div(
--- a/core/http/endpoints/localai/gallery.go
+++ b/core/http/endpoints/localai/gallery.go
@ -61,11 +61,11 @@ func (mgs *ModelGalleryEndpointService) ApplyModelGalleryEndpoint() func(c *fibe
 			return err
 		}
 		mgs.galleryApplier.C <- gallery.GalleryOp{
-			Req:         input.GalleryModel,
-			Id:          uuid.String(),
-			GalleryName: input.ID,
-			Galleries:   mgs.galleries,
-			ConfigURL:   input.ConfigURL,
+			Req:              input.GalleryModel,
+			Id:               uuid.String(),
+			GalleryModelName: input.ID,
+			Galleries:        mgs.galleries,
+			ConfigURL:        input.ConfigURL,
 		}
 		return c.JSON(struct {
 			ID        string `json:"uuid"`
@ -79,8 +79,8 @@ func (mgs *ModelGalleryEndpointService) DeleteModelGalleryEndpoint() func(c *fib
 		modelName := c.Params("name")

 		mgs.galleryApplier.C <- gallery.GalleryOp{
-			Delete:      true,
-			GalleryName: modelName,
+			Delete:           true,
+			GalleryModelName: modelName,
 		}

 		uuid, err := uuid.NewUUID()
--- a/core/http/endpoints/localai/welcome.go
+++ b/core/http/endpoints/localai/welcome.go
@ -3,22 +3,39 @@ package localai
 import (
 	"github.com/go-skynet/LocalAI/core/config"
 	"github.com/go-skynet/LocalAI/internal"
+	"github.com/go-skynet/LocalAI/pkg/gallery"
 	"github.com/go-skynet/LocalAI/pkg/model"
 	"github.com/gofiber/fiber/v2"
 )

 func WelcomeEndpoint(appConfig *config.ApplicationConfig,
-	cl *config.BackendConfigLoader, ml *model.ModelLoader) func(*fiber.Ctx) error {
+	cl *config.BackendConfigLoader, ml *model.ModelLoader, modelStatus func() (map[string]string, map[string]string)) func(*fiber.Ctx) error {
 	return func(c *fiber.Ctx) error {
 		models, _ := ml.ListModels()
 		backendConfigs := cl.GetAllBackendConfigs()

+		galleryConfigs := map[string]*gallery.Config{}
+		for _, m := range backendConfigs {
+
+			cfg, err := gallery.GetLocalModelConfiguration(ml.ModelPath, m.Name)
+			if err != nil {
+				continue
+			}
+			galleryConfigs[m.Name] = cfg
+		}
+
+		// Get model statuses so the UI can show the operation in progress
+		processingModels, taskTypes := modelStatus()
+
 		summary := fiber.Map{
 			"Title":             "LocalAI API - " + internal.PrintableVersion(),
 			"Version":           internal.PrintableVersion(),
 			"Models":            models,
 			"ModelsConfig":      backendConfigs,
+			"GalleryConfig":     galleryConfigs,
 			"ApplicationConfig": appConfig,
+			"ProcessingModels":  processingModels,
+			"TaskTypes":         taskTypes,
 		}

 		if string(c.Context().Request.Header.ContentType()) == "application/json" || len(c.Accepts("html")) == 0 {
--- a/core/http/endpoints/openai/request.go
+++ b/core/http/endpoints/openai/request.go
@ -63,10 +63,14 @@ func getBase64Image(s string) (string, error) {
 		return encoded, nil
 	}

-	// if the string instead is prefixed with "data:image/jpeg;base64,", drop it
-	if strings.HasPrefix(s, "data:image/jpeg;base64,") {
-		return strings.ReplaceAll(s, "data:image/jpeg;base64,", ""), nil
+	// if the string instead is prefixed with "data:image/...;base64,", drop it
+	dropPrefix := []string{"data:image/jpeg;base64,", "data:image/png;base64,"}
+	for _, prefix := range dropPrefix {
+		if strings.HasPrefix(s, prefix) {
+			return strings.TrimPrefix(s, prefix), nil
+		}
 	}
+
 	return "", fmt.Errorf("not valid string")
 }

@ -181,7 +185,7 @@ func updateRequestConfig(config *config.BackendConfig, input *schema.OpenAIReque
 						input.Messages[i].StringContent = fmt.Sprintf("[img-%d]", index) + input.Messages[i].StringContent
 						index++
 					} else {
-						fmt.Print("Failed encoding image", err)
+						log.Error().Msgf("Failed encoding image: %s", err)
 					}
 				}
 			}
--- a/core/http/routes/ui.go
+++ b/core/http/routes/ui.go
@ -3,11 +3,14 @@ package routes
 import (
 	"fmt"
 	"html/template"
+	"sort"
 	"strings"

 	"github.com/go-skynet/LocalAI/core/config"
 	"github.com/go-skynet/LocalAI/core/http/elements"
+	"github.com/go-skynet/LocalAI/core/http/endpoints/localai"
 	"github.com/go-skynet/LocalAI/core/services"
+	"github.com/go-skynet/LocalAI/internal"
 	"github.com/go-skynet/LocalAI/pkg/gallery"
 	"github.com/go-skynet/LocalAI/pkg/model"
 	"github.com/go-skynet/LocalAI/pkg/xsync"
@ -24,16 +27,64 @@ func RegisterUIRoutes(app *fiber.App,
 	auth func(*fiber.Ctx) error) {

 	// keeps the state of models that are being installed from the UI
-	var installingModels = xsync.NewSyncedMap[string, string]()
+	var processingModels = xsync.NewSyncedMap[string, string]()
+
+	// modelStatus returns the current status of the models being processed (installation or deletion).
+	// It is called asynchronously from the UI.
+	modelStatus := func() (map[string]string, map[string]string) {
+		processingModelsData := processingModels.Map()
+
+		taskTypes := map[string]string{}
+
+		for k, v := range processingModelsData {
+			status := galleryService.GetStatus(v)
+			taskTypes[k] = "Installation"
+			if status != nil && status.Deletion {
+				taskTypes[k] = "Deletion"
+			} else if status == nil {
+				taskTypes[k] = "Waiting"
+			}
+		}
+
+		return processingModelsData, taskTypes
+	}
+
+	app.Get("/", auth, localai.WelcomeEndpoint(appConfig, cl, ml, modelStatus))

 	// Show the Models page (all models)
 	app.Get("/browse", auth, func(c *fiber.Ctx) error {
+		term := c.Query("term")
+
 		models, _ := gallery.AvailableGalleryModels(appConfig.Galleries, appConfig.ModelPath)

+		// Get all available tags
+		allTags := map[string]struct{}{}
+		tags := []string{}
+		for _, m := range models {
+			for _, t := range m.Tags {
+				allTags[t] = struct{}{}
+			}
+		}
+		for t := range allTags {
+			tags = append(tags, t)
+		}
+		sort.Strings(tags)
+
+		if term != "" {
+			models = gallery.GalleryModels(models).Search(term)
+		}
+
+		// Get model statuses
+		processingModelsData, taskTypes := modelStatus()
+
 		summary := fiber.Map{
-			"Title":        "LocalAI - Models",
-			"Models":       template.HTML(elements.ListModels(models, installingModels)),
-			"Repositories": appConfig.Galleries,
+			"Title":            "LocalAI - Models",
+			"Version":          internal.PrintableVersion(),
+			"Models":           template.HTML(elements.ListModels(models, processingModels, galleryService)),
+			"Repositories":     appConfig.Galleries,
+			"AllTags":          tags,
+			"ProcessingModels": processingModelsData,
+			"TaskTypes":        taskTypes,
 			//	"ApplicationConfig": appConfig,
 		}

@ -53,17 +104,7 @@ func RegisterUIRoutes(app *fiber.App,

 		models, _ := gallery.AvailableGalleryModels(appConfig.Galleries, appConfig.ModelPath)

-		filteredModels := []*gallery.GalleryModel{}
-		for _, m := range models {
-			if strings.Contains(m.Name, form.Search) ||
-				strings.Contains(m.Description, form.Search) ||
-				strings.Contains(m.Gallery.Name, form.Search) ||
-				strings.Contains(strings.Join(m.Tags, ","), form.Search) {
-				filteredModels = append(filteredModels, m)
-			}
-		}
-
-		return c.SendString(elements.ListModels(filteredModels, installingModels))
+		return c.SendString(elements.ListModels(gallery.GalleryModels(models).Search(form.Search), processingModels, galleryService))
 	})

 	/*
@ -84,12 +125,12 @@ func RegisterUIRoutes(app *fiber.App,

 		uid := id.String()

-		installingModels.Set(galleryID, uid)
+		processingModels.Set(galleryID, uid)

 		op := gallery.GalleryOp{
-			Id:          uid,
-			GalleryName: galleryID,
-			Galleries:   appConfig.Galleries,
+			Id:               uid,
+			GalleryModelName: galleryID,
+			Galleries:        appConfig.Galleries,
 		}
 		go func() {
 			galleryService.C <- op
@ -110,15 +151,16 @@ func RegisterUIRoutes(app *fiber.App,

 		uid := id.String()

-		installingModels.Set(galleryID, uid)
+		processingModels.Set(galleryID, uid)

 		op := gallery.GalleryOp{
-			Id:          uid,
-			Delete:      true,
-			GalleryName: galleryID,
+			Id:               uid,
+			Delete:           true,
+			GalleryModelName: galleryID,
 		}
 		go func() {
 			galleryService.C <- op
+			cl.RemoveBackendConfig(galleryID)
 		}()

 		return c.SendString(elements.StartProgressBar(uid, "0", "Deletion"))
@ -141,7 +183,7 @@ func RegisterUIRoutes(app *fiber.App,
 			return c.SendString(elements.ProgressBar("100"))
 		}
 		if status.Error != nil {
-			return c.SendString(elements.ErrorProgress(status.Error.Error()))
+			return c.SendString(elements.ErrorProgress(status.Error.Error(), status.GalleryModelName))
 		}

 		return c.SendString(elements.ProgressBar(fmt.Sprint(status.Progress)))
@ -153,17 +195,123 @@ func RegisterUIRoutes(app *fiber.App,

 		status := galleryService.GetStatus(c.Params("uid"))

-		for _, k := range installingModels.Keys() {
-			if installingModels.Get(k) == c.Params("uid") {
-				installingModels.Delete(k)
+		galleryID := ""
+		for _, k := range processingModels.Keys() {
+			if processingModels.Get(k) == c.Params("uid") {
+				galleryID = k
+				processingModels.Delete(k)
 			}
 		}

+		showDelete := true
 		displayText := "Installation completed"
 		if status.Deletion {
+			showDelete = false
 			displayText = "Deletion completed"
 		}

-		return c.SendString(elements.DoneProgress(c.Params("uid"), displayText))
+		return c.SendString(elements.DoneProgress(galleryID, displayText, showDelete))
+	})
+
+	// Show the Chat page
+	app.Get("/chat/:model", auth, func(c *fiber.Ctx) error {
+		backendConfigs := cl.GetAllBackendConfigs()
+
+		summary := fiber.Map{
+			"Title":        "LocalAI - Chat with " + c.Params("model"),
+			"ModelsConfig": backendConfigs,
+			"Model":        c.Params("model"),
+			"Version":      internal.PrintableVersion(),
+		}
+
+		// Render the chat view
+		return c.Render("views/chat", summary)
+	})
+	app.Get("/chat/", auth, func(c *fiber.Ctx) error {
+
+		backendConfigs := cl.GetAllBackendConfigs()
+
+		if len(backendConfigs) == 0 {
+			// If no model is available redirect to the index which suggests how to install models
+			return c.Redirect("/")
+		}
+
+		summary := fiber.Map{
+			"Title":        "LocalAI - Chat with " + backendConfigs[0].Name,
+			"ModelsConfig": backendConfigs,
+			"Model":        backendConfigs[0].Name,
+			"Version":      internal.PrintableVersion(),
+		}
+
+		// Render the chat view
+		return c.Render("views/chat", summary)
+	})
+
+	app.Get("/text2image/:model", auth, func(c *fiber.Ctx) error {
+		backendConfigs := cl.GetAllBackendConfigs()
+
+		summary := fiber.Map{
+			"Title":        "LocalAI - Generate images with " + c.Params("model"),
+			"ModelsConfig": backendConfigs,
+			"Model":        c.Params("model"),
+			"Version":      internal.PrintableVersion(),
+		}
+
+		// Render the text2image view
+		return c.Render("views/text2image", summary)
+	})
+
+	app.Get("/text2image/", auth, func(c *fiber.Ctx) error {
+
+		backendConfigs := cl.GetAllBackendConfigs()
+
+		if len(backendConfigs) == 0 {
+			// If no model is available redirect to the index which suggests how to install models
+			return c.Redirect("/")
+		}
+
+		summary := fiber.Map{
+			"Title":        "LocalAI - Generate images with " + backendConfigs[0].Name,
+			"ModelsConfig": backendConfigs,
+			"Model":        backendConfigs[0].Name,
+			"Version":      internal.PrintableVersion(),
+		}
+
+		// Render the text2image view
+		return c.Render("views/text2image", summary)
+	})
+
+	app.Get("/tts/:model", auth, func(c *fiber.Ctx) error {
+		backendConfigs := cl.GetAllBackendConfigs()
+
+		summary := fiber.Map{
+			"Title":        "LocalAI - Generate audio with " + c.Params("model"),
+			"ModelsConfig": backendConfigs,
+			"Model":        c.Params("model"),
+			"Version":      internal.PrintableVersion(),
+		}
+
+		// Render the tts view
+		return c.Render("views/tts", summary)
+	})
+
+	app.Get("/tts/", auth, func(c *fiber.Ctx) error {
+
+		backendConfigs := cl.GetAllBackendConfigs()
+
+		if len(backendConfigs) == 0 {
+			// If no model is available redirect to the index which suggests how to install models
+			return c.Redirect("/")
+		}
+
+		summary := fiber.Map{
+			"Title":        "LocalAI - Generate audio with " + backendConfigs[0].Name,
+			"ModelsConfig": backendConfigs,
+			"Model":        backendConfigs[0].Name,
+			"Version":      internal.PrintableVersion(),
+		}
+
+		// Render the tts view
+		return c.Render("views/tts", summary)
 	})
 }
--- a/core/http/routes/welcome.go
+++ b/core/http/routes/welcome.go
@ -1,19 +0,0 @@
-package routes
-
-import (
-	"github.com/go-skynet/LocalAI/core/config"
-	"github.com/go-skynet/LocalAI/core/http/endpoints/localai"
-	"github.com/go-skynet/LocalAI/pkg/model"
-	"github.com/gofiber/fiber/v2"
-)
-
-func RegisterPagesRoutes(app *fiber.App,
-	cl *config.BackendConfigLoader,
-	ml *model.ModelLoader,
-	appConfig *config.ApplicationConfig,
-	auth func(*fiber.Ctx) error) {
-
-	if !appConfig.DisableWelcomePage {
-		app.Get("/", auth, localai.WelcomeEndpoint(appConfig, cl, ml))
-	}
-}
--- a/core/http/static/chat.js
+++ b/core/http/static/chat.js
@ -0,0 +1,238 @@
+/*
+
+https://github.com/david-haerer/chatapi
+
+MIT License
+
+Copyright (c) 2023 David Härer
+Copyright (c) 2024 Ettore Di Giacinto
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+*/
+
+function submitKey(event) {
+    event.preventDefault();
+    localStorage.setItem("key", document.getElementById("apiKey").value);
+    document.getElementById("apiKey").blur();
+}
+
+function submitSystemPrompt(event) {
+  event.preventDefault();
+  localStorage.setItem("system_prompt", document.getElementById("systemPrompt").value);
+  document.getElementById("systemPrompt").blur();
+}
+  
+var image = "";
+
+function submitPrompt(event) {
+  event.preventDefault();
+
+  const input = document.getElementById("input").value;
+  Alpine.store("chat").add("user", input, image);
+  document.getElementById("input").value = "";
+  const key = localStorage.getItem("key");
+  const systemPrompt = localStorage.getItem("system_prompt");
+
+  promptGPT(systemPrompt, key, input);
+}
+
+function readInputImage() {
+  
+  if (!this.files || !this.files[0]) return;
+    
+  const FR = new FileReader();
+    
+  FR.addEventListener("load", function(evt) {
+    image = evt.target.result;
+  }); 
+    
+  FR.readAsDataURL(this.files[0]);
+}
+
+
+  async function promptGPT(systemPrompt, key, input) {
+    const model = document.getElementById("chat-model").value;
+    // Set class "loader" to the element with "loader" id
+    //document.getElementById("loader").classList.add("loader");
+    // Make the "loader" visible
+    document.getElementById("loader").style.display = "block";
+    document.getElementById("input").disabled = true;
+    document.getElementById('messages').scrollIntoView(false)
+
+    const messages = Alpine.store("chat").messages();
+
+    // if systemPrompt isn't empty, push it at the start of messages
+    if (systemPrompt) {
+      messages.unshift({
+        role: "system",
+        content: systemPrompt
+      });
+    }
+
+    // loop over all messages; for each one that carries an image, convert the content field to the array form
+    messages.forEach((message) => {
+      if (message.image) {
+        // The content field now becomes an array
+        message.content = [
+          {
+            "type": "text",
+            "text": message.content
+          }
+        ]
+        message.content.push(
+          {
+            "type": "image_url",
+            "image_url": {
+              "url": message.image,
+            }
+          }
+        );
+
+        // remove the image field
+        delete message.image;
+      }
+    });
+
+    // reset the form and the image
+    image = "";
+    document.getElementById("input_image").value = null;
+    document.getElementById("fileName").innerHTML = "";
+
+    // if (image) {
+    //   // take the last element content's and add the image
+    //   last_message = messages[messages.length - 1]
+    //   // The content field now becomes an array
+    //   last_message.content = [
+    //     {
+    //       "type": "text",
+    //       "text": last_message.content
+    //     }
+    //    ]
+    //   last_message.content.push(
+    //     {
+    //       "type": "image_url",
+    //       "image_url": {
+    //         "url": image,
+    //       }
+    //     }
+    //   );
+    //   // and we replace it in the messages array
+    //   messages[messages.length - 1] = last_message
+
+    //   // reset the form and the image
+    //   image = "";
+    //   document.getElementById("input_image").value = null;
+    //   document.getElementById("fileName").innerHTML = "";
+    // }
+
+    // Source: https://stackoverflow.com/a/75751803/11386095
+    const response = await fetch("/v1/chat/completions", {
+      method: "POST",
+      headers: {
+        Authorization: `Bearer ${key}`,
+        "Content-Type": "application/json",
+      },
+      body: JSON.stringify({
+        model: model,
+        messages: messages,
+        stream: true,
+      }),
+    });
+  
+    if (!response.ok) {
+      Alpine.store("chat").add(
+        "assistant",
+        `<span class='error'>Error: POST /v1/chat/completions ${response.status}</span>`,
+      );
+      return;
+    }
+  
+    const reader = response.body
+      ?.pipeThrough(new TextDecoderStream())
+      .getReader();
+  
+    if (!reader) {
+      Alpine.store("chat").add(
+        "assistant",
+        `<span class='error'>Error: Failed to decode API response</span>`,
+      );
+      return;
+    }
+  
+    while (true) {
+      const { value, done } = await reader.read();
+      if (done) break;
+      let dataDone = false;
+      const arr = value.split("\n");
+      arr.forEach((data) => {
+        if (data.length === 0) return;
+        if (data.startsWith(":")) return;
+        if (data === "data: [DONE]") {
+          dataDone = true;
+          return;
+        }
+        const token = JSON.parse(data.substring(6)).choices[0].delta.content;
+        if (!token) {
+          return;
+        }
+        hljs.highlightAll();
+        Alpine.store("chat").add("assistant", token);
+        document.getElementById('messages').scrollIntoView(false)
+      });
+      hljs.highlightAll();
+      if (dataDone) break;
+    }
+    // Remove class "loader" from the element with "loader" id
+    //document.getElementById("loader").classList.remove("loader");
+    document.getElementById("loader").style.display = "none";
+    // enable input
+    document.getElementById("input").disabled = false;
+    // scroll to the bottom of the chat
+    document.getElementById('messages').scrollIntoView(false)
+    // set focus to the input
+    document.getElementById("input").focus();
+  }
+  
+  document.getElementById("key").addEventListener("submit", submitKey);
+  document.getElementById("system_prompt").addEventListener("submit", submitSystemPrompt);
+
+  document.getElementById("prompt").addEventListener("submit", submitPrompt);
+  document.getElementById("input").focus();
+  document.getElementById("input_image").addEventListener("change", readInputImage);
+
+  storeKey = localStorage.getItem("key");
+  if (storeKey) {
+    document.getElementById("apiKey").value = storeKey;
+  } else {
+    document.getElementById("apiKey").value = null;
+  }
+
+  storesystemPrompt = localStorage.getItem("system_prompt");
+  if (storesystemPrompt) {
+    document.getElementById("systemPrompt").value = storesystemPrompt;
+  } else {
+    document.getElementById("systemPrompt").value = null;
+  }
+  
+  marked.setOptions({
+    highlight: function (code) {
+      return hljs.highlightAuto(code).value;
+    },
+  });
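Note on the streaming loop in chat.js: each decoded chunk is split on newlines and every `data:` line is parsed immediately. A chunk read from the network is not guaranteed to end on a line boundary, so a JSON payload can arrive split across two reads and `JSON.parse` would then throw. The following is a minimal sketch of a buffering parser, assuming the same `data: {...}` / `data: [DONE]` framing the loop above expects. `createSSEParser`, `push` and `isDone` are hypothetical names for illustration and are not part of this PR.

```javascript
// Hypothetical helper (not part of the PR): buffers partial lines between
// chunks so that a JSON payload split across two reads is parsed only once
// it is complete.
function createSSEParser() {
  let buffer = "";
  let done = false;

  return {
    // Feed one decoded text chunk; returns the content tokens it completed.
    push(chunk) {
      const tokens = [];
      if (done) return tokens;
      buffer += chunk;
      const lines = buffer.split("\n");
      // The last element is either "" (the chunk ended on a newline) or an
      // incomplete line; keep it for the next call.
      buffer = lines.pop();
      for (const raw of lines) {
        const line = raw.trim();
        if (line.length === 0 || line.startsWith(":")) continue; // blank line or comment
        if (!line.startsWith("data:")) continue; // ignore other SSE fields
        const payload = line.slice(5).trim();
        if (payload === "[DONE]") {
          done = true;
          break;
        }
        const token = JSON.parse(payload).choices?.[0]?.delta?.content;
        if (token) tokens.push(token);
      }
      return tokens;
    },
    isDone() {
      return done;
    },
  };
}

// Usage: the second payload is deliberately split across two chunks.
const parser = createSSEParser();
const first = parser.push('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"choi');
const second = parser.push('ces":[{"delta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n');
console.log(first.concat(second).join(""), parser.isDone());
```

In the read loop, each `value` returned by `reader.read()` would be passed to `push`, and the returned tokens appended to the chat store; the loop can stop once `isDone()` reports true.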
--- a/core/http/static/favicon.ico
+++ b/core/http/static/favicon.ico
--- a/core/http/static/general.css
+++ b/core/http/static/general.css
@ -0,0 +1,93 @@
+body {
+    font-family: 'Inter', sans-serif;
+}
+.chat-container { height: 90vh; display: flex; flex-direction: column; }
+.chat-messages { overflow-y: auto; flex-grow: 1; }
+.htmx-indicator{
+        opacity:0;
+        transition: opacity 10ms ease-in;
+}
+.htmx-request .htmx-indicator{
+    opacity:1
+}
+/* Loader (https://cssloaders.github.io/) */
+.loader {
+  width: 12px;
+  height: 12px;
+  border-radius: 50%;
+  display: block;
+  margin:15px auto;
+  position: relative;
+  color: #FFF;
+  box-sizing: border-box;
+  animation: animloader 2s linear infinite;
+}
+
+@keyframes animloader {
+  0% { box-shadow: 14px 0 0 -2px,  38px 0 0 -2px,  -14px 0 0 -2px,  -38px 0 0 -2px; }
+  25% { box-shadow: 14px 0 0 -2px,  38px 0 0 -2px,  -14px 0 0 -2px,  -38px 0 0 2px; }
+  50% { box-shadow: 14px 0 0 -2px,  38px 0 0 -2px,  -14px 0 0 2px,  -38px 0 0 -2px; }
+  75% { box-shadow: 14px 0 0 2px,  38px 0 0 -2px,  -14px 0 0 -2px,  -38px 0 0 -2px; }
+  100% { box-shadow: 14px 0 0 -2px,  38px 0 0 2px,  -14px 0 0 -2px,  -38px 0 0 -2px; }
+}
+.progress {
+    height: 20px;
+    margin-bottom: 20px;
+    overflow: hidden;
+    background-color: #f5f5f5;
+    border-radius: 4px;
+    box-shadow: inset 0 1px 2px rgba(0,0,0,.1);
+}
+.progress-bar {
+    float: left;
+    width: 0%;
+    height: 100%;
+    font-size: 12px;
+    line-height: 20px;
+    color: #fff;
+    text-align: center;
+    background-color: #337ab7;
+    -webkit-box-shadow: inset 0 -1px 0 rgba(0,0,0,.15);
+    box-shadow: inset 0 -1px 0 rgba(0,0,0,.15);
+    -webkit-transition: width .6s ease;
+    -o-transition: width .6s ease;
+    transition: width .6s ease;
+}
+
+.user {
+    background-color: #007bff;
+}
+
+.assistant {
+    background-color: #28a745;
+}
+
+.message {
+    display: flex;
+    align-items: center;
+}
+
+.user, .assistant {
+    flex-grow: 1;
+    margin: 0.5rem;
+}
+
+ul {
+    list-style-type: disc; /* Adds bullet points */
+    padding-left: 1.25rem; /* Indents the list from the left margin */
+    margin-top: 1rem; /* Space above the list */
+}
+
+li {
+    font-size: 0.875rem; /* Small text size */
+    color: #4a5568; /* Dark gray text */
+    background-color: #f7fafc; /* Very light gray background */
+    border-radius: 0.375rem; /* Rounded corners */
+    padding: 0.5rem; /* Padding inside each list item */
+    box-shadow: 0 1px 3px 0 rgba(0, 0, 0, 0.1), 0 1px 2px 0 rgba(0, 0, 0, 0.06); /* Subtle shadow */
+    margin-bottom: 0.5rem; /* Vertical space between list items */
+}
+
+li:last-child {
+    margin-bottom: 0; /* Removes bottom margin from the last item */
+}
--- a/core/http/static/image.js
+++ b/core/http/static/image.js
@ -0,0 +1,96 @@
+/*
+
+https://github.com/david-haerer/chatapi
+
+MIT License
+
+Copyright (c) 2023 David Härer
+Copyright (c) 2024 Ettore Di Giacinto
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+*/
+function submitKey(event) {
+    event.preventDefault();
+    localStorage.setItem("key", document.getElementById("apiKey").value);
+    document.getElementById("apiKey").blur();
+  }
+  
+
+function genImage(event) {
+  event.preventDefault();
+  const input = document.getElementById("input").value;
+  const key = localStorage.getItem("key");
+
+  promptDallE(key, input);
+
+}
+  
+async function promptDallE(key, input) {
+  document.getElementById("loader").style.display = "block";
+  document.getElementById("input").value = "";
+  document.getElementById("input").disabled = true;
+
+  const model = document.getElementById("image-model").value;
+  const response = await fetch("/v1/images/generations", {
+    method: "POST",
+    headers: {
+      Authorization: `Bearer ${key}`,
+      "Content-Type": "application/json",
+    },
+    body: JSON.stringify({
+      model: model,
+      steps: 10,
+      prompt: input,
+      n: 1,
+      size: "512x512",
+    }),
+  });
+  const json = await response.json();
+  if (json.error) {
+    // Display error if there is one
+    var div = document.getElementById('result');  // Get the div by its ID
+    div.innerHTML = '<p style="color:red;">' + json.error.message + '</p>';
+    return;
+  }
+  const url = json.data[0].url;
+
+  var div = document.getElementById('result');  // Get the div by its ID
+  var img = document.createElement('img');         // Create a new img element
+  img.src = url;  // Set the source of the image
+  img.alt = 'Generated image';            // Set the alt text of the image
+
+  div.innerHTML = '';                             // Clear the existing content of the div
+  div.appendChild(img);                           // Add the new img element to the div
+
+  document.getElementById("loader").style.display = "none";
+  document.getElementById("input").disabled = false;
+  document.getElementById("input").focus();
+}
+
+document.getElementById("key").addEventListener("submit", submitKey);
+document.getElementById("input").focus();
+document.getElementById("genimage").addEventListener("submit", genImage);
+document.getElementById("loader").style.display = "none";
+
+const storeKey = localStorage.getItem("key");
+if (storeKey) {
+  document.getElementById("apiKey").value = storeKey;
+}
+
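Note on the error branch in image.js: `json.error.message` is concatenated straight into `innerHTML`, so any markup inside the message would be interpreted as HTML. A minimal sketch of escaping the message before interpolation follows. `escapeHtml` and `renderError` are hypothetical helpers, not part of this PR; creating a `<p>` element and assigning `textContent` would be an equally valid alternative.

```javascript
// Hypothetical helpers (not part of the PR).
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace the five characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Build the same red error paragraph as image.js, with the message escaped.
function renderError(message) {
  return '<p style="color:red;">' + escapeHtml(message) + "</p>";
}

console.log(renderError('model "x" not found <script>'));
```

The result of `renderError` can be assigned to `div.innerHTML` exactly as the existing code does, without the message being able to inject elements.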
--- a/core/http/static/tts.js
+++ b/core/http/static/tts.js
@ -0,0 +1,64 @@
+function submitKey(event) {
+    event.preventDefault();
+    localStorage.setItem("key", document.getElementById("apiKey").value);
+    document.getElementById("apiKey").blur();
+  }
+  
+
+function genAudio(event) {
+  event.preventDefault();
+  const input = document.getElementById("input").value;
+  const key = localStorage.getItem("key");
+
+  tts(key, input);
+}
+  
+async function tts(key, input) {
+  document.getElementById("loader").style.display = "block";
+  document.getElementById("input").value = "";
+  document.getElementById("input").disabled = true;
+
+  const model = document.getElementById("tts-model").value;
+  const response = await fetch("/tts", {
+    method: "POST",
+    headers: {
+      Authorization: `Bearer ${key}`,
+      "Content-Type": "application/json",
+    },
+    body: JSON.stringify({
+      model: model,
+      input: input,
+    }),
+  });
+  if (!response.ok) {
+    const jsonData = await response.json(); // Now safely parse JSON
+    var div = document.getElementById('result');
+    div.innerHTML = '<p style="color:red;">Error: ' +jsonData.error.message + '</p>';
+    return;
+  }
+
+  var div = document.getElementById('result');  // Get the div by its ID
+  var link=document.createElement('a');
+  link.className = "m-2 float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong";
+  link.innerHTML = "<i class='fa-solid fa-download'></i> Download result";
+  const blob = await response.blob();
+  link.href=window.URL.createObjectURL(blob);
+
+  div.innerHTML = '';                             // Clear the existing content of the div
+  div.appendChild(link);                           // Add the download link to the div
+  console.log(link)
+  document.getElementById("loader").style.display = "none";
+  document.getElementById("input").disabled = false;
+  document.getElementById("input").focus();
+}
+
+document.getElementById("key").addEventListener("submit", submitKey);
+document.getElementById("input").focus();
+document.getElementById("tts").addEventListener("submit", genAudio);
+document.getElementById("loader").style.display = "none";
+
+const storeKey = localStorage.getItem("key");
+if (storeKey) {
+  document.getElementById("apiKey").value = storeKey;
+}
+
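Note on the error branch in tts.js: it calls `response.json()` and reads `error.message`, which assumes every non-2xx response carries a JSON body of that shape. An empty body or an HTML error page from a proxy would make that call throw. A minimal sketch of a more defensive approach follows, working on the body text (for example from `await response.text()`). `errorMessageFromBody` is a hypothetical helper and the fallback wording is an assumption, not part of this PR.

```javascript
// Hypothetical helper (not part of the PR): derive a displayable error message
// from a response body that may or may not be JSON.
function errorMessageFromBody(bodyText, status) {
  const fallback = `request failed with status ${status}`;
  try {
    const parsed = JSON.parse(bodyText);
    const message = parsed?.error?.message;
    return typeof message === "string" && message.length > 0 ? message : fallback;
  } catch (err) {
    // The body was empty or not valid JSON (for example an HTML error page).
    return fallback;
  }
}

console.log(errorMessageFromBody('{"error":{"message":"model not found"}}', 404));
console.log(errorMessageFromBody("<html>Bad Gateway</html>", 502));
```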
--- a/core/http/views/chat.html
+++ b/core/http/views/chat.html
@ -0,0 +1,228 @@
+<!--
+
+Part of this page is based on the OpenAI Chatbot example by David Härer:
+https://github.com/david-haerer/chatapi
+
+MIT License Copyright (c) 2023 David Härer
+            Copyright (c) 2024 Ettore Di Giacinto
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+-->
+<!doctype html>
+<html lang="en">
+  {{template "views/partials/head" .}}
+  <script defer src="/static/chat.js"></script>
+  <style>
+    body {
+        overflow: hidden; 
+    }
+  </style>
+  <body class="bg-gray-900 text-gray-200" x-data="{ key: $store.chat.key }">
+    <div class="flex flex-col min-h-screen">
+
+    {{template "views/partials/navbar"}}
+    <div class="chat-container mt-2 mr-2 ml-2 mb-2 bg-gray-800 shadow-lg rounded-lg" >
+     <!-- Chat Header -->
+    <div class="border-b border-gray-700 p-4"  x-data="{ component: 'menu' }">
+
+      <div class="flex items-center justify-between">
+
+      <h1 class="text-lg font-semibold"> <i class="fa-solid fa-comments"></i> Chat with {{.Model}} <a href="https://localai.io/features/text-generation/" target="_blank" >
+        <i class="fas fa-circle-info pr-2"></i>
+      </a></h1>
+      <div x-show="component === 'menu'" id="menu">
+        <button
+          @click="$store.chat.clear()"
+          id="clear"
+          title="Clear chat history"
+
+          data-twe-ripple-init
+          data-twe-ripple-color="light"
+          class="m-2 float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong"
+          >
+          Clear chat 🔥
+        </button>
+        <button @click="component = 'key'" title="Update API key"
+        class="m-2 float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong"
+        >Set API Key🔑</button>
+        <button @click="component = 'system_prompt'" title="System Prompt"
+        class="m-2 float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong"
+        >Set system prompt</button>
+      </div>
+      <form x-show="component === 'key'" id="key">
+        <input
+          type="password"
+          id="apiKey"
+          name="apiKey"
+          class="bg-gray-800 text-white border border-gray-600 focus:border-blue-500 focus:ring focus:ring-blue-500 focus:ring-opacity-50 rounded-md shadow-sm p-2 appearance-none"
+          placeholder="OpenAI API Key"
+          x-model.lazy="key"
+        />
+        <button @click="component = 'menu'" type="submit" title="Save API key">
+          <i class="fa-solid fa-arrow-right"></i>
+        </button>
+      </form>
+      <form x-show="component === 'system_prompt'" id="system_prompt">
+        <textarea
+          type="text"
+          id="systemPrompt"
+          name="systemPrompt"
+          class="bg-gray-800 text-white border border-gray-600 focus:border-blue-500 focus:ring focus:ring-blue-500 focus:ring-opacity-50 rounded-md shadow-sm p-2 appearance-none"
+          placeholder="System prompt"
+          x-model.lazy="system_prompt"
+        ></textarea>
+        <button @click="component = 'menu'" type="submit" title="Save Prompt">
+          <i class="fa-solid fa-arrow-right"></i>
+        </button>
+      </form>
+
+      <select x-data="{ link : '' }" x-model="link" x-init="$watch('link', value => window.location = link)" 
+        class="bg-gray-800 text-white border border-gray-600 focus:border-blue-500 focus:ring focus:ring-blue-500 focus:ring-opacity-50 rounded-md shadow-sm p-2 appearance-none"
+        >	
+        <!-- Options -->
+        <option value="" disabled class="text-gray-400" >Select a model</option>
+        {{ $model:=.Model}}
+        {{ range .ModelsConfig }}
+        {{ if eq .Name $model }}
+        <option value="/chat/{{.Name}}" selected  class="bg-gray-700 text-white">{{.Name}}</option>
+        {{ else }}
+        <option value="/chat/{{.Name}}" class="bg-gray-700 text-white">{{.Name}}</option>
+        {{ end }}
+        {{ end }}
+      </select>
+
+      </div>
+    </div>
+
+    <div class="chat-messages p-4" id="chat" x-data="{history: $store.chat.history}">
+      <p id="usage" x-show="history.length === 0">
+        Start chatting with the AI by typing a prompt in the input field below.
+      </p>
+      <div id="messages">
+      <template x-for="message in history">
+        <div class="message flex items-start space-x-2 my-2" >
+          <!--<img :src="message.role === 'user' ? '/path/to/user-icon.png' : '/path/to/bot-icon.png'" alt="" class="h-6 w-6">-->
+          <i class="fa-solid h-8 w-8" :class="message.role === 'user' ? 'fa-user' : 'fa-robot'"  ></i>
+          <div class="flex flex-col flex-1">
+            <span class="text-xs font-semibold text-gray-600" x-text="message.role === 'user' ? 'User' : 'Assistant ({{.Model}})'"></span>
+            <template x-if="message.role === 'user'">
+              <div class="p-2 flex-1 rounded" :class="message.role" x-html="message.html"></div>
+            </template>
+            <template x-if="message.role === 'assistant'">
+              <div class="p-2 flex-1 rounded" :class="message.role" x-html="message.html"></div>
+            </template>
+            <template x-if="message.image">
+              <img :src="message.image" alt="Image" class="rounded-lg mt-2 h-36 w-36">
+            </template>
+          </div>
+        </div>
+      </template>
+      </div>
+    </div>
+
+    <div class="p-4 border-t border-gray-700" x-data="{ inputValue: '', shiftPressed: false, fileName: ''  }">
+      <div id="loader" class="my-2 loader" style="display: none;"></div>
+      <input id="chat-model" type="hidden" value="{{.Model}}">
+      <input id="input_image" type="file" style="display: none;" @change="fileName = $event.target.files[0].name">
+      <form id="prompt" action="/chat/{{.Model}}" method="get" @submit.prevent="submitPrompt">
+          <div class="relative w-full">
+              <textarea
+                  id="input"
+                  name="input"
+                  x-model="inputValue"
+                  placeholder="Send a message..."
+                  class="p-2 pl-2 border rounded w-full bg-gray-600 text-white placeholder-gray-300"
+                  required
+                  @keydown.shift="shiftPressed = true"
+                  @keyup.shift="shiftPressed = false"
+                  @keydown.enter="if (!shiftPressed) { submitPrompt($event); }"
+                  style="padding-right: 4rem;"
+              ></textarea>
+              <span x-text="fileName" id="fileName" class="absolute right-16 top-5 text-gray-300 text-sm mr-2"></span>
+              <button type="button" onclick="document.getElementById('input_image').click()" class="fa-solid fa-paperclip text-gray-300 ml-2 absolute right-10 top-3 text-lg p-2">
+              </button>
+              <button type=submit><i class="fa-solid fa-circle-up text-gray-300 absolute right-2 top-3 text-lg p-2"></i></button>
+          </div>
+      </form>
+  </div>
+    <script>
+      document.addEventListener("alpine:init", () => {
+        Alpine.store("chat", {
+          history: [],
+          languages: [undefined],
+          clear() {
+            this.history.length = 0;
+          },
+          add(role, content, image) {
+            const N = this.history.length - 1;
+            if (this.history.length && this.history[N].role === role) {
+              this.history[N].content += content;
+              this.history[N].html = DOMPurify.sanitize(
+                marked.parse(this.history[N].content),
+              );
+            } else {
+              let c = "";
+              // Split the content into lines.
+              const lines = content.split("\n");
+              // Sanitize the rendered Markdown of each line and append it to c.
+              lines.forEach((line) => {
+                c += DOMPurify.sanitize(marked.parse(line));
+              });
+
+              this.history.push({
+                role: role,
+                content: content,
+                html: c,
+                image: image,
+              });
+            }
+
+            const parser = new DOMParser();
+            const html = parser.parseFromString(
+              this.history[this.history.length - 1].html,
+              "text/html",
+            );
+            const code = html.querySelectorAll("pre code");
+            if (!code.length) return;
+            code.forEach((el) => {
+              const language = el.className.split("language-")[1];
+              if (this.languages.includes(language)) return;
+              const script = document.createElement("script");
+              script.src = `https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.8.0/build/languages/${language}.min.js`;
+              document.head.appendChild(script);
+              this.languages.push(language);
+            });
+          },
+          messages() {
+            return this.history.map((message) => {
+              return {
+                role: message.role,
+                content: message.content,
+                image: message.image,
+              };
+            });
+          },
+        });
+      });
+    </script>
+    </div>
+  </body>
+</html>
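
The `add()` method of the `chat` store above merges consecutive chunks from the same role into the last history entry, which is what lets a streamed assistant reply grow in place. A minimal sketch of that rule in isolation follows. The function name `addMessage` and the `render` parameter are hypothetical: `render` stands in for `DOMPurify.sanitize(marked.parse(...))`, and unlike the original, the sketch renders a new message in one pass rather than line by line.

```javascript
// Hypothetical stand-alone version of the merging rule used by the store's add().
// `render` is an assumed stand-in for DOMPurify.sanitize(marked.parse(text)).
function addMessage(history, role, content, image, render = (text) => text) {
  const last = history[history.length - 1];
  if (last && last.role === role) {
    // Same role as the previous entry: treat the content as a streamed chunk.
    last.content += content;
    last.html = render(last.content);
  } else {
    // Different role (or empty history): start a new message.
    history.push({ role, content, html: render(content), image });
  }
  return history;
}

const history = [];
addMessage(history, "user", "Hi");
addMessage(history, "assistant", "Hel");
addMessage(history, "assistant", "lo");
console.log(history.length);     // 2
console.log(history[1].content); // "Hello"
```

Re-rendering the whole accumulated `content` on every chunk keeps Markdown constructs that span several chunks (such as fenced code) intact, at the cost of repeated parsing.
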
--- a/core/http/views/index.html
+++ b/core/http/views/index.html
@ -10,23 +10,76 @@
    <div class="container mx-auto px-4 flex-grow">
        <div class="header text-center py-12">
            <h1 class="text-5xl font-bold text-gray-100">Welcome to <i>your</i> LocalAI instance!</h1>
-            <div class="mt-6">
-                <!-- Logo can be uncommented and updated with a valid URL -->
-            </div>
            <p class="mt-4 text-lg">The FOSS alternative to OpenAI, Claude, ...</p>
            <a href="https://localai.io" target="_blank" class="mt-4 inline-block bg-blue-500 text-white py-2 px-4 rounded-lg shadow transition duration-300 ease-in-out hover:bg-blue-700 hover:shadow-lg">
                <i class="fas fa-book-reader pr-2"></i>Documentation
-            </a>
+            </a>    
        </div>

-        <div class="models mt-12">
+        <div class="models mt-4">
+
+            <!-- Show in progress operations-->
+            {{ if .ProcessingModels }}
+            <h2
+                class="mt-4 mb-4 text-center text-3xl font-semibold text-gray-100">Operations in progress</h2>
+            {{end}}
+            {{$taskType:=.TaskTypes}}
+            {{ range $key,$value:=.ProcessingModels }} 
+                {{ $op := index $taskType $key}}
+                {{$parts := split "@" $key}}
+                 <div class="flex items-center justify-between bg-slate-600 p-2 mb-2 rounded-md">
+                    <div class="flex items-center">
+                        <span class="text-gray-300"><a href="/browse?term={{$parts._1}}"
+                            class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                            >{{$parts._1}}</a> (from the '{{$parts._0}}' repository)</span>
+                    </div>
+                    <div hx-get="/browse/job/{{$value}}" hx-swap="innerHTML" hx-target="this" hx-trigger="done">
+                        <h3 role="status" id="pblabel" >{{$op}}
+                            <div hx-get="/browse/job/progress/{{$value}}" hx-trigger="every 600ms" hx-target="this"
+                            hx-swap=  "innerHTML"  ></div></h3>
+                    </div>     
+                </div>    
+            {{ end }}
+            <!-- END Show in progress operations-->
+    
+            {{ if eq (len .ModelsConfig) 0 }}
+            <h2 class="text-center text-3xl font-semibold text-gray-100"> <i class="text-yellow-200 ml-2 fa-solid fa-triangle-exclamation animate-pulse"></i> Ouch! It seems you don't have any models installed!</h2>
+            <p class="text-center mt-4 text-xl">..install something from the <a class="text-gray-400 hover:text-white ml-1 px-3 py-2 rounded" href="/browse">🖼️ Gallery</a> or check the <a href="https://localai.io/basics/getting_started/" class="text-gray-400 hover:text-white ml-1 px-3 py-2 rounded"> <i class="fa-solid fa-book"></i> Getting started documentation </a></p>
+            {{ else }}
            <h2 class="text-center text-3xl font-semibold text-gray-100">Installed models</h2>
            <p class="text-center mt-4 text-xl">We have {{len .ModelsConfig}} pre-loaded models available.</p>
-            <ul class="mt-8 space-y-4">
+            <table class="table-auto mt-4 w-full text-left text-gray-200">
+                <thead class="text-xs text-gray-400 uppercase bg-gray-700">
+                    <tr>
+                        <th class="px-4 py-2"></th>
+                        <th class="px-4 py-2">Model Name</th>
+                        <th class="px-4 py-2">Backend</th>
+                        <th class="px-4 py-2 float-right">Actions</th>
+                    </tr>
+                </thead>
+                <tbody>
+                {{$galleryConfig:=.GalleryConfig}}
+                {{$noicon:="https://upload.wikimedia.org/wikipedia/commons/6/65/No-Image-Placeholder.svg"}}
                {{ range .ModelsConfig }}
-                <li class="bg-gray-800 border border-gray-700 p-4 rounded-lg">
-                    <div class="flex justify-between items-center">
-                        <p class="font-bold text-white flex items-center"><i class="fas fa-brain pr-2"></i>{{.Name}}</p>
+                {{ $cfg:= index $galleryConfig .Name}}
+                <tr class="bg-gray-800 border-b border-gray-700">
+                    <td class="px-4 py-3">
+                        {{ with $cfg }}
+                        <img {{ if $cfg.Icon }}
+                            src="{{$cfg.Icon}}"
+                            {{ else }}
+                            src="{{$noicon}}"
+                            {{ end }}
+                            class="rounded-t-lg max-h-24 max-w-24 object-cover mt-3"
+                            >
+                        {{ else}}
+                            <img src="{{$noicon}}" class="rounded-t-lg max-h-24 max-w-24 object-cover mt-3">
+                        {{ end }}
+                    </td>
+                    <td class="px-4 py-3 font-bold">
+                        <p class="font-bold text-white flex items-center"><i class="fas fa-brain pr-2"></i><a href="/browse?term={{.Name}}">{{.Name}}</a></p>
+                    </td>
+                    <td class="px-4 py-3 font-bold">
                        {{ if .Backend }}
                        <!-- Badge for Backend -->
                        <span class="inline-block bg-blue-500 text-white py-1 px-3 rounded-full text-xs">
@ -37,11 +90,20 @@
                            auto
                        </span>
                        {{ end }}
-                    </div>
-                    <!-- Additional details can go here -->
-                </li>
+                    </td>
+
+                    <td class="px-4 py-3">
+                        <button 
+                            class="float-right inline-block rounded bg-red-800 px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-red-accent-300 hover:shadow-red-2 focus:bg-red-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-red-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong" 
+                            data-twe-ripple-color="light" data-twe-ripple-init="" hx-confirm="Are you sure you wish to delete the model?" hx-post="/browse/delete/model/{{.Name}}" hx-swap="outerHTML"><i class="fa-solid fa-cancel pr-2"></i>Delete</button>
+                    </td>
+                </tr>
                {{ end }}
-            </ul>
+                </tbody>
+            </table>
+            {{ end }}
+
+
+
        </div>
    </div>

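
The in-progress list in `index.html` splits each `.ProcessingModels` key on `@` and shows the first part as the repository and the second as the model name. A minimal JavaScript sketch of the same convention is below, assuming keys look like `repository@model`. The helper `parseJobKey` and the sample key are hypothetical, and the fallback for keys without `@` is a choice made for this sketch, not the template's behavior.

```javascript
// Hypothetical helper that mirrors `split "@" $key` from the template.
// Assumption: keys have the form "repository@model".
function parseJobKey(key) {
  const at = key.indexOf("@");
  if (at === -1) {
    // No repository prefix: treat the whole key as the model name (sketch-only choice).
    return { repository: "", model: key };
  }
  return { repository: key.slice(0, at), model: key.slice(at + 1) };
}

console.log(parseJobKey("localai@some-model"));
// { repository: 'localai', model: 'some-model' }
```
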
--- a/core/http/views/models.html
+++ b/core/http/views/models.html
@ -13,10 +13,83 @@
                🖼️ Available models from <i>{{ len .Repositories }}</i> repositories     <a href="https://localai.io/models/" target="_blank" >
                    <i class="fas fa-circle-info pr-2"></i>
                </a></h2> 
+
+            <div class="text-center font-semibold text-gray-100">
+                <h2>Filter by type:</h2>
+                <button  hx-post="/browse/search/models"
+                    class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                    hx-target="#search-results" 
+                    hx-vals='{"search": "tts"}'
+                hx-indicator=".htmx-indicator" >TTS</button> 
+                <button  hx-post="/browse/search/models" 
+                    class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                    hx-target="#search-results" 
+                    hx-vals='{"search": "stablediffusion"}'
+                hx-indicator=".htmx-indicator" >Image generation</button> 
+                <button  hx-post="/browse/search/models"
+                    class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                    hx-target="#search-results" 
+                    hx-vals='{"search": "llm"}'
+                hx-indicator=".htmx-indicator" >Text generation</button> 
+                <button  hx-post="/browse/search/models" 
+                    class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                    hx-target="#search-results" 
+                    hx-vals='{"search": "multimodal"}'
+                hx-indicator=".htmx-indicator" >Multimodal</button> 
+                <button  hx-post="/browse/search/models" 
+                    class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                    hx-target="#search-results" 
+                    hx-vals='{"search": "embedding"}'
+                hx-indicator=".htmx-indicator" >Embeddings</button>
+                <button  hx-post="/browse/search/models"
+                    class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                    hx-target="#search-results" 
+                    hx-vals='{"search": "rerank"}'
+                hx-indicator=".htmx-indicator" >Rerankers</button> 
+                <button  
+                    hx-post="/browse/search/models"
+                    class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                    hx-target="#search-results" 
+                    hx-vals='{"search": "whisper"}'
+                hx-indicator=".htmx-indicator" >Audio transcription</button> 
+            </div>
+
+            <div class="text-center text-xs font-semibold text-gray-100">
+                Filter by tags:
+                {{ range .AllTags }}
+                    <button  hx-post="/browse/search/models" class="text-blue-500" hx-target="#search-results" 
+                    hx-vals='{"search": "{{.}}"}'
+                    hx-indicator=".htmx-indicator" >{{.}}</button> 
+                {{ end }}
+            </div>
+
            
-        
            <span class="htmx-indicator loader"></span>
-            <input class="form-control appearance-none block w-full px-3 py-2 text-base font-normal text-gray-300 pb-2 mb-5 bg-gray-800 bg-clip-padding border border-solid border-gray-600 rounded transition ease-in-out m-0 focus:text-gray-300 focus:bg-gray-900 focus:border-blue-500 focus:outline-none" type="search" 
+            <!-- Show in progress operations-->
+            {{ if .ProcessingModels }}
+            <h2 
+                class="mt-4 mb-4 text-center text-3xl font-semibold text-gray-100">Operations in progress</h2>          
+            {{end}}
+            {{$taskType:=.TaskTypes}}
+            {{ range $key,$value:=.ProcessingModels }} 
+                {{ $op := index $taskType $key}}
+                {{$parts := split "@" $key}}
+                 <div class="flex items-center justify-between bg-slate-600 p-2 mb-2 rounded-md">
+                    <div class="flex items-center">
+                        <span class="text-gray-300"><a href="/browse?term={{$parts._1}}"
+                            class="text-white-500 inline-block bg-blue-200 rounded-full px-3 py-1 text-sm font-semibold text-gray-700 mr-2 mb-2 hover:bg-gray-300 hover:shadow-gray-2"
+                            >{{$parts._1}}</a> (from the '{{$parts._0}}' repository)</span>
+                    </div>
+                    <div hx-get="/browse/job/{{$value}}" hx-swap="innerHTML" hx-target="this" hx-trigger="done">
+                        <h3 role="status" id="pblabel" >{{$op}}
+                            <div hx-get="/browse/job/progress/{{$value}}" hx-trigger="every 600ms" hx-target="this"
+                            hx-swap=  "innerHTML"  ></div></h3>
+                    </div>     
+                </div>                   
+            {{ end }}
+            <!-- END Show in progress operations-->
+
+            <input class="form-control appearance-none block w-full mt-5 px-3 py-2 text-base font-normal text-gray-300 pb-2 mb-5 bg-gray-800 bg-clip-padding border border-solid border-gray-600 rounded transition ease-in-out m-0 focus:text-gray-300 focus:bg-gray-900 focus:border-blue-500 focus:outline-none" type="search" 
                name="search" placeholder="Begin Typing To Search models..." 
                hx-post="/browse/search/models" 
                hx-trigger="input changed delay:500ms, search" 
--- a/core/http/views/partials/head.html
+++ b/core/http/views/partials/head.html
@ -2,6 +2,28 @@
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{{.Title}}</title>
+    <link
+    rel="stylesheet"
+    href="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.8.0/build/styles/default.min.css"
+  />
+    <script
+    defer
+    src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.8.0/build/highlight.min.js"
+  ></script>
+    <script
+    defer
+    src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"
+  ></script>
+  <script
+    defer
+    src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"
+  ></script>
+  <script
+    defer
+    src="https://cdn.jsdelivr.net/npm/dompurify@3.0.6/dist/purify.min.js"
+  ></script>
+
+  <link href="/static/general.css" rel="stylesheet" />
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=Roboto:wght@400;500&display=swap" rel="stylesheet">
    <link
    href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700,900&display=swap"
@ -27,52 +49,4 @@
  </script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.1.1/css/all.min.css">
    <script src="https://unpkg.com/htmx.org@1.9.12" integrity="sha384-ujb1lZYygJmzgSwoxRggbCHcjc0rB2XoQrxeTUQyRjrOnlCoYta87iKBWq3EsdM2" crossorigin="anonymous"></script>
-    <style>
-        body {
-            font-family: 'Inter', sans-serif;
-        }
-        /* Loader (https://cssloaders.github.io/) */
-        .loader {
-          width: 12px;
-          height: 12px;
-          border-radius: 50%;
-          display: block;
-          margin:15px auto;
-          position: relative;
-          color: #FFF;
-          box-sizing: border-box;
-          animation: animloader 2s linear infinite;
-        }
-
-        @keyframes animloader {
-          0% { box-shadow: 14px 0 0 -2px,  38px 0 0 -2px,  -14px 0 0 -2px,  -38px 0 0 -2px; }
-          25% { box-shadow: 14px 0 0 -2px,  38px 0 0 -2px,  -14px 0 0 -2px,  -38px 0 0 2px; }
-          50% { box-shadow: 14px 0 0 -2px,  38px 0 0 -2px,  -14px 0 0 2px,  -38px 0 0 -2px; }
-          75% { box-shadow: 14px 0 0 2px,  38px 0 0 -2px,  -14px 0 0 -2px,  -38px 0 0 -2px; }
-          100% { box-shadow: 14px 0 0 -2px,  38px 0 0 2px,  -14px 0 0 -2px,  -38px 0 0 -2px; }
-        }
-        .progress {
-            height: 20px;
-            margin-bottom: 20px;
-            overflow: hidden;
-            background-color: #f5f5f5;
-            border-radius: 4px;
-            box-shadow: inset 0 1px 2px rgba(0,0,0,.1);
-        }
-        .progress-bar {
-            float: left;
-            width: 0%;
-            height: 100%;
-            font-size: 12px;
-            line-height: 20px;
-            color: #fff;
-            text-align: center;
-            background-color: #337ab7;
-            -webkit-box-shadow: inset 0 -1px 0 rgba(0,0,0,.15);
-            box-shadow: inset 0 -1px 0 rgba(0,0,0,.15);
-            -webkit-transition: width .6s ease;
-            -o-transition: width .6s ease;
-            transition: width .6s ease;
-        }
-    </style>
 </head>
--- a/core/http/views/partials/navbar.html
+++ b/core/http/views/partials/navbar.html
@ -6,12 +6,42 @@
                <a href="/" class="text-white text-xl font-bold"><img src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd" alt="LocalAI Logo" class="h-10 mr-3 border-2 border-gray-300 shadow rounded"></a>
                <a href="/" class="text-white text-xl font-bold">LocalAI</a>
            </div>
-            <div>
+            <!-- Menu button for small screens -->
+            <div class="lg:hidden">
+                <button id="menu-toggle" class="text-gray-400 hover:text-white focus:outline-none">
+                    <i class="fas fa-bars fa-lg"></i>
+                </button>
+            </div>
+            <!-- Navigation links -->
+            <div class="hidden lg:flex lg:items-center lg:justify-end lg:flex-1 lg:w-0">
                <a href="/" class="text-gray-400 hover:text-white px-3 py-2 rounded"><i class="fas fa-home pr-2"></i>Home</a>
                <a href="https://localai.io" class="text-gray-400 hover:text-white px-3 py-2 rounded" target="_blank" ><i class="fas fa-book-reader pr-2"></i> Documentation</a>
                <a href="/browse/" class="text-gray-400 hover:text-white px-3 py-2 rounded"><i class="fas fa-brain pr-2"></i> Models</a>
+                <a href="/chat/" class="text-gray-400 hover:text-white px-3 py-2 rounded"><i class="fa-solid fa-comments pr-2"></i> Chat</a>
+                <a href="/text2image/" class="text-gray-400 hover:text-white px-3 py-2 rounded"><i class="fas fa-image pr-2"></i> Generate images</a>
+                <a href="/tts/" class="text-gray-400 hover:text-white px-3 py-2 rounded"><i class="fa-solid fa-music pr-2"></i> TTS </a>
                <a href="/swagger/" class="text-gray-400 hover:text-white px-3 py-2 rounded"><i class="fas fa-code pr-2"></i> API</a>
            </div>
        </div>
+        <!-- Collapsible menu for small screens -->
+        <div class="hidden lg:hidden" id="mobile-menu">
+            <div class="pt-4 pb-3 border-t border-gray-700">
+                <a href="/" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1"><i class="fas fa-home pr-2"></i>Home</a>
+                <a href="https://localai.io" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1" target="_blank" ><i class="fas fa-book-reader pr-2"></i> Documentation</a>
+                <a href="/browse/" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1"><i class="fas fa-brain pr-2"></i> Models</a>
+                <a href="/chat/" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1"><i class="fa-solid fa-comments pr-2"></i> Chat</a>
+                <a href="/text2image/" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1"><i class="fas fa-image pr-2"></i> Generate images</a>
+                <a href="/tts/" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1"><i class="fa-solid fa-music pr-2"></i> TTS </a>
+                <a href="/swagger/" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1"><i class="fas fa-code pr-2"></i> API</a>
+            </div>
+        </div>
    </div>
-</nav>
+</nav>
+
+<script>
+    // JavaScript to toggle the mobile menu
+    document.getElementById('menu-toggle').addEventListener('click', function () {
+        var mobileMenu = document.getElementById('mobile-menu');
+        mobileMenu.classList.toggle('hidden');
+    });
+</script>
--- a/core/http/views/text2image.html
+++ b/core/http/views/text2image.html
@ -0,0 +1,89 @@
+<!DOCTYPE html>
+<html lang="en">
+{{template "views/partials/head" .}}
+<script defer src="/static/image.js"></script>
+
+<body class="bg-gray-900 text-gray-200">
+<div class="flex flex-col min-h-screen">
+   
+    {{template "views/partials/navbar" .}}
+    <div class="container mx-auto px-4 flex-grow " x-data="{ component: 'menu' }">
+    
+
+        <div class="mt-12">
+          <div class="flex items-center justify-center text-center pb-2">
+            <span class="text-3xl font-semibold text-gray-100">
+              🖼️ Text to Image
+            <a href="https://localai.io/features/image-generation" target="_blank" >
+              <i class="fas fa-circle-info pr-2"></i>
+            </a>
+            </span>
+    
+          </div>
+
+            <div class="text-center font-semibold text-gray-100">
+              <div class="flex items-center justify-between">
+
+              <div x-show="component === 'menu'" id="menu">
+                <button @click="component = 'key'" title="Update API key"
+                class="m-2 float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong"
+                >Set API Key🔑</button>
+              </div>
+              <form x-show="component === 'key'" id="key">
+                <input
+                  type="password"
+                  id="apiKey"
+                  name="apiKey"
+                  placeholder="OpenAI API Key"
+                  x-model.lazy="key"
+                />
+                <button @click="component = 'menu'" type="submit" title="Save API key">
+                  🔒
+                </button>
+              </form>
+
+              <select x-data="{ link : '' }" x-model="link" x-init="$watch('link', value => window.location = link)" 
+                class="bg-gray-800 text-white border border-gray-600 focus:border-blue-500 focus:ring focus:ring-blue-500 focus:ring-opacity-50 rounded-md shadow-sm p-2 appearance-none"
+                >	
+                <!-- Options -->
+                <option value="" disabled class="text-gray-400" >Select a model</option>
+                {{ $model:=.Model}}
+                {{ range .ModelsConfig }}
+                {{ if eq .Name $model }}
+                <option value="/text2image/{{.Name}}" selected class="bg-gray-700 text-white">{{.Name}}</option>
+                {{ else }}
+                <option value="/text2image/{{.Name}}" class="bg-gray-700 text-white">{{.Name}}</option>
+                {{ end }}
+                {{ end }}
+              </select>
+              
+              </div>
+            </div>
+
+            <div class="mt-12">
+              <input id="image-model" type="hidden" value="{{.Model}}">
+              <form id="genimage" action="/text2image/{{.Model}}" method="get">
+                <input
+                  type="text"
+                  id="input"
+                  name="input"
+                  placeholder="Prompt…"
+                  autocomplete="off"
+                  class="p-2 border rounded w-full bg-gray-600 text-white placeholder-gray-300"
+                  required
+                />
+              </form>
+              <div class="container max-w-screen-lg mx-auto mt-4 pb-10 flex justify-center">
+                <div id="loader" class="my-2 loader"  ></div>
+              </div>
+              <div class="container max-w-screen-lg mx-auto mt-4 pb-10 flex justify-center">
+                <div id="result" class="mx-auto"></div>
+              </div>
+            </div>
+        </div>
+    </div>
+
+    {{template "views/partials/footer" .}}
+</div>
+</body>
+</html>
--- a/core/http/views/tts.html
+++ b/core/http/views/tts.html
@ -0,0 +1,86 @@
+<!DOCTYPE html>
+<html lang="en">
+{{template "views/partials/head" .}}
+<script defer src="/static/tts.js"></script>
+
+<body class="bg-gray-900 text-gray-200">
+<div class="flex flex-col min-h-screen">
+   
+    {{template "views/partials/navbar" .}}
+    <div class="container mx-auto px-4 flex-grow " x-data="{ component: 'menu' }">
+          <div class="mt-12">
+            <div class="flex items-center justify-center text-center pb-2">
+              <span class="text-3xl font-semibold text-gray-100">
+                <i class="fa-solid fa-music"></i> Text to speech/audio
+              <a href="https://localai.io/features/text-to-audio/" target="_blank" >
+                <i class="fas fa-circle-info pr-2"></i>
+              </a>
+              </span>
+      
+            </div>
+            <div class="text-center font-semibold text-gray-100">
+              <div class="flex items-center justify-between">
+
+              <div x-show="component === 'menu'" id="menu">
+                <button @click="component = 'key'" title="Update API key"
+                class="m-2 float-right inline-block rounded bg-primary px-6 pb-2.5 mb-3 pt-2.5 text-xs font-medium uppercase leading-normal text-white shadow-primary-3 transition duration-150 ease-in-out hover:bg-primary-accent-300 hover:shadow-primary-2 focus:bg-primary-accent-300 focus:shadow-primary-2 focus:outline-none focus:ring-0 active:bg-primary-600 active:shadow-primary-2 dark:shadow-black/30 dark:hover:shadow-dark-strong dark:focus:shadow-dark-strong dark:active:shadow-dark-strong"
+                >Set API Key🔑</button>
+              </div>
+              <form x-show="component === 'key'" id="key">
+                <input
+                  type="password"
+                  id="apiKey"
+                  name="apiKey"
+                  placeholder="OpenAI API Key"
+                  x-model.lazy="key"
+                />
+                <button @click="component = 'menu'" type="submit" title="Save API key">
+                  🔒
+                </button>
+              </form>
+
+              <select x-data="{ link : '' }" x-model="link" x-init="$watch('link', value => window.location = link)" 
+                class="bg-gray-800 text-white border border-gray-600 focus:border-blue-500 focus:ring focus:ring-blue-500 focus:ring-opacity-50 rounded-md shadow-sm p-2 appearance-none"
+                >	
+                <!-- Options -->
+                <option value="" disabled class="text-gray-400" >Select a model</option>
+                {{ $model:=.Model}}
+                {{ range .ModelsConfig }}
+                {{ if eq .Name $model }}
+                <option value="/tts/{{.Name}}" selected class="bg-gray-700 text-white">{{.Name}}</option>
+                {{ else }}
+                <option value="/tts/{{.Name}}" class="bg-gray-700 text-white">{{.Name}}</option>
+                {{ end }}
+                {{ end }}
+              </select>
+              
+              </div>
+            </div>
+
+            <div class="mt-12">
+              <input id="tts-model" type="hidden" value="{{.Model}}">
+              <form id="tts" action="/tts/{{.Model}}" method="get">
+                <input
+                  type="text"
+                  id="input"
+                  name="input"
+                  placeholder="Prompt…"
+                  autocomplete="off"
+                  class="p-2 border rounded w-full bg-gray-600 text-white placeholder-gray-300"
+                  required
+                />
+              </form>
+              <div class="container max-w-screen-lg mx-auto mt-4 pb-10 flex justify-center">
+                <div id="loader" class="my-2 loader"  ></div>
+              </div>
+              <div class="container max-w-screen-lg mx-auto mt-4 pb-10 flex justify-center">
+                <div id="result" class="mx-auto"></div>
+              </div>
+            </div>
+        </div>
+    </div>
+
+    {{template "views/partials/footer" .}}
+</div>
+</body>
+</html>
--- a/core/services/gallery.go
+++ b/core/services/gallery.go
@ -90,7 +90,7 @@ func (g *GalleryService) Start(c context.Context, cl *config.BackendConfigLoader
 				if op.Delete {
 					modelConfig := &config.BackendConfig{}
 					// Galleryname is the name of the model in this case
-					dat, err := os.ReadFile(filepath.Join(g.modelPath, op.GalleryName+".yaml"))
+					dat, err := os.ReadFile(filepath.Join(g.modelPath, op.GalleryModelName+".yaml"))
 					if err != nil {
 						updateError(err)
 						continue
@ -111,14 +111,14 @@ func (g *GalleryService) Start(c context.Context, cl *config.BackendConfigLoader
 						files = append(files, modelConfig.MMProjFileName())
 					}

-					err = gallery.DeleteModelFromSystem(g.modelPath, op.GalleryName, files)
+					err = gallery.DeleteModelFromSystem(g.modelPath, op.GalleryModelName, files)
 				} else {
 					// if the request contains a gallery name, we apply the gallery from the gallery list
-					if op.GalleryName != "" {
-						if strings.Contains(op.GalleryName, "@") {
-							err = gallery.InstallModelFromGallery(op.Galleries, op.GalleryName, g.modelPath, op.Req, progressCallback)
+					if op.GalleryModelName != "" {
+						if strings.Contains(op.GalleryModelName, "@") {
+							err = gallery.InstallModelFromGallery(op.Galleries, op.GalleryModelName, g.modelPath, op.Req, progressCallback)
 						} else {
-							err = gallery.InstallModelFromGalleryByName(op.Galleries, op.GalleryName, g.modelPath, op.Req, progressCallback)
+							err = gallery.InstallModelFromGalleryByName(op.Galleries, op.GalleryModelName, g.modelPath, op.Req, progressCallback)
 						}
 					} else if op.ConfigURL != "" {
 						startup.PreloadModelsConfigurations(op.ConfigURL, g.modelPath, op.ConfigURL)
@ -148,10 +148,11 @@ func (g *GalleryService) Start(c context.Context, cl *config.BackendConfigLoader

 				g.UpdateStatus(op.Id,
 					&gallery.GalleryOpStatus{
-						Deletion:  op.Delete,
-						Processed: true,
-						Message:   "completed",
-						Progress:  100})
+						Deletion:         op.Delete,
+						Processed:        true,
+						GalleryModelName: op.GalleryModelName,
+						Message:          "completed",
+						Progress:         100})
 			}
 		}
 	}()
--- a/core/startup/startup.go
+++ b/core/startup/startup.go
@ -11,6 +11,7 @@ import (
 	"github.com/go-skynet/LocalAI/pkg/assets"
 	"github.com/go-skynet/LocalAI/pkg/model"
 	pkgStartup "github.com/go-skynet/LocalAI/pkg/startup"
+	"github.com/go-skynet/LocalAI/pkg/xsysinfo"
 	"github.com/rs/zerolog/log"
 )

@ -19,12 +20,23 @@ func Startup(opts ...config.AppOption) (*config.BackendConfigLoader, *model.Mode

 	log.Info().Msgf("Starting LocalAI using %d threads, with models path: %s", options.Threads, options.ModelPath)
 	log.Info().Msgf("LocalAI version: %s", internal.PrintableVersion())
+	caps, err := xsysinfo.CPUCapabilities()
+	if err == nil {
+		log.Debug().Msgf("CPU capabilities: %v", caps)
+	}
+	gpus, err := xsysinfo.GPUs()
+	if err == nil {
+		log.Debug().Msgf("GPU count: %d", len(gpus))
+		for _, gpu := range gpus {
+			log.Debug().Msgf("GPU: %s", gpu.String())
+		}
+	}

 	// Make sure directories exists
 	if options.ModelPath == "" {
 		return nil, nil, nil, fmt.Errorf("options.ModelPath cannot be empty")
 	}
-	err := os.MkdirAll(options.ModelPath, 0750)
+	err = os.MkdirAll(options.ModelPath, 0750)
 	if err != nil {
 		return nil, nil, nil, fmt.Errorf("unable to create ModelPath: %q", err)
 	}
--- a/docs/content/docs/features/text-generation.md
+++ b/docs/content/docs/features/text-generation.md
@ -296,7 +296,7 @@ backend: transformers
 parameters:
    model: "facebook/opt-125m"
 type: AutoModelForCausalLM
-quantization: bnb_4bit # One of: bnb_8bit, bnb_4bit, xpu_4bit (optional)
+quantization: bnb_4bit # One of: bnb_8bit, bnb_4bit, xpu_4bit, xpu_8bit (optional)
 ```

 The backend will automatically download the required files in order to run the model.
@ -307,10 +307,42 @@ The backend will automatically download the required files in order to run the m

 | Type | Description |
 | --- | --- |
-| `AutoModelForCausalLM` | `AutoModelForCausalLM` is a model that can be used to generate sequences. |
-| `OVModelForCausalLM` | for OpenVINO models |
+| `AutoModelForCausalLM` | `AutoModelForCausalLM` is a model that can be used to generate sequences. Use it for NVIDIA CUDA and Intel GPU with Intel Extension for PyTorch acceleration. |
+| `OVModelForCausalLM` | for Intel CPU/GPU/NPU OpenVINO Text Generation models |
+| `OVModelForFeatureExtraction` | for Intel CPU/GPU/NPU OpenVINO Embedding acceleration |
 | N/A | Defaults to `AutoModel` |

+- `OVModelForCausalLM` requires OpenVINO IR [Text Generation](https://huggingface.co/models?library=openvino&pipeline_tag=text-generation) models from Hugging Face.
+- `OVModelForFeatureExtraction` works with any Safetensors Transformer [Feature Extraction](https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers,safetensors) model from Hugging Face (embedding models).
+
+Please note that streaming is currently not implemented in `AutoModelForCausalLM` for Intel GPU.
+AMD GPU support is not implemented.
+Although AMD CPUs are not officially supported by OpenVINO, there are reports that they work: YMMV.
+
+##### Embeddings
+Use `embeddings: true` if the model is an embedding model.
+
+##### Inference device selection
+The Transformers backend tries to automatically select the best device for inference. You can override its choice manually with the `main_gpu` parameter.
+
+| Inference Engine | Applicable Values |
+| --- | --- |
+| CUDA | `cuda`, `cuda.X` where X is the GPU device like in `nvidia-smi -L` output |
+| OpenVINO | Any applicable value from [Inference Modes](https://docs.openvino.ai/2024/openvino-workflow/running-inference/inference-devices-and-modes.html) like `AUTO`,`CPU`,`GPU`,`NPU`,`MULTI`,`HETERO` |
+
+Example for CUDA:
+`main_gpu: cuda.0`
+
+Example for OpenVINO:
+`main_gpu: AUTO:-CPU`
+
+This parameter applies to both Text Generation and Feature Extraction (i.e. Embeddings) models.
+
+##### Inference Precision
+The Transformers backend automatically selects the fastest applicable inference precision according to device support.
+On CUDA, you can manually enable *bfloat16*, if your hardware supports it, with the following parameter:
+
+`f16: true`

 ##### Quantization

@ -318,8 +350,42 @@ The backend will automatically download the required files in order to run the m
 | --- | --- |
 | `bnb_8bit` | 8-bit quantization |
 | `bnb_4bit` | 4-bit quantization |
+| `xpu_8bit` | 8-bit quantization for Intel XPUs |
 | `xpu_4bit` | 4-bit quantization for Intel XPUs |

+##### Trust Remote Code
+Some models, such as Microsoft Phi-3, require external code beyond what the transformers library provides.
+Running this remote code is disabled by default for security reasons.
+It can be manually enabled with:
+`trust_remote_code: true`
+
+##### Maximum Context Size
+The maximum context size, in tokens, can be specified with the `context_size` parameter. Do not use values higher than what your model supports.
+
+Usage example:
+`context_size: 8192`
+
+##### Auto Prompt Template
+The chat template is usually defined by the model author in the `tokenizer_config.json` file.
+To enable it, use the `use_tokenizer_template: true` parameter in the `template` section.
+
+Usage example:
+```
+template:
+  use_tokenizer_template: true
+```
+
+##### Custom Stop Words
+Stop words are usually defined in the `tokenizer_config.json` file.
+They can be overridden with the `stopwords` parameter when needed, as with the llama3-Instruct model.
+
+Usage example:
+```
+stopwords:
+- "<|eot_id|>"
+- "<|end_of_text|>"
+```
+
 #### Usage

 Use the `completions` endpoint by specifying the `transformers` model:
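
The options documented in this hunk can be combined in a single model configuration file. The following is a minimal sketch, not a tested configuration: the `name` value is a placeholder, the model and `context_size` are borrowed from the `openvino-hermes2pro-llama3` gallery entry added in this same change, and the `main_gpu` value is the OpenVINO example given in the docs above.

```yaml
# Illustrative only: `name` is a placeholder chosen for this sketch.
name: openvino-hermes2pro-example
backend: transformers
type: OVModelForCausalLM      # OpenVINO text generation model
main_gpu: "AUTO:-CPU"         # optional manual device override
context_size: 8192            # must not exceed what the model supports
template:
  use_tokenizer_template: true  # use the chat template from tokenizer_config.json
parameters:
  model: fakezeta/Hermes-2-Pro-Llama-3-8B-ov-int8
```

For an embedding model, the same layout would presumably use `type: OVModelForFeatureExtraction` together with `embeddings: true`, as the gallery entries for `openvino-multilingual-e5-base` do.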
--- a/docs/content/docs/getting-started/build.md
+++ b/docs/content/docs/getting-started/build.md
@ -144,7 +144,7 @@ Install `xcode` from the Apps Store (needed for metalkit)

 ```
 # install build dependencies
-brew install abseil cmake go grpc protobuf wget
+brew install abseil cmake go grpc protobuf wget protoc-gen-go protoc-gen-go-grpc

 # clone the repo
 git clone https://github.com/go-skynet/LocalAI.git
--- a/docs/content/docs/reference/compatibility-table.md
+++ b/docs/content/docs/reference/compatibility-table.md
@ -45,10 +45,11 @@ LocalAI will attempt to automatically load models which are not explicitly confi
 | [tinydream](https://github.com/symisc/tiny-dream#tiny-dreaman-embedded-header-only-stable-diffusion-inference-c-librarypixlabiotiny-dream)         | stablediffusion               | no                       | Image                 | no                                | no                   | N/A |
 | `coqui` | Coqui    | no                       | Audio generation and Voice cloning    | no                               | no                   | CPU/CUDA |
 | `petals` | Various GPTs and quantization formats | yes                      | GPT             | no | no                  | CPU/CUDA |
-| `transformers` | Various GPTs and quantization formats | yes                      | GPT, embeddings            | yes | no                  | CPU/CUDA |
+| `transformers` | Various GPTs and quantization formats | yes                      | GPT, embeddings            | yes | yes****                  | CPU/CUDA/XPU |

 Note: any backend name listed above can be used in the `backend` field of the model configuration file (See [the advanced section]({{%relref "docs/advanced" %}})).

 - \* 7b ONLY
 - ** doesn't seem to be accurate
- *** 7b and 40b with the `ggccv` format, for instance: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
+- *** 7b and 40b with the `ggccv` format, for instance: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
+- **** Only for CUDA and OpenVINO CPU/XPU acceleration.
--- a/docs/data/version.json
+++ b/docs/data/version.json
@ -1,3 +1,3 @@
 {
-  "version": "v2.13.0"
+  "version": "v2.14.0"
 }
--- a/examples/langchain/langchainpy-localai-example/requirements.txt
+++ b/examples/langchain/langchainpy-localai-example/requirements.txt
@ -25,7 +25,7 @@ PyYAML==6.0
 requests==2.31.0
 SQLAlchemy==2.0.12
 tenacity==8.2.2
-tqdm==4.65.0
+tqdm==4.66.3
 typing-inspect==0.8.0
 typing_extensions==4.5.0
 urllib3==1.26.18
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@ -96,6 +96,22 @@
    - filename: Meta-Llama-3-8B-Instruct.Q6_K.gguf
      sha256: b7bad45618e2a76cc1e89a0fbb93a2cac9bf410e27a619c8024ed6db53aa9b4a
      uri: huggingface://QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct.Q6_K.gguf
+- <<: *llama3
+  name: "llama-3-8b-instruct-coder"
+  icon: https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/0O4cIuv3wNbY68-FP7tak.jpeg
+  urls:
+    - https://huggingface.co/bartowski/Llama-3-8B-Instruct-Coder-GGUF
+    - https://huggingface.co/rombodawg/Llama-3-8B-Instruct-Coder
+  description: |
+    Original model: https://huggingface.co/rombodawg/Llama-3-8B-Instruct-Coder
+    All quants made using imatrix option with dataset provided by Kalomaze here
+  overrides:
+    parameters:
+      model: Llama-3-8B-Instruct-Coder-Q4_K_M.gguf
+  files:
+    - filename: Llama-3-8B-Instruct-Coder-Q4_K_M.gguf
+      sha256: 639ab8e3aeb7aa82cff6d8e6ef062d1c3e5a6d13e6d76e956af49f63f0e704f8
+      uri: huggingface://bartowski/Llama-3-8B-Instruct-Coder-GGUF/Llama-3-8B-Instruct-Coder-Q4_K_M.gguf
 - <<: *llama3
  name: "llama3-70b-instruct"
  overrides:
@ -242,6 +258,22 @@
    - filename: Llama-3-LewdPlay-8B-evo.q8_0.gguf
      sha256: 1498152d598ff441f73ec6af9d3535875302e7251042d87feb7e71a3618966e8
      uri: huggingface://Undi95/Llama-3-LewdPlay-8B-evo-GGUF/Llama-3-LewdPlay-8B-evo.q8_0.gguf
+- <<: *llama3
+  name: "llama-3-soliloquy-8b-v2-iq-imatrix"
+  license: cc-by-nc-4.0
+  icon: https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/u98dnnRVCwMh6YYGFIyff.png
+  urls:
+    - https://huggingface.co/Lewdiculous/Llama-3-Soliloquy-8B-v2-GGUF-IQ-Imatrix
+  description: |
+    Soliloquy-L3 is a highly capable roleplaying model designed for immersive, dynamic experiences. Trained on over 250 million tokens of roleplaying data, Soliloquy-L3 has a vast knowledge base, rich literary expression, and support for up to 24k context length. It outperforms existing ~13B models, delivering enhanced roleplaying capabilities.
+  overrides:
+    context_size: 8192
+    parameters:
+      model: Llama-3-Soliloquy-8B-v2-Q4_K_M-imat.gguf
+  files:
+    - filename: Llama-3-Soliloquy-8B-v2-Q4_K_M-imat.gguf
+      sha256: 3e4e066e57875c36fc3e1c1b0dba506defa5b6ed3e3e80e1f77c08773ba14dc8
+      uri: huggingface://Lewdiculous/Llama-3-Soliloquy-8B-v2-GGUF-IQ-Imatrix/Llama-3-Soliloquy-8B-v2-Q4_K_M-imat.gguf
 - <<: *llama3
  name: "chaos-rp_l3_b-iq-imatrix"
  urls:
@ -319,8 +351,26 @@
      model: Average_Normie_l3_v1_8B-Q4_K_M-imat.gguf
  files:
    - filename: Average_Normie_l3_v1_8B-Q4_K_M-imat.gguf
-      sha256: 9e98cd2672f716a0872912fdc4877969efd14d6f682f28e156f8591591c00d9c
+      sha256: 159eb62f2c8ae8fee10d9ed8386ce592327ca062807194a88e10b7cbb47ef986
      uri: huggingface://Lewdiculous/Average_Normie_l3_v1_8B-GGUF-IQ-Imatrix/Average_Normie_l3_v1_8B-Q4_K_M-imat.gguf
+- <<: *llama3
+  name: "openbiollm-llama3-8b"
+  urls:
+    - https://huggingface.co/aaditya/OpenBioLLM-Llama3-8B-GGUF
+    - https://huggingface.co/aaditya/Llama3-OpenBioLLM-8B
+  license: llama3
+  icon: https://cdn-uploads.huggingface.co/production/uploads/5f3fe13d79c1ba4c353d0c19/KGmRE5w2sepNtwsEu8t7K.jpeg
+  description: |
+    Introducing OpenBioLLM-8B: A State-of-the-Art Open Source Biomedical Large Language Model
+
+    OpenBioLLM-8B is an advanced open source language model designed specifically for the biomedical domain. Developed by Saama AI Labs, this model leverages cutting-edge techniques to achieve state-of-the-art performance on a wide range of biomedical tasks.
+  overrides:
+    parameters:
+      model: openbiollm-llama3-8b.Q4_K_M.gguf
+  files:
+    - filename: openbiollm-llama3-8b.Q4_K_M.gguf
+      sha256: 806fa724139b6a2527e33a79c25a13316188b319d4eed33e20914d7c5955d349
+      uri: huggingface://aaditya/OpenBioLLM-Llama3-8B-GGUF/openbiollm-llama3-8b.Q4_K_M.gguf
 - <<: *llama3
  name: "llama-3-8b-lexifun-uncensored-v1"
  icon: "https://cdn-uploads.huggingface.co/production/uploads/644ad182f434a6a63b18eee6/GrOs1IPG5EXR3MOCtcQiz.png"
@ -406,6 +456,25 @@
    - filename: Aura_Uncensored_l3_8B-Q4_K_M-imat.gguf
      sha256: 265ded6a4f439bec160f394e3083a4a20e32ebb9d1d2d85196aaab23dab87fb2
      uri: huggingface://Lewdiculous/Aura_Uncensored_l3_8B-GGUF-IQ-Imatrix/Aura_Uncensored_l3_8B-Q4_K_M-imat.gguf
+- <<: *llama3
+  name: "llama-3-lumimaid-8b-v0.1"
+  urls:
+    - https://huggingface.co/NeverSleep/Llama-3-Lumimaid-8B-v0.1-GGUF
+  icon: https://cdn-uploads.huggingface.co/production/uploads/630dfb008df86f1e5becadc3/d3QMaxy3peFTpSlWdWF-k.png
+  license: cc-by-nc-4.0
+  description: |
+      This model uses the Llama3 prompting format
+
+      Llama3 trained on our RP datasets, we tried to have a balance between the ERP and the RP, not too horny, but just enough.
+
+      We also added some non-RP dataset, making the model less dumb overall. It should look like a 40%/60% ratio for Non-RP/RP+ERP data.
+  overrides:
+    parameters:
+      model: Llama-3-Lumimaid-8B-v0.1.q4_k_m.gguf
+  files:
+    - filename: Llama-3-Lumimaid-8B-v0.1.q4_k_m.gguf
+      sha256: 23ac0289da0e096d5c00f6614dfd12c94dceecb02c313233516dec9225babbda
+      uri: huggingface://NeverSleep/Llama-3-Lumimaid-8B-v0.1-GGUF/Llama-3-Lumimaid-8B-v0.1.q4_k_m.gguf
 - <<: *llama3
  name: "suzume-llama-3-8B-multilingual"
  urls:
@ -520,6 +589,62 @@
    - filename: Noromaid-13B-0.4-DPO.q4_k_m.gguf
      sha256: cb28e878d034fae3d0b43326c5fc1cfb4ab583b17c56e41d6ce023caec03c1c1
      uri: huggingface://NeverSleep/Noromaid-13B-0.4-DPO-GGUF/Noromaid-13B-0.4-DPO.q4_k_m.gguf
+### START Vicuna based
+- &wizardlm2
+  url: "github:mudler/LocalAI/gallery/wizardlm2.yaml@master"
+  name: "wizardlm2-7b"
+  description: |
+    We introduce and opensource WizardLM-2, our next generation state-of-the-art large language models, which have improved performance on complex chat, multilingual, reasoning and agent. New family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B.
+
+      WizardLM-2 8x22B is our most advanced model, demonstrates highly competitive performance compared to those leading proprietary works and consistently outperforms all the existing state-of-the-art opensource models.
+      WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice in the same size.
+      WizardLM-2 7B is the fastest and achieves comparable performance with existing 10x larger opensource leading models.
+  icon: https://github.com/nlpxucan/WizardLM/raw/main/imgs/WizardLM.png
+  license: apache-2.0
+  urls:
+    - https://huggingface.co/MaziyarPanahi/WizardLM-2-7B-GGUF
+  tags:
+    - llm
+    - gguf
+    - gpu
+    - cpu
+    - mistral
+  overrides:
+    parameters:
+      model: WizardLM-2-7B.Q4_K_M.gguf
+  files:
+    - filename: WizardLM-2-7B.Q4_K_M.gguf
+      sha256: 613212417701a26fd43f565c5c424a2284d65b1fddb872b53a99ef8add796f64
+      uri: huggingface://MaziyarPanahi/WizardLM-2-7B-GGUF/WizardLM-2-7B.Q4_K_M.gguf
+### moondream2
+- url: "github:mudler/LocalAI/gallery/moondream.yaml@master"
+  license: apache-2.0
+  description: |
+    a tiny vision language model that kicks ass and runs anywhere
+  icon: https://github.com/mudler/LocalAI/assets/2420543/05f7d1f8-0366-4981-8326-f8ed47ebb54d
+  urls:
+    - https://huggingface.co/vikhyatk/moondream2
+    - https://huggingface.co/moondream/moondream2-gguf
+    - https://github.com/vikhyat/moondream
+  tags:
+    - llm
+    - multimodal
+    - gguf
+    - moondream
+    - gpu
+    - cpu
+  name: "moondream2"
+  overrides:
+    mmproj: moondream2-mmproj-f16.gguf
+    parameters:
+      model: moondream2-text-model-f16.gguf
+  files:
+    - filename: moondream2-text-model-f16.gguf
+      sha256: 4e17e9107fb8781629b3c8ce177de57ffeae90fe14adcf7b99f0eef025889696
+      uri: huggingface://moondream/moondream2-gguf/moondream2-text-model-f16.gguf
+    - filename: moondream2-mmproj-f16.gguf
+      sha256: 4cc1cb3660d87ff56432ebeb7884ad35d67c48c7b9f6b2856f305e39c38eed8f
+      uri: huggingface://moondream/moondream2-gguf/moondream2-mmproj-f16.gguf
 ### START LLaVa
 - &llava
  url: "github:mudler/LocalAI/gallery/llava.yaml@master"
@ -575,7 +700,9 @@
      sha256: 09c230de47f6f843e4841656f7895cac52c6e7ec7392acb5e8527de8b775c45a
      uri: huggingface://jartine/llava-v1.5-7B-GGUF/llava-v1.5-7b-mmproj-Q8_0.gguf
 - <<: *llama3
-  name: "poppy_porpoise-v0.7-l3-8b-iq-imatrix"
+  name: "poppy_porpoise-v0.72-l3-8b-iq-imatrix"
+  urls:
+    - https://huggingface.co/Lewdiculous/Poppy_Porpoise-0.72-L3-8B-GGUF-IQ-Imatrix
  description: |
      "Poppy Porpoise" is a cutting-edge AI roleplay assistant based on the Llama 3 8B model, specializing in crafting unforgettable narrative experiences. With its advanced language capabilities, Poppy expertly immerses users in an interactive and engaging adventure, tailoring each adventure to their individual preferences.

@ -590,16 +717,41 @@
    - cpu
    - llava-1.5
  overrides:
-    mmproj: Llava_1.5_Llama3_mmproj.gguf
+    mmproj: Llava_1.5_Llama3_mmproj_updated.gguf
    parameters:
-      model: Poppy_Porpoise-v0.7-L3-8B-Q4_K_M-imat.gguf
+      model: Poppy_Porpoise-0.72-L3-8B-Q4_K_M-imat.gguf
  files:
-    - filename: Poppy_Porpoise-v0.7-L3-8B-Q4_K_M-imat.gguf
-      sha256: 04badadd6c88cd9c706efef8f5cd337057c805e43dd440a5936f87720c37eb33
-      uri: huggingface://Lewdiculous/Poppy_Porpoise-v0.7-L3-8B-GGUF-IQ-Imatrix/Poppy_Porpoise-v0.7-L3-8B-Q4_K_M-imat.gguf
-    - filename: Llava_1.5_Llama3_mmproj.gguf
-      sha256: d2a9ca943975f6c49c4d55886e873f676a897cff796e92410ace6c20f4efd03b
-      uri: huggingface://ChaoticNeutrals/Llava_1.5_Llama3_mmproj/mmproj-model-f16.gguf
+    - filename: Poppy_Porpoise-0.72-L3-8B-Q4_K_M-imat.gguf
+      sha256: 53743717f929f73aa4355229de114d9b81814cb2e83c6cc1c6517844da20bfd5
+      uri: huggingface://Lewdiculous/Poppy_Porpoise-0.72-L3-8B-GGUF-IQ-Imatrix/Poppy_Porpoise-0.72-L3-8B-Q4_K_M-imat.gguf
+    - filename: Llava_1.5_Llama3_mmproj_updated.gguf
+      sha256: 4f2bb77ca60f2c932d1c6647d334f5d2cd71966c19e850081030c9883ef1906c
+      uri: https://huggingface.co/ChaoticNeutrals/LLaVA-Llama-3-8B-mmproj-Updated/resolve/main/llava-v1.5-8B-Updated-Stop-Token/mmproj-model-f16.gguf
+- <<: *llama3
+  name: "llava-llama-3-8b-v1_1"
+  description: |
+      llava-llama-3-8b-v1_1 is a LLaVA model fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner.
+  urls:
+    - https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf
+  tags:
+    - llm
+    - multimodal
+    - gguf
+    - gpu
+    - llama3
+    - cpu
+    - llava
+  overrides:
+    mmproj: llava-llama-3-8b-v1_1-mmproj-f16.gguf
+    parameters:
+      model: llava-llama-3-8b-v1_1-int4.gguf
+  files:
+    - filename: llava-llama-3-8b-v1_1-int4.gguf
+      sha256: b6e1d703db0da8227fdb7127d8716bbc5049c9bf17ca2bb345be9470d217f3fc
+      uri: huggingface://xtuner/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-int4.gguf
+    - filename: llava-llama-3-8b-v1_1-mmproj-f16.gguf
+      sha256: eb569aba7d65cf3da1d0369610eb6869f4a53ee369992a804d5810a80e9fa035
+      uri: huggingface://xtuner/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-mmproj-f16.gguf
 ### START Phi-2
 - &phi-2-chat
  url: "github:mudler/LocalAI/gallery/phi-2-chat.yaml@master"
@ -697,7 +849,7 @@
      model: Phi-3-mini-4k-instruct-q4.gguf
  files:
    - filename: "Phi-3-mini-4k-instruct-q4.gguf"
-      sha256: "4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e"
+      sha256: "8a83c7fb9049a9b2e92266fa7ad04933bb53aa1e85136b7b30f1b8000ff2edef"
      uri: "huggingface://microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf"
 - <<: *phi-3
  name: "phi-3-mini-4k-instruct:fp16"
@ -760,6 +912,58 @@
    - filename: "Hermes-2-Pro-Mistral-7B.Q8_0.gguf"
      sha256: "b6d95d7ec9a395b7568cc94b0447fd4f90b6f69d6e44794b1fbb84e3f732baca"
      uri: "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q8_0.gguf"
+### LLAMA3 version
+- <<: *hermes-2-pro-mistral
+  name: "hermes-2-pro-llama-3-8b"
+  tags:
+    - llm
+    - gguf
+    - gpu
+    - llama3
+    - cpu
+  urls:
+    - https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF
+  overrides:
+    parameters:
+      model: Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
+  files:
+    - filename: "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf"
+      sha256: "10c52a4820137a35947927be741bb411a9200329367ce2590cc6757cd98e746c"
+      uri: "huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf"
+- <<: *hermes-2-pro-mistral
+  tags:
+    - llm
+    - gguf
+    - gpu
+    - llama3
+    - cpu
+  name: "hermes-2-pro-llama-3-8b:Q5_K_M"
+  urls:
+    - https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF
+  overrides:
+    parameters:
+      model: Hermes-2-Pro-Llama-3-8B-Q5_K_M.gguf
+  files:
+    - filename: "Hermes-2-Pro-Llama-3-8B-Q5_K_M.gguf"
+      sha256: "107f3f55e26b8cc144eadd83e5f8a60cfd61839c56088fa3ae2d5679abf45f29"
+      uri: "huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q5_K_M.gguf"
+- <<: *hermes-2-pro-mistral
+  tags:
+    - llm
+    - gguf
+    - gpu
+    - llama3
+    - cpu
+  name: "hermes-2-pro-llama-3-8b:Q8_0"
+  urls:
+    - https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF
+  overrides:
+    parameters:
+      model: Hermes-2-Pro-Llama-3-8B-Q8_0.gguf
+  files:
+    - filename: "Hermes-2-Pro-Llama-3-8B-Q8_0.gguf"
+      sha256: "d138388cfda04d185a68eaf2396cf7a5cfa87d038a20896817a9b7cf1806f532"
+      uri: "huggingface://NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/Hermes-2-Pro-Llama-3-8B-Q8_0.gguf"
 - <<: *hermes-2-pro-mistral
  name: "biomistral-7b"
  description: |
@ -868,11 +1072,19 @@
  urls:
    - https://huggingface.co/fakezeta/Phi-3-mini-128k-instruct-ov-int8
  overrides:
+    trust_remote_code: true
    context_size: 131072
    parameters:
      model: fakezeta/Phi-3-mini-128k-instruct-ov-int8
    stopwords:
      - <|end|>
+  tags:
+    - llm
+    - openvino
+    - gpu
+    - phi3
+    - cpu
+    - Remote Code Enabled
 - <<: *openvino
  name: "openvino-starling-lm-7b-beta-openvino-int8"
  urls:
@ -881,6 +1093,12 @@
    context_size: 8192
    parameters:
      model: fakezeta/Starling-LM-7B-beta-openvino-int8
+  tags:
+    - llm
+    - openvino
+    - gpu
+    - mistral
+    - cpu
 - <<: *openvino
  name: "openvino-wizardlm2"
  urls:
@ -889,6 +1107,50 @@
    context_size: 8192
    parameters:
      model: fakezeta/Not-WizardLM-2-7B-ov-int8
+- <<: *openvino
+  name: "openvino-hermes2pro-llama3"
+  urls:
+    - https://huggingface.co/fakezeta/Hermes-2-Pro-Llama-3-8B-ov-int8
+  overrides:
+    context_size: 8192
+    parameters:
+      model: fakezeta/Hermes-2-Pro-Llama-3-8B-ov-int8
+  tags:
+    - llm
+    - openvino
+    - gpu
+    - llama3
+    - cpu
+- <<: *openvino
+  name: "openvino-multilingual-e5-base"
+  urls:
+    - https://huggingface.co/intfloat/multilingual-e5-base
+  overrides:
+    embeddings: true
+    type: OVModelForFeatureExtraction
+    parameters:
+      model: intfloat/multilingual-e5-base
+  tags:
+    - llm
+    - openvino
+    - gpu
+    - embedding
+    - cpu
+- <<: *openvino
+  name: "openvino-all-MiniLM-L6-v2"
+  urls:
+    - https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
+  overrides:
+    embeddings: true
+    type: OVModelForFeatureExtraction
+    parameters:
+      model: sentence-transformers/all-MiniLM-L6-v2
+  tags:
+    - llm
+    - openvino
+    - gpu
+    - embedding
+    - cpu
 ### START Embeddings
 - &sentencentransformers
  description: |
@ -994,7 +1256,7 @@
    - text-to-speech
    - cpu

-  override:
+  overrides:
    parameters:
      model: en-us-kathleen-low.onnx
  files:
@ -1002,7 +1264,7 @@
      uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-kathleen-low.tar.gz
 - <<: *piper
  name: voice-ca-upc_ona-x-low
-  override:
+  overrides:
    parameters:
      model: ca-upc_ona-x-low.onnx
  files:
@ -1011,7 +1273,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-ca-upc_pau-x-low
-  override:
+  overrides:
    parameters:
      model: ca-upc_pau-x-low.onnx
  files:
@ -1020,7 +1282,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-da-nst_talesyntese-medium
-  override:
+  overrides:
    parameters:
      model: da-nst_talesyntese-medium.onnx
  files:
@ -1029,7 +1291,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-de-eva_k-x-low
-  override:
+  overrides:
    parameters:
      model: de-eva_k-x-low.onnx
  files:
@ -1038,7 +1300,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-de-karlsson-low
-  override:
+  overrides:
    parameters:
      model: de-karlsson-low.onnx
  files:
@ -1047,7 +1309,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-de-kerstin-low
-  override:
+  overrides:
    parameters:
      model: de-kerstin-low.onnx
  files:
@ -1056,7 +1318,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-de-pavoque-low
-  override:
+  overrides:
    parameters:
      model: de-pavoque-low.onnx
  files:
@ -1065,7 +1327,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-de-ramona-low
-  override:
+  overrides:
    parameters:
      model: de-ramona-low.onnx
  files:
@ -1075,7 +1337,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-de-thorsten-low

-  override:
+  overrides:
    parameters:
      model: de-thorsten-low.onnx
  files:
@ -1085,7 +1347,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-el-gr-rapunzelina-low

-  override:
+  overrides:
    parameters:
      model: el-gr-rapunzelina-low.onnx
  files:
@ -1095,7 +1357,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-gb-alan-low

-  override:
+  overrides:
    parameters:
      model: en-gb-alan-low.onnx
  files:
@ -1105,7 +1367,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-gb-southern_english_female-low

-  override:
+  overrides:
    parameters:
      model: en-gb-southern_english
  files:
@ -1115,7 +1377,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-amy-low

-  override:
+  overrides:
    parameters:
      model: en-us-amy-low.onnx
  files:
@ -1125,7 +1387,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-danny-low

-  override:
+  overrides:
    parameters:
      model: en-us-danny-low.onnx
  files:
@ -1135,7 +1397,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-kathleen-low

-  override:
+  overrides:
    parameters:
      model: en-us-kathleen-low.onnx
  files:
@ -1145,7 +1407,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-lessac-low

-  override:
+  overrides:
    parameters:
      model: en-us-lessac-low.onnx
  files:
@ -1155,7 +1417,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-lessac-medium

-  override:
+  overrides:
    parameters:
      model: en-us-lessac-medium.onnx
  files:
@ -1165,7 +1427,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-libritts-high

-  override:
+  overrides:
    parameters:
      model: en-us-libritts-high.onnx
  files:
@ -1175,7 +1437,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-ryan-high

-  override:
+  overrides:
    parameters:
      model: en-us-ryan-high.onnx
  files:
@ -1185,7 +1447,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-ryan-low

-  override:
+  overrides:
    parameters:
      model: en-us-ryan-low.onnx
  files:
@ -1196,7 +1458,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us-ryan-medium

-  override:
+  overrides:
    parameters:
      model: en-us-ryan-medium.onnx
  files:
@ -1206,7 +1468,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-en-us_lessac
-  override:
+  overrides:
    parameters:
      model: en-us-lessac.onnx
  files:
@ -1216,7 +1478,7 @@
 - <<: *piper
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-es-carlfm-x-low
-  override:
+  overrides:
    parameters:
      model: es-carlfm-x-low.onnx
  files:
@ -1227,7 +1489,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-es-mls_10246-low

-  override:
+  overrides:
    parameters:
      model: es-mls_10246-low.onnx
  files:
@ -1238,7 +1500,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-es-mls_9972-low

-  override:
+  overrides:
    parameters:
      model: es-mls_9972-low.onnx
  files:
@ -1249,7 +1511,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-fi-harri-low

-  override:
+  overrides:
    parameters:
      model: fi-harri-low.onnx
  files:
@ -1260,7 +1522,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-fr-gilles-low

-  override:
+  overrides:
    parameters:
      model: fr-gilles-low.onnx
  files:
@ -1271,7 +1533,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-fr-mls_1840-low

-  override:
+  overrides:
    parameters:
      model: fr-mls_1840-low.onnx
  files:
@ -1282,7 +1544,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-fr-siwis-low

-  override:
+  overrides:
    parameters:
      model: fr-siwis-low.onnx
  files:
@ -1293,7 +1555,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-fr-siwis-medium

-  override:
+  overrides:
    parameters:
      model: fr-siwis-medium.onnx
  files:
@ -1304,7 +1566,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-is-bui-medium

-  override:
+  overrides:
    parameters:
      model: is-bui-medium.onnx
  files:
@ -1315,7 +1577,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-is-salka-medium

-  override:
+  overrides:
    parameters:
      model: is-salka-medium.onnx
  files:
@ -1326,7 +1588,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-is-steinn-medium

-  override:
+  overrides:
    parameters:
      model: is-steinn-medium.onnx
  files:
@ -1337,7 +1599,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-is-ugla-medium

-  override:
+  overrides:
    parameters:
      model: is-ugla-medium.onnx
  files:
@ -1348,7 +1610,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-it-riccardo_fasol-x-low

-  override:
+  overrides:
    parameters:
      model: it-riccardo_fasol-x-low.onnx
  files:
@ -1359,7 +1621,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-kk-iseke-x-low

-  override:
+  overrides:
    parameters:
      model: kk-iseke-x-low.onnx
  files:
@ -1370,7 +1632,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-kk-issai-high

-  override:
+  overrides:
    parameters:
      model: kk-issai-high.onnx
  files:
@ -1381,7 +1643,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-kk-raya-x-low

-  override:
+  overrides:
    parameters:
      model: kk-raya-x-low.onnx
  files:
@ -1392,7 +1654,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-ne-google-medium

-  override:
+  overrides:
    parameters:
      model: ne-google-medium.onnx
  files:
@ -1403,7 +1665,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-ne-google-x-low

-  override:
+  overrides:
    parameters:
      model: ne-google-x-low.onnx
  files:
@ -1414,7 +1676,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-nl-mls_5809-low

-  override:
+  overrides:
    parameters:
      model: nl-mls_5809-low.onnx
  files:
@ -1425,7 +1687,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-nl-mls_7432-low

-  override:
+  overrides:
    parameters:
      model: nl-mls_7432-low.onnx
  files:
@ -1436,7 +1698,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-nl-nathalie-x-low

-  override:
+  overrides:
    parameters:
      model: nl-nathalie-x-low.onnx
  files:
@ -1447,7 +1709,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-nl-rdh-medium

-  override:
+  overrides:
    parameters:
      model: nl-rdh-medium.onnx
  files:
@ -1458,7 +1720,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-nl-rdh-x-low

-  override:
+  overrides:
    parameters:
      model: nl-rdh-x-low.onnx
  files:
@ -1469,7 +1731,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-no-talesyntese-medium

-  override:
+  overrides:
    parameters:
      model: no-talesyntese-medium.onnx
  files:
@ -1480,7 +1742,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-pl-mls_6892-low

-  override:
+  overrides:
    parameters:
      model: pl-mls_6892-low.onnx
  files:
@ -1491,7 +1753,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-pt-br-edresson-low

-  override:
+  overrides:
    parameters:
      model: pt-br-edresson-low.onnx
  files:
@ -1502,7 +1764,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-ru-irinia-medium

-  override:
+  overrides:
    parameters:
      model: ru-irinia-medium.onnx
  files:
@ -1513,7 +1775,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-sv-se-nst-medium

-  override:
+  overrides:
    parameters:
      model: sv-se-nst-medium.onnx
  files:
@ -1524,7 +1786,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-uk-lada-x-low

-  override:
+  overrides:
    parameters:
      model: uk-lada-x-low.onnx
  files:
@ -1535,7 +1797,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-vi-25hours-single-low

-  override:
+  overrides:
    parameters:
      model: vi-25hours-single-low.onnx
  files:
@ -1546,7 +1808,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-vi-vivos-x-low

-  override:
+  overrides:
    parameters:
      model: vi-vivos-x-low.onnx
  files:
@ -1557,7 +1819,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-zh-cn-huayan-x-low

-  override:
+  overrides:
    parameters:
      model: zh-cn-huayan-x-low.onnx
  files:
@ -1568,7 +1830,7 @@
  url: github:mudler/LocalAI/gallery/piper.yaml@master
  name: voice-zh_CN-huayan-medium

-  override:
+  overrides:
    parameters:
      model: zh_CN-huayan-medium.onnx
  files:
--- a/gallery/moondream.yaml
+++ b/gallery/moondream.yaml
@ -0,0 +1,19 @@
+---
+name: "moondream2"
+
+
+config_file: |
+    backend: llama-cpp
+    context_size: 2046
+    roles:
+      user: "\nQuestion: "
+      system: "\nSystem: "
+      assistant: "\nAnswer: "
+    stopwords:
+    - "Question:"
+    - "<|endoftext|>"
+    f16: true
+    template:
+      completion: |
+        Complete the following sentence: {{.Input}}
+      chat: "{{.Input}}\nAnswer:\n"
--- a/gallery/openvino.yaml
+++ b/gallery/openvino.yaml
@ -7,6 +7,3 @@ config_file: |
  type: OVModelForCausalLM
  template:
    use_tokenizer_template: true
-  stopwords:
-  - "<|eot_id|>"
-  - "<|end_of_text|>"
--- a/gallery/wizardlm2.yaml
+++ b/gallery/wizardlm2.yaml
@ -0,0 +1,15 @@
+---
+name: "wizardlm2"
+
+config_file: |
+  mmap: true
+  template:
+    chat_message: |-
+      {{if eq .RoleName "assistant"}}ASSISTANT: {{.Content}}</s>{{else if eq .RoleName "system"}}{{.Content}}{{else if eq .RoleName "user"}}USER: {{.Content}}{{end}}
+    chat: "{{.Input}}ASSISTANT: "
+    completion: |-
+      {{.Input}}
+  context_size: 32768
+  f16: true
+  stopwords:
+  - </s>
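The `chat_message` entry above is a Go template that is evaluated once per message. The following is a minimal sketch of how it renders for each role, using only the standard `text/template` package. The `message` struct and the `render` helper are hypothetical stand-ins for illustration; how LocalAI actually feeds data to the template, and how rendered messages are joined before the `chat` template is applied, is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// chatMessage is the chat_message template from the gallery file above.
const chatMessage = `{{if eq .RoleName "assistant"}}ASSISTANT: {{.Content}}</s>{{else if eq .RoleName "system"}}{{.Content}}{{else if eq .RoleName "user"}}USER: {{.Content}}{{end}}`

// message is a hypothetical stand-in for the data passed to the template;
// only the two fields the template references are modelled.
type message struct {
	RoleName string
	Content  string
}

// render executes the template for one message and returns the result.
func render(m message) (string, error) {
	tmpl, err := template.New("chat_message").Parse(chatMessage)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, m); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, m := range []message{
		{RoleName: "system", Content: "You are helpful."},
		{RoleName: "user", Content: "Hello"},
		{RoleName: "assistant", Content: "Hi"},
	} {
		out, err := render(m)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", out)
	}
}
```

System content is emitted unchanged, user content gets a `USER: ` prefix, and assistant content is wrapped as `ASSISTANT: ...</s>`, which matches the `</s>` stopword in the same file. A role that matches none of the branches renders as an empty string.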
--- a/go.mod
+++ b/go.mod
@ -67,6 +67,7 @@ require (
 	github.com/Masterminds/semver/v3 v3.2.0 // indirect
 	github.com/Microsoft/go-winio v0.6.0 // indirect
 	github.com/Nvveen/Gotty v0.0.0-20120604004816-cd527374f1e5 // indirect
+	github.com/StackExchange/wmi v1.2.1 // indirect
 	github.com/alecthomas/chroma/v2 v2.8.0 // indirect
 	github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect
 	github.com/aymerick/douceur v0.2.0 // indirect
@ -82,6 +83,7 @@ require (
 	github.com/docker/go-connections v0.4.0 // indirect
 	github.com/docker/go-units v0.4.0 // indirect
 	github.com/dsnet/compress v0.0.2-0.20210315054119-f66993602bf5 // indirect
+	github.com/ghodss/yaml v1.0.0 // indirect
 	github.com/go-logr/stdr v1.2.2 // indirect
 	github.com/go-openapi/jsonpointer v0.21.0 // indirect
 	github.com/go-openapi/jsonreference v0.21.0 // indirect
@ -95,7 +97,10 @@ require (
 	github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
 	github.com/gorilla/css v1.0.1 // indirect
 	github.com/huandu/xstrings v1.3.3 // indirect
+	github.com/jaypipes/ghw v0.12.0 // indirect
+	github.com/jaypipes/pcidb v1.0.0 // indirect
 	github.com/josharian/intern v1.0.0 // indirect
+	github.com/klauspost/cpuid/v2 v2.2.7 // indirect
 	github.com/klauspost/pgzip v1.2.5 // indirect
 	github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
 	github.com/mailru/easyjson v0.7.7 // indirect
@ -103,6 +108,7 @@ require (
 	github.com/microcosm-cc/bluemonday v1.0.26 // indirect
 	github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db // indirect
 	github.com/mitchellh/copystructure v1.0.0 // indirect
+	github.com/mitchellh/go-homedir v1.1.0 // indirect
 	github.com/mitchellh/mapstructure v1.5.0 // indirect
 	github.com/mitchellh/reflectwalk v1.0.0 // indirect
 	github.com/moby/term v0.0.0-20201216013528-df9cb8a40635 // indirect
@ -113,6 +119,7 @@ require (
 	github.com/opencontainers/go-digest v1.0.0 // indirect
 	github.com/opencontainers/image-spec v1.0.2 // indirect
 	github.com/opencontainers/runc v1.1.12 // indirect
+	github.com/philippgille/chromem-go v0.5.0 // indirect
 	github.com/pierrec/lz4/v4 v4.1.2 // indirect
 	github.com/pkg/errors v0.9.1 // indirect
 	github.com/pkoukk/tiktoken-go v0.1.2 // indirect
@ -139,6 +146,7 @@ require (
 	google.golang.org/genproto/googleapis/rpc v0.0.0-20230822172742-b8732ec3820d // indirect
 	gopkg.in/fsnotify.v1 v1.4.7 // indirect
 	gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 // indirect
+	howett.net/plist v1.0.0 // indirect
 )

 require (
--- a/go.sum
+++ b/go.sum
@ -14,6 +14,8 @@ github.com/Microsoft/go-winio v0.6.0 h1:slsWYD/zyx7lCXoZVlvQrj0hPTM1HI4+v1sIda2y
 github.com/Microsoft/go-winio v0.6.0/go.mod h1:cTAf44im0RAYeL23bpB+fzCyDH2MJiz2BO69KH/soAE=
 github.com/Nvveen/Gotty v0.0.0-20120604004816-cd527374f1e5 h1:TngWCqHvy9oXAN6lEVMRuU21PR1EtLVZJmdB18Gu3Rw=
 github.com/Nvveen/Gotty v0.0.0-20120604004816-cd527374f1e5/go.mod h1:lmUJ/7eu/Q8D7ML55dXQrVaamCz2vxCfdQBasLZfHKk=
+github.com/StackExchange/wmi v1.2.1 h1:VIkavFPXSjcnS+O8yTq7NI32k0R5Aj+v39y29VYDOSA=
+github.com/StackExchange/wmi v1.2.1/go.mod h1:rcmrprowKIVzvc+NUiLncP2uuArMWLCbu9SBzvHz7e8=
 github.com/alecthomas/assert/v2 v2.6.0 h1:o3WJwILtexrEUk3cUVal3oiQY2tfgr/FHWiz/v2n4FU=
 github.com/alecthomas/assert/v2 v2.6.0/go.mod h1:Bze95FyfUr7x34QZrjL+XP+0qgp/zg8yS+TtBj1WA3k=
 github.com/alecthomas/chroma/v2 v2.8.0 h1:w9WJUjFFmHHB2e8mRpL9jjy3alYDlU0QLDezj1xE264=
@ -71,6 +73,8 @@ github.com/fsnotify/fsnotify v1.7.0 h1:8JEhPFa5W2WU7YfeZzPNqzMP6Lwt7L2715Ggo0nos
 github.com/fsnotify/fsnotify v1.7.0/go.mod h1:40Bi/Hjc2AVfZrqy+aj+yEI+/bRxZnMJyTJwOpGvigM=
 github.com/ggerganov/whisper.cpp/bindings/go v0.0.0-20230628193450-85ed71aaec8e h1:KtbU2JR3lJuXFASHG2+sVLucfMPBjWKUUKByX6C81mQ=
 github.com/ggerganov/whisper.cpp/bindings/go v0.0.0-20230628193450-85ed71aaec8e/go.mod h1:QIjZ9OktHFG7p+/m3sMvrAJKKdWrr1fZIK0rM6HZlyo=
+github.com/ghodss/yaml v1.0.0 h1:wQHKEahhL6wmXdzwWG11gIVCkOv05bNOh+Rxn0yngAk=
+github.com/ghodss/yaml v1.0.0/go.mod h1:4dBDuWmgqj2HViK6kFavaiC9ZROes6MMH2rRYeMEF04=
 github.com/go-audio/audio v1.0.0 h1:zS9vebldgbQqktK4H0lUqWrG8P0NxCJVqcj7ZpNnwd4=
 github.com/go-audio/audio v1.0.0/go.mod h1:6uAu0+H2lHkwdGsAY+j2wHPNPpPoeg5AaEFh9FlA+Zs=
 github.com/go-audio/riff v1.0.0 h1:d8iCGbDvox9BfLagY94fBynxSPHO80LmZCaOsmKxokA=
@ -82,6 +86,7 @@ github.com/go-logr/logr v1.2.4 h1:g01GSCwiDw2xSZfjJ2/T9M+S6pFdcNtFYsp+Y43HYDQ=
 github.com/go-logr/logr v1.2.4/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
 github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
 github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
+github.com/go-ole/go-ole v1.2.5/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0=
 github.com/go-ole/go-ole v1.2.6 h1:/Fpf6oFPoeFik9ty7siob0G6Ke8QvQEuVcuChpwXzpY=
 github.com/go-ole/go-ole v1.2.6/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0=
 github.com/go-openapi/jsonpointer v0.21.0 h1:YgdVicSA9vH5RiHs9TZW5oyafXZFc6+2Vc1rr/O9oNQ=
@ -162,6 +167,11 @@ github.com/ianlancetaylor/demangle v0.0.0-20200824232613-28f6c0f3b639/go.mod h1:
 github.com/imdario/mergo v0.3.11/go.mod h1:jmQim1M+e3UYxmgPu/WyfjB3N3VflVyUjjjwH0dnCYA=
 github.com/imdario/mergo v0.3.16 h1:wwQJbIsHYGMUyLSPrEq1CT16AhnhNJQ51+4fdHUnCl4=
 github.com/imdario/mergo v0.3.16/go.mod h1:WBLT9ZmE3lPoWsEzCh9LPo3TiwVN+ZKEjmz+hD27ysY=
+github.com/jaypipes/ghw v0.12.0 h1:xU2/MDJfWmBhJnujHY9qwXQLs3DBsf0/Xa9vECY0Tho=
+github.com/jaypipes/ghw v0.12.0/go.mod h1:jeJGbkRB2lL3/gxYzNYzEDETV1ZJ56OKr+CSeSEym+g=
+github.com/jaypipes/pcidb v1.0.0 h1:vtZIfkiCUE42oYbJS0TAq9XSfSmcsgo9IdxSm9qzYU8=
+github.com/jaypipes/pcidb v1.0.0/go.mod h1:TnYUvqhPBzCKnH34KrIX22kAeEbDCSRJ9cqLRCuNDfk=
+github.com/jessevdk/go-flags v1.4.0/go.mod h1:4FA24M0QyGHXBuZZK/XkWh8h0e1EYbRYJSGM75WSRxI=
 github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
 github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
 github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY=
@ -174,6 +184,8 @@ github.com/klauspost/compress v1.11.4/go.mod h1:aoV0uJVorq1K+umq18yTdKaF57EivdYs
 github.com/klauspost/compress v1.17.0 h1:Rnbp4K9EjcDuVuHtd0dgA4qNuv9yKDYKK1ulpJwgrqM=
 github.com/klauspost/compress v1.17.0/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
 github.com/klauspost/cpuid v1.2.0/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
+github.com/klauspost/cpuid/v2 v2.2.7 h1:ZWSB3igEs+d0qvnxR/ZBzXVmxkgt8DdzP6m9pfuVLDM=
+github.com/klauspost/cpuid/v2 v2.2.7/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
 github.com/klauspost/pgzip v1.2.5 h1:qnWYvvKqedOF2ulHpMG72XQol4ILEJ8k2wwRl/Km8oE=
 github.com/klauspost/pgzip v1.2.5/go.mod h1:Ch1tH69qFZu15pkjo5kYi6mth2Zzwzt50oCQKQE9RUs=
 github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
@ -210,6 +222,8 @@ github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db h1:62I3jR2Em
 github.com/mitchellh/colorstring v0.0.0-20190213212951-d06e56a500db/go.mod h1:l0dey0ia/Uv7NcFFVbCLtqEBQbrT4OCwCSKTEv6enCw=
 github.com/mitchellh/copystructure v1.0.0 h1:Laisrj+bAB6b/yJwB5Bt3ITZhGJdqmxquMKeZ+mmkFQ=
 github.com/mitchellh/copystructure v1.0.0/go.mod h1:SNtv71yrdKgLRyLFxmLdkAbkKEFWgYaq1OVrnRcwhnw=
+github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y=
+github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
 github.com/mitchellh/mapstructure v1.5.0 h1:jeMsZIYE/09sWLaz43PL7Gy6RuMjD2eJVyuac5Z2hdY=
 github.com/mitchellh/mapstructure v1.5.0/go.mod h1:bFUtVrKA4DC2yAKiSyO/QUcy7e+RRV2QTWOzhPopBRo=
 github.com/mitchellh/reflectwalk v1.0.0 h1:9D+8oIskB4VJBN5SFlmc27fSlIBZaov1Wpk/IfikLNY=
@ -260,6 +274,8 @@ github.com/otiai10/openaigo v1.6.0 h1:YTQEbtDSvawETOB/Kmb/6JvuHdHH/eIpSQfHVufiwY
 github.com/otiai10/openaigo v1.6.0/go.mod h1:kIaXc3V+Xy5JLplcBxehVyGYDtufHp3PFPy04jOwOAI=
 github.com/phayes/freeport v0.0.0-20220201140144-74d24b5ae9f5 h1:Ii+DKncOVM8Cu1Hc+ETb5K+23HdAMvESYE3ZJ5b5cMI=
 github.com/phayes/freeport v0.0.0-20220201140144-74d24b5ae9f5/go.mod h1:iIss55rKnNBTvrwdmkUpLnDpZoAHvWaiq5+iMmen4AE=
+github.com/philippgille/chromem-go v0.5.0 h1:bryX0F3N6jnN/21iBd8i2/k9EzPTZn3nyiqAti19si8=
+github.com/philippgille/chromem-go v0.5.0/go.mod h1:hTd+wGEm/fFPQl7ilfCwQXkgEUxceYh86iIdoKMolPo=
 github.com/pierrec/lz4/v4 v4.1.2 h1:qvY3YFXRQE/XB8MlLzJH7mSzBs74eA2gg52YTk6jUPM=
 github.com/pierrec/lz4/v4 v4.1.2/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4=
 github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
@ -430,6 +446,7 @@ golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBc
 golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.2.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
@ -489,6 +506,7 @@ gopkg.in/fsnotify.v1 v1.4.7/go.mod h1:Tz8NjZHkW78fSQdbUxIjBTcgA1z1m8ZHf0WmKUhAMy
 gopkg.in/op/go-logging.v1 v1.0.0-20160211212156-b2cb9fa56473/go.mod h1:N1eN2tsCx0Ydtgjl4cqmbRCsY4/+z4cYDeqwZTk6zog=
 gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 h1:uRGJdciOHaEIrze2W8Q3AKkepLTh2hOroT7a+7czfdQ=
 gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7/go.mod h1:dt/ZhP58zS4L8KSrWDmTeBkI65Dw0HsyUHuEVlX15mw=
+gopkg.in/yaml.v1 v1.0.0-20140924161607-9f9df34309c0/go.mod h1:WDnlLJ4WF5VGsH/HVa3CI79GS0ol3YnhVnKP89i0kNg=
 gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
 gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
 gopkg.in/yaml.v2 v2.3.0/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
@ -500,3 +518,5 @@ gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
 gotest.tools/v3 v3.0.2/go.mod h1:3SzNCllyD9/Y+b5r9JIKQ474KzkZyqLqEfYqMsX94Bk=
 gotest.tools/v3 v3.3.0 h1:MfDY1b1/0xN1CyMlQDac0ziEy9zJQd9CXBRRDHw2jJo=
 gotest.tools/v3 v3.3.0/go.mod h1:Mcr9QNxkg0uMvy/YElmo4SpXgJKWgQvYrT7Kw5RzJ1A=
+howett.net/plist v1.0.0 h1:7CrbWYbPPO/PyNy38b2EB/+gYbjCe2DXBxgtOOZbSQM=
+howett.net/plist v1.0.0/go.mod h1:lqaXoTrLY4hg8tnEzNru53gicrbv7rrk+2xJA/7hw9g=
--- a/pkg/gallery/gallery.go
+++ b/pkg/gallery/gallery.go
@ -55,6 +55,9 @@ func InstallModelFromGallery(galleries []Gallery, name string, basePath string,
 			installName = req.Name
 		}

+		// Copy the model configuration from the request schema
+		config.URLs = append(config.URLs, model.URLs...)
+		config.Icon = model.Icon
 		config.Files = append(config.Files, req.AdditionalFiles...)
 		config.Files = append(config.Files, model.AdditionalFiles...)

@ -186,6 +189,12 @@ func getGalleryModels(gallery Gallery, basePath string) ([]*GalleryModel, error)
 	return models, nil
 }

+func GetLocalModelConfiguration(basePath string, name string) (*Config, error) {
+	name = strings.ReplaceAll(name, string(os.PathSeparator), "__")
+	galleryFile := filepath.Join(basePath, galleryFileName(name))
+	return ReadConfigFile(galleryFile)
+}
+
 func DeleteModelFromSystem(basePath string, name string, additionalFiles []string) error {
 	// os.PathSeparator is not allowed in model names. Replace them with "__" to avoid conflicts with file paths.
 	name = strings.ReplaceAll(name, string(os.PathSeparator), "__")
@ -228,5 +237,8 @@ func DeleteModelFromSystem(basePath string, name string, additionalFiles []strin
 		err = errors.Join(err, fmt.Errorf("failed to remove file %s: %w", configFile, e))
 	}

+	// Delete gallery config file
+	os.Remove(galleryFile)
+
 	return err
 }
--- a/pkg/gallery/models.go
+++ b/pkg/gallery/models.go
@ -40,8 +40,10 @@ prompt_templates:
 */
 // Config is the model configuration which contains all the model details
 // This configuration is read from the gallery endpoint and is used to download and install the model
+// It is the internal structure, separated from the request
 type Config struct {
 	Description     string           `yaml:"description"`
+	Icon            string           `yaml:"icon"`
 	License         string           `yaml:"license"`
 	URLs            []string         `yaml:"urls"`
 	Name            string           `yaml:"name"`
--- a/pkg/gallery/op.go
+++ b/pkg/gallery/op.go
@ -1,10 +1,10 @@
 package gallery

 type GalleryOp struct {
-	Id          string
-	GalleryName string
-	ConfigURL   string
-	Delete      bool
+	Id               string
+	GalleryModelName string
+	ConfigURL        string
+	Delete           bool

 	Req       GalleryModel
 	Galleries []Gallery
@ -19,4 +19,5 @@ type GalleryOpStatus struct {
 	Progress           float64 `json:"progress"`
 	TotalFileSize      string  `json:"file_size"`
 	DownloadedFileSize string  `json:"downloaded_size"`
+	GalleryModelName   string  `json:"gallery_model_name"`
 }
--- a/pkg/gallery/request.go
+++ b/pkg/gallery/request.go
@ -1,5 +1,10 @@
 package gallery

+import (
+	"fmt"
+	"strings"
+)
+
 // GalleryModel is the struct used to represent a model in the gallery returned by the endpoint.
 // It is used to install the model by resolving the URL and downloading the files.
 // The other fields are used to override the configuration of the model.
@ -22,3 +27,23 @@ type GalleryModel struct {
 	// Installed is used to indicate if the model is installed or not
 	Installed bool `json:"installed,omitempty" yaml:"installed,omitempty"`
 }
+
+func (m GalleryModel) ID() string {
+	return fmt.Sprintf("%s@%s", m.Gallery.Name, m.Name)
+}
+
+type GalleryModels []*GalleryModel
+
+func (gm GalleryModels) Search(term string) GalleryModels {
+	var filteredModels GalleryModels
+
+	for _, m := range gm {
+		if strings.Contains(m.Name, term) ||
+			strings.Contains(m.Description, term) ||
+			strings.Contains(m.Gallery.Name, term) ||
+			strings.Contains(strings.Join(m.Tags, ","), term) {
+			filteredModels = append(filteredModels, m)
+		}
+	}
+	return filteredModels
+}
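The `Search` method added above does a plain, case-sensitive substring match over the model name, description, gallery name, and comma-joined tags. The following self-contained sketch shows the same filtering on simplified types. The `model` and `models` types and the sample entries are hypothetical and only mirror the fields that `Search` reads.

```go
package main

import (
	"fmt"
	"strings"
)

// model is a hypothetical, trimmed-down stand-in for GalleryModel.
type model struct {
	Name        string
	Description string
	Gallery     string
	Tags        []string
}

type models []model

// search keeps every model whose name, description, gallery name,
// or comma-joined tag list contains term (case-sensitive).
func (ms models) search(term string) models {
	var out models
	for _, m := range ms {
		if strings.Contains(m.Name, term) ||
			strings.Contains(m.Description, term) ||
			strings.Contains(m.Gallery, term) ||
			strings.Contains(strings.Join(m.Tags, ","), term) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	catalog := models{
		{Name: "moondream2", Description: "small vision model", Gallery: "localai", Tags: []string{"vision", "llm"}},
		{Name: "wizardlm2", Description: "chat model", Gallery: "localai", Tags: []string{"llm"}},
	}
	for _, m := range catalog.search("vision") {
		fmt.Println(m.Name)
	}
}
```

With this sample data, searching for `vision` matches only the first entry, while `Vision` matches nothing because the comparison is case-sensitive. A caller that wants case-insensitive search could lowercase both sides before comparing.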
--- a/pkg/langchain/huggingface.go
+++ b/pkg/langchain/huggingface.go
@ -2,6 +2,7 @@ package langchain

 import (
 	"context"
+	"fmt"

 	"github.com/tmc/langchaingo/llms"
 	"github.com/tmc/langchaingo/llms/huggingface"
@ -9,11 +10,16 @@ import (

 type HuggingFace struct {
 	modelPath string
+	token     string
 }

-func NewHuggingFace(repoId string) (*HuggingFace, error) {
+func NewHuggingFace(repoId, token string) (*HuggingFace, error) {
+	if token == "" {
+		return nil, fmt.Errorf("no huggingface token provided")
+	}
 	return &HuggingFace{
 		modelPath: repoId,
+		token:     token,
 	}, nil
 }

@ -21,7 +27,7 @@ func (s *HuggingFace) PredictHuggingFace(text string, opts ...PredictOption) (*P
 	po := NewPredictOptions(opts...)

 	// Init client
-	llm, err := huggingface.New()
+	llm, err := huggingface.New(huggingface.WithToken(s.token))
 	if err != nil {
 		return nil, err
 	}
--- a/pkg/model/initializers.go
+++ b/pkg/model/initializers.go
@ -2,27 +2,32 @@ package model

 import (
 	"context"
+	"errors"
 	"fmt"
 	"os"
 	"path/filepath"
+	"slices"
 	"strings"
 	"time"

 	grpc "github.com/go-skynet/LocalAI/pkg/grpc"
-	"github.com/hashicorp/go-multierror"
 	"github.com/phayes/freeport"
 	"github.com/rs/zerolog/log"
 )

 var Aliases map[string]string = map[string]string{
-	"go-llama":       LLamaCPP,
-	"llama":          LLamaCPP,
-	"embedded-store": LocalStoreBackend,
+	"go-llama":              LLamaCPP,
+	"llama":                 LLamaCPP,
+	"embedded-store":        LocalStoreBackend,
+	"langchain-huggingface": LCHuggingFaceBackend,
 }

 const (
-	LlamaGGML           = "llama-ggml"
-	LLamaCPP            = "llama-cpp"
+	LlamaGGML = "llama-ggml"
+	LLamaCPP  = "llama-cpp"
+
+	LLamaCPPFallback = "llama-cpp-fallback"
+
 	Gpt4AllLlamaBackend = "gpt4all-llama"
 	Gpt4AllMptBackend   = "gpt4all-mpt"
 	Gpt4AllJBackend     = "gpt4all-j"
@ -34,21 +39,75 @@ const (
 	StableDiffusionBackend = "stablediffusion"
 	TinyDreamBackend       = "tinydream"
 	PiperBackend           = "piper"
-	LCHuggingFaceBackend   = "langchain-huggingface"
+	LCHuggingFaceBackend   = "huggingface"

 	LocalStoreBackend = "local-store"
 )

-var AutoLoadBackends []string = []string{
-	LLamaCPP,
-	LlamaGGML,
-	Gpt4All,
-	BertEmbeddingsBackend,
-	RwkvBackend,
-	WhisperBackend,
-	StableDiffusionBackend,
-	TinyDreamBackend,
-	PiperBackend,
+func backendPath(assetDir, backend string) string {
+	return filepath.Join(assetDir, "backend-assets", "grpc", backend)
+}
+
+// backendsInAssetDir returns the list of backends in the asset directory
+// that should be loaded
+func backendsInAssetDir(assetDir string) ([]string, error) {
+	// Exclude backends from automatic loading
+	excludeBackends := []string{LocalStoreBackend}
+	entry, err := os.ReadDir(backendPath(assetDir, ""))
+	if err != nil {
+		return nil, err
+	}
+	var backends []string
+ENTRY:
+	for _, e := range entry {
+		for _, exclude := range excludeBackends {
+			if e.Name() == exclude {
+				continue ENTRY
+			}
+		}
+		if !e.IsDir() {
+			backends = append(backends, e.Name())
+		}
+	}
+
+	// Order the backends found in the asset directory.
+	// Backends are tried in sequence, so the order in which they are listed matters.
+	// For example, llama.cpp should be tried first, and the huggingface backend last.
+	// priorityList sets that order:
+	// earlier entries have higher priority.
+	priorityList := []string{
+		// First llama.cpp and llama-ggml
+		LLamaCPP, LLamaCPPFallback, LlamaGGML, Gpt4All,
+	}
+	toTheEnd := []string{
+		// huggingface has to be tried last
+		LCHuggingFaceBackend,
+		// bert embeddings is tried right before it
+		BertEmbeddingsBackend,
+	}
+	slices.Reverse(priorityList)
+	slices.Reverse(toTheEnd)
+
+	// order certain backends first
+	for _, b := range priorityList {
+		for i, be := range backends {
+			if be == b {
+				backends = append([]string{be}, append(backends[:i], backends[i+1:]...)...)
+				break
+			}
+		}
+	}
+	// make sure that some others are pushed at the end
+	for _, b := range toTheEnd {
+		for i, be := range backends {
+			if be == b {
+				backends = append(append(backends[:i], backends[i+1:]...), be)
+				break
+			}
+		}
+	}
+
+	return backends, nil
 }

 // starts the grpcModelProcess for the backend, and returns a grpc client
@ -99,7 +158,7 @@ func (ml *ModelLoader) grpcModel(backend string, o *Options) func(string, string
 				client = ModelAddress(uri)
 			}
 		} else {
-			grpcProcess := filepath.Join(o.assetDir, "backend-assets", "grpc", backend)
+			grpcProcess := backendPath(o.assetDir, backend)
 			// Check if the file exists
 			if _, err := os.Stat(grpcProcess); os.IsNotExist(err) {
 				return "", fmt.Errorf("grpc process not found: %s. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS", grpcProcess)
@ -243,7 +302,12 @@ func (ml *ModelLoader) GreedyLoader(opts ...Option) (grpc.Backend, error) {

 	// autoload also external backends
 	allBackendsToAutoLoad := []string{}
-	allBackendsToAutoLoad = append(allBackendsToAutoLoad, AutoLoadBackends...)
+	autoLoadBackends, err := backendsInAssetDir(o.assetDir)
+	if err != nil {
+		return nil, err
+	}
+	log.Debug().Msgf("Loading from the following backends (in order): %+v", autoLoadBackends)
+	allBackendsToAutoLoad = append(allBackendsToAutoLoad, autoLoadBackends...)
 	for _, b := range o.externalBackends {
 		allBackendsToAutoLoad = append(allBackendsToAutoLoad, b)
 	}
@ -271,10 +335,10 @@ func (ml *ModelLoader) GreedyLoader(opts ...Option) (grpc.Backend, error) {
 			log.Info().Msgf("[%s] Loads OK", b)
 			return model, nil
 		} else if modelerr != nil {
-			err = multierror.Append(err, modelerr)
+			err = errors.Join(err, modelerr)
 			log.Info().Msgf("[%s] Fails: %s", b, modelerr.Error())
 		} else if model == nil {
-			err = multierror.Append(err, fmt.Errorf("backend returned no usable model"))
+			err = errors.Join(err, fmt.Errorf("backend returned no usable model"))
 			log.Info().Msgf("[%s] Fails: %s", b, "backend returned no usable model")
 		}
 	}
--- a/pkg/xsync/map.go
+++ b/pkg/xsync/map.go
@ -15,6 +15,12 @@ func NewSyncedMap[K comparable, V any]() *SyncedMap[K, V] {
 	}
 }

+func (m *SyncedMap[K, V]) Map() map[K]V {
+	m.mu.RLock()
+	defer m.mu.RUnlock()
+	return m.m
+}
+
 func (m *SyncedMap[K, V]) Get(key K) V {
 	m.mu.RLock()
 	defer m.mu.RUnlock()
--- a/pkg/xsysinfo/cpu.go
+++ b/pkg/xsysinfo/cpu.go
@ -0,0 +1,38 @@
+package xsysinfo
+
+import (
+	"sort"
+
+	"github.com/jaypipes/ghw"
+	"github.com/klauspost/cpuid/v2"
+)
+
+func CPUCapabilities() ([]string, error) {
+	cpu, err := ghw.CPU()
+	if err != nil {
+		return nil, err
+	}
+
+	caps := map[string]struct{}{}
+
+	for _, proc := range cpu.Processors {
+		for _, c := range proc.Capabilities {
+
+			caps[c] = struct{}{}
+		}
+
+	}
+
+	ret := []string{}
+	for c := range caps {
+		ret = append(ret, c)
+	}
+
+	// order
+	sort.Strings(ret)
+	return ret, nil
+}
+
+func HasCPUCaps(ids ...cpuid.FeatureID) bool {
+	return cpuid.CPU.Supports(ids...)
+}
--- a/pkg/xsysinfo/gpu.go
+++ b/pkg/xsysinfo/gpu.go
@ -0,0 +1,15 @@
+package xsysinfo
+
+import (
+	"github.com/jaypipes/ghw"
+	"github.com/jaypipes/ghw/pkg/gpu"
+)
+
+func GPUs() ([]*gpu.GraphicsCard, error) {
+	gpu, err := ghw.GPU()
+	if err != nil {
+		return nil, err
+	}
+
+	return gpu.GraphicsCards, nil
+}
Author	SHA1	Message	Date
Richard Palethorpe	ae61a307c6	Merge `49df11b4e8` into `6559ac11b1`	2024-05-08 12:34:22 +02:00
Ettore Di Giacinto	6559ac11b1	feat(ui): prompt for chat, support vision, enhancements (#2259 ) * feat(ui): allow to set system prompt for chat Make also the models in the index clickable, and display as table Fixes #2257 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(vision): support also png with base64 input Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): support vision and upload of files Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * display the processed image Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make trust remote code stand out Signed-off-by: mudler <mudler@localai.io> * feat(ui): track in progress job across index/model gallery Signed-off-by: mudler <mudler@localai.io> * minor fixups Signed-off-by: mudler <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: mudler <mudler@localai.io>	2024-05-08 00:42:34 +02:00
Ettore Di Giacinto	02ec546dd6	models(gallery): Add Soliloquy (#2260 ) Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-08 00:14:19 +02:00
LocalAI [bot]	995aa5ed21	⬆️ Update ggerganov/llama.cpp (#2263 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-07 21:39:12 +00:00
Michael Mior	e28ba4b807	Add missing Homebrew dependencies (#2256 ) Signed-off-by: Michael Mior <michael.mior@gmail.com> Signed-off-by: Michael Mior <mmior@mail.rit.edu>	2024-05-07 16:34:30 +00:00
Daniel	d1e3436de5	Update readme: add ShellOracle to community integrations (#2254 ) Signed-off-by: Daniel Copley <djcopley@users.noreply.github.com>	2024-05-07 08:39:58 +02:00
Dave	d3ddc9e4aa	UI: flag `trust_remote_code` to users // favicon support (#2253 ) * attempt to indicate trust_remote_code in some way * bonus: favicon support! --------- Signed-off-by: Dave Lee <dave@gray101.com>	2024-05-07 08:39:23 +02:00
fakezeta	fea9522982	fix: OpenVINO winograd always disabled (#2252 ) Winograd convolutions were always disabled giving error when inference device was CPU. This commit implement logic to disable Winograd convolutions only if CPU or NPU are declared.	2024-05-07 08:38:58 +02:00
Ettore Di Giacinto	fe055d4b36	feat(webui): ux improvements (#2247 ) * ux: change welcome when there are no models installed Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ux: filter Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ux: show tags in filter Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * wip Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make tags clickable Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * allow to delete models from the list Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ui: display icon of installed models Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * gallery: remove gallery file when removing model Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): show a re-install button Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make filter buttons, rename Gallery field Signed-off-by: mudler <mudler@localai.io> * show again buttons at end of operations Signed-off-by: mudler <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: mudler <mudler@localai.io>	2024-05-07 01:17:07 +02:00
LocalAI [bot]	581b894789	⬆️ Update ggerganov/llama.cpp (#2255 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-06 21:28:07 +00:00
Ettore Di Giacinto	477655f6e6	models(gallery): average_norrmie reupload Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-06 19:56:24 +02:00
fakezeta	169d8d21ff	gallery: Added some OpenVINO models (#2249 ) * Added some OpenVINO models Added Phi-3 trust_remote_code: true Added Hermes 2 Pro Llama3 Added Multilingual-E5-base embedding model with OpenVINO acceleration (CPU and XPU) Added all-MiniLM-L6-v2 with OpenVINO acceleration (CPU and XPU) * Added Remote Code for phi, fixed error on Yamllint * update openvino.yaml I need to go to rest: today is not my day...	2024-05-06 10:52:05 +02:00
LocalAI [bot]	c5475020fe	⬆️ Update ggerganov/llama.cpp (#2251 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-05 21:16:00 +00:00
Dave	b52ff1249f	test: check the response URL during image gen in `app_test.go` (#2248 ) test: actually check the response URL from image gen Signed-off-by: Dave Lee <dave@gray101.com>	2024-05-05 18:46:33 +00:00
Ettore Di Giacinto	c5798500cb	feat(single-build): generate single binaries for releases (#2246 ) * feat(single-build): generate single binaries for releases Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * drop old targets Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-05 17:20:51 +02:00
Ettore Di Giacinto	67ad3532ec	Update README.md Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-05 15:45:55 +02:00
Ettore Di Giacinto	5cb96fe7df	models(gallery): add openbiollm (#2245 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-05 15:19:46 +02:00
Ettore Di Giacinto	810e8e5855	models(gallery): add lumimaid (#2244 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-05 15:19:33 +02:00
Ettore Di Giacinto	f3bcc648e7	models(gallery): add icon for instruct-coder Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-05 12:20:06 +02:00
Ettore Di Giacinto	3096566333	models(gallery): poppy porpoise fix correct mmproj URL Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-05 11:56:07 +02:00
Ettore Di Giacinto	f50c6a4e88	models(gallery): update poppy porpoise (#2243 ) Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-05 11:19:09 +02:00
Ettore Di Giacinto	ab4ee54855	models(gallery): add llama3-instruct-coder (#2242 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-05 11:18:50 +02:00
Ettore Di Giacinto	f2d35062d4	models(gallery): moondream2 fixups Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-05 10:49:04 +02:00
Ettore Di Giacinto	b69ff46c7e	feat(startup): show CPU/GPU information with --debug (#2241 ) Signed-off-by: mudler <mudler@localai.io>	2024-05-05 09:10:23 +02:00
Ettore Di Giacinto	117c9873e1	fix(webui): display small navbar with smaller screens (#2240 ) Signed-off-by: mudler <mudler@localai.io>	2024-05-04 23:38:39 +02:00
LocalAI [bot]	17e94fbcb1	⬆️ Update ggerganov/llama.cpp (#2239 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-04 21:26:22 +00:00
Ettore Di Giacinto	92f7feb874	models(gallery): add llama3-llava (#2238 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-04 22:43:11 +02:00
Ettore Di Giacinto	b70e2bffa3	models(gallery): add moondream2 (#2237 ) * models(gallery): add moondream2 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * models(gallery): fix typo for TTS models Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * models(gallery): add base config for moondream2 and icon Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * linter fixes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-04 18:29:04 +02:00
nold	06c43ca285	fix(gallery): hermes-2-pro-llama3 models checksum changed (#2236 ) fix(gallery): hermes-2-pro-llama3 models checksum Signed-off-by: Gerrit Pannek <nold@gnu.one>	2024-05-04 17:59:54 +02:00
Ettore Di Giacinto	530bec9c64	feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232 ) * feat(initializer): do not specify backends to autoload We can simply try to autoload the backends extracted in the asset dir. This allows building variants of the same backend (e.g. with different instruction sets), so that a single binary covers all the variants. Signed-off-by: mudler <mudler@localai.io> * refactor(prepare): refactor out llama.cpp prepare steps Make them idempotent so that we can re-build Signed-off-by: mudler <mudler@localai.io> * [TEST] feat(build): build noavx version along Signed-off-by: mudler <mudler@localai.io> * build: make build parallel Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build: do not override CMAKE_ARGS Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build: add fallback variant Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(huggingface-langchain): fail if no token is set Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(huggingface-langchain): rename Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: do not autoload local-store Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: give priority between the listed backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: mudler <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-04 17:56:12 +02:00
fakezeta	fa10302dd2	docs: updated Transformer parameters description (#2234 ) updated Transformer parameters	2024-05-04 10:45:25 +02:00
Ettore Di Giacinto	54faaa87ea	fix(webui): correct documentation URL for text2img (#2233 ) Signed-off-by: mudler <mudler@localai.io> Co-authored-by: Dave <dave@gray101.com>	2024-05-04 00:25:13 +00:00
dependabot[bot]	daba8a85f9	build(deps): bump tqdm from 4.65.0 to 4.66.3 in /examples/langchain/langchainpy-localai-example in the pip group across 1 directory (#2231 ) build(deps): bump tqdm Bumps the pip group with 1 update in the /examples/langchain/langchainpy-localai-example directory: [tqdm](https://github.com/tqdm/tqdm). Updates `tqdm` from 4.65.0 to 4.66.3 - [Release notes](https://github.com/tqdm/tqdm/releases) - [Commits](https://github.com/tqdm/tqdm/compare/v4.65.0...v4.66.3) --- updated-dependencies: - dependency-name: tqdm dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-03 23:15:06 +00:00
LocalAI [bot]	ac0f3d6e82	⬆️ Update ggerganov/whisper.cpp (#2230 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-03 22:16:26 +00:00
LocalAI [bot]	da0b6a89ae	⬆️ Update ggerganov/llama.cpp (#2229 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-03 21:39:28 +00:00
LocalAI [bot]	929a68c06d	⬆️ Update docs version mudler/LocalAI (#2228 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-03 21:18:11 +00:00
cryptk	a0aa5d01a1	feat: update ROCM and use smaller image (#2196 ) * feat: update ROCM and use smaller image Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> * fix: add call to ldconfig to fix AMDs broken library packages Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com> --------- Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>	2024-05-03 18:46:49 +02:00
Ettore Di Giacinto	dc834cc9d2	Update README.md Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-03 09:42:35 +02:00
Ettore Di Giacinto	b58274b8a2	feat(ui): support multiline and style `ul` (#2226 ) * feat(ui/chat): handle multiline in the input field Signed-off-by: mudler <mudler@localai.io> * feat(ui/chat): correctly display multiline messages Signed-off-by: mudler <mudler@localai.io> * feat(ui/chat): add list style Signed-off-by: mudler <mudler@localai.io> --------- Signed-off-by: mudler <mudler@localai.io>	2024-05-03 00:43:02 +02:00
Ettore Di Giacinto	a31d00d904	feat(aio): switch to llama3-based for LLM (#2225 ) Signed-off-by: mudler <mudler@localai.io>	2024-05-03 00:41:45 +02:00
LocalAI [bot]	2cc1bd85af	⬆️ Update ggerganov/llama.cpp (#2224 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-02 21:23:40 +00:00
Ettore Di Giacinto	2c5a46bc34	feat(ux): Add chat, tts, and image-gen pages to the WebUI (#2222 ) * feat(webui): Add chat page Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(webui): Add image-gen page Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(webui): Add tts page Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-02 21:14:10 +02:00
Ettore Di Giacinto	f7f8b4804b	models(gallery): Add Hermes-2-Pro-Llama-3-8B-GGUF (#2218 ) Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-02 18:31:13 +02:00
Ettore Di Giacinto	e5bd9a76c7	models(gallery): add wizardlm2 (#2209 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-05-02 18:31:02 +02:00
fakezeta	4690b534e0	feat: user defined inference device for CUDA and OpenVINO (#2212 ) user defined inference device configuration via main_gpu parameter	2024-05-02 09:54:29 +02:00
LocalAI [bot]	6a7a7996bb	⬆️ Update ggerganov/llama.cpp (#2213 ) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2024-05-01 21:19:44 +00:00
Ettore Di Giacinto	962ebbaf77	models(gallery): fixup phi-3 sha Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2024-05-01 23:06:58 +02:00
Richard Palethorpe	49df11b4e8	Replace own vector search with Chromem-go Because my implementation is apparently not producing the expected results and this project fits the bill quite nicely. Most of the tests are failing, but the core functionality appears to work. - [x] Set - [ ] Get - [ ] Delete (we have to delete the whole collection presently) - [ ] Query (it returns normalized embeddings, not the originals) - [x] Query Normalized Some features are missing from Chromem, but we don't strictly need them. Meanwhile there are maybe some minor things that need adding on our side to get the basic functionality working.	2024-04-18 10:16:41 +01:00