feat(conda): conda environments (#1144)

* feat(autogptq): add a separate conda environment for autogptq (#1137)

**Description**

This PR related to #1117

**Notes for Reviewers**

Here we lock down the versions of the dependencies, so the backend keeps
working even if newer versions of those dependencies are released.
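
As an illustration of what "locking down" means here, a minimal sketch of pinning
and reproducing a backend environment with conda; the environment name and file
path mirror the autogptq backend, but these exact commands are an assumption and
not part of this PR:

```
# Export the solved environment so every dependency version is pinned in the yml file
conda activate autogptq
conda env export > extra/grpc/autogptq/autogptq.yml

# Recreating the environment later resolves to exactly the pinned versions
conda env create --name autogptq --file autogptq.yml
```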

I changed the order of the imports to satisfy pylint; the logic of the code is
unchanged, so it should be fine.

I will investigate writing test cases for every backend. I can run the service
in my environment, but there is currently no automated way to test it, so I am
not fully confident in it.

Added a README.md in the `grpc` root with the common commands for creating a
`conda` environment. It can also serve as a reference when writing
documentation for additional gRPC backends.

Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* [Extra backend] Add separate environment for ttsbark (#1141)

**Description**

This PR relates to #1117

**Notes for Reviewers**

Same as the previous PR:
* The code is changed, but only the order of the imports; some code comments
are also added.
* Add a configuration for the `conda` environment
* Add a simple test case that checks whether the service can start up in the
current `conda` environment (see the example after this list). It succeeds in
VSCode, but it does not work out of the box in the terminal, so it is hard to
say how useful the test case really is.
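
For example, a hedged sketch of running that test against the environment
created by `make ttsbark` (the interpreter path assumes conda's default
`/opt/conda` prefix):

```
/opt/conda/envs/ttsbark/bin/python -m unittest extra/grpc/bark/test_ttsbark.py
```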

**[Signed
commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.


Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(conda): add make target and entrypoints for the dockerfile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(conda): Add separate conda env for diffusers (#1145)

**Description**

This PR relates to  #1117

**Notes for Reviewers**

* Add `conda` env `diffusers.yml`
* Add Makefile to create it automatically
* Add `run.sh` to support running it as an extra backend
  * Also add it to the main Dockerfile
* Add a make command to the root Makefile
* Tested the server; it can start up under the env (see the sketch below)
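
A rough sketch of the resulting workflow (the `--addr` flag is borrowed from
the other backends' test invocations and is an assumption here):

```
# Create the `diffusers` conda environment from diffusers.yml
make -C extra/grpc/diffusers

# Start the backend through the conda wrapper added in this PR
bash extra/grpc/diffusers/run.sh --addr localhost:50051
```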

Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(conda): Add separate env for vllm (#1148)

**Description**

This PR is related to #1117

**Notes for Reviewers**

* The gRPC server can be started as normal
* The test case can be triggered in VSCode
* As in the other PRs of this kind, add `vllm.yml` and a Makefile, add
`run.sh` to the main Dockerfile, and add a command to the main Makefile (see
the sketch below)
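
For context, a hedged sketch of how the wrapper is consumed as an external
backend; the environment variable mirrors the entry added to the Dockerfile,
while invoking `local-ai` this way outside the container is an assumption:

```
# Register the conda wrapper as an external gRPC backend (as the Dockerfile does)
export EXTERNAL_GRPC_BACKENDS="vllm:/build/extra/grpc/vllm/run.sh"
./local-ai
```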

**[Signed
commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.


Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(conda): Add separate env for huggingface (#1146)

**Description**

This PR is related to  #1117

**Notes for Reviewers**

* Add conda env `huggingface.yml`
* Change the import order and remove unused packages
* Add `run.sh` and `make command` to the main Dockerfile and Makefile
* Add test cases for it. They can be triggered and pass under the VSCode
Python extension, but they hang when run with `python -m unittest
test_huggingface.py` in the terminal

```
Running tests (unittest): /workspaces/LocalAI/extra/grpc/huggingface
Running tests: /workspaces/LocalAI/extra/grpc/huggingface/test_huggingface.py::TestBackendServicer::test_embedding
/workspaces/LocalAI/extra/grpc/huggingface/test_huggingface.py::TestBackendServicer::test_load_model
/workspaces/LocalAI/extra/grpc/huggingface/test_huggingface.py::TestBackendServicer::test_server_startup
./test_huggingface.py::TestBackendServicer::test_embedding Passed

./test_huggingface.py::TestBackendServicer::test_load_model Passed

./test_huggingface.py::TestBackendServicer::test_server_startup Passed

Total number of tests expected to run: 3
Total number of tests run: 3
Total number of tests passed: 3
Total number of tests failed: 0
Total number of tests failed with errors: 0
Total number of tests skipped: 0

Finished running tests!
```

**[Signed
commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.


Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(conda): Add the separate conda env for VALL-E X (#1147)

**Description**

This PR is related  to #1117

**Notes for Reviewers**

* The gRPC server cannot start up

```
(ttsvalle) @Aisuko ➜ /workspaces/LocalAI (feat/vall-e-x) $ /opt/conda/envs/ttsvalle/bin/python /workspaces/LocalAI/extra/grpc/vall-e-x/ttsvalle.py
Traceback (most recent call last):
  File "/workspaces/LocalAI/extra/grpc/vall-e-x/ttsvalle.py", line 14, in <module>
    from utils.generation import SAMPLE_RATE, generate_audio, preload_models
ModuleNotFoundError: No module named 'utils'
```

The installation steps below follow
https://github.com/Plachtaa/VALL-E-X#-installation:

* Under the `ttsvalle` conda env

```
git clone https://github.com/Plachtaa/VALL-E-X.git
cd VALL-E-X
pip install -r requirements.txt
```
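
The `ModuleNotFoundError` appears because `utils` is a package shipped inside
the VALL-E-X repository rather than an installable dependency, so the backend
has to run from a directory that contains those sources. A hedged sketch of the
workaround (mirroring what the Dockerfile later does by copying
`/usr/lib/vall-e-x` into the working directory):

```
# Make the VALL-E-X sources (including the `utils` package) available next to the backend
git clone https://github.com/Plachtaa/VALL-E-X.git
cp -rfv VALL-E-X/* extra/grpc/vall-e-x/

# Start the server from the ttsvalle environment
/opt/conda/envs/ttsvalle/bin/python extra/grpc/vall-e-x/ttsvalle.py
```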

**[Signed
commits](../CONTRIBUTING.md#signing-off-on-commits-developer-certificate-of-origin)**
- [x] Yes, I signed my commits.


Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: set image type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(conda): Add separate conda env for exllama (#1149)

Add separate env for exllama

Signed-off-by: Aisuko <urakiny@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Setup conda

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Set image_type arg

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: prepare only conda env in tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Dockerfile: comment manual pip calls

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* conda: add conda to PATH

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixes

* add shebang

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* file perms

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

* Install new conda in the worker

* Disable GPU tests for now until the worker is back

* Rename workflows

* debug

* Fixup conda install

* fixup(wrapper): pass args

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Aisuko <urakiny@gmail.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Aisuko <urakiny@gmail.com>
Ettore Di Giacinto 2023-11-04 15:30:32 +01:00 committed by GitHub
parent 9b17af18b3
commit f347e51927
44 changed files with 1288 additions and 163 deletions


@ -14,7 +14,7 @@ concurrency:
cancel-in-progress: true
jobs:
docker:
image-build:
strategy:
matrix:
include:
@ -29,98 +29,6 @@ jobs:
tag-latest: 'false'
tag-suffix: '-ffmpeg'
ffmpeg: 'true'
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Release space from worker
run: |
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
df -h
echo
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
sudo rm -rf /usr/local/lib/android
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
sudo rm -rf /usr/share/dotnet
sudo apt-get remove -y '^mono-.*' || true
sudo apt-get remove -y '^ghc-.*' || true
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
sudo apt-get remove -y 'php.*' || true
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
sudo apt-get remove -y '^google-.*' || true
sudo apt-get remove -y azure-cli || true
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
sudo apt-get remove -y '^gfortran-.*' || true
sudo apt-get remove -y microsoft-edge-stable || true
sudo apt-get remove -y firefox || true
sudo apt-get remove -y powershell || true
sudo apt-get remove -y r-base-core || true
sudo apt-get autoremove -y
sudo apt-get clean
echo
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
sudo rm -rfv build || true
df -h
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
images: quay.io/go-skynet/local-ai
tags: |
type=ref,event=branch
type=semver,pattern={{raw}}
type=sha
flavor: |
latest=${{ matrix.tag-latest }}
suffix=${{ matrix.tag-suffix }}
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
password: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
- name: Build and push
uses: docker/build-push-action@v5
with:
builder: ${{ steps.buildx.outputs.name }}
build-args: |
BUILD_TYPE=${{ matrix.build-type }}
CUDA_MAJOR_VERSION=${{ matrix.cuda-major-version }}
CUDA_MINOR_VERSION=${{ matrix.cuda-minor-version }}
FFMPEG=${{ matrix.ffmpeg }}
context: .
file: ./Dockerfile
platforms: ${{ matrix.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
docker-gpu:
strategy:
matrix:
include:
- build-type: 'cublas'
cuda-major-version: 11
cuda-minor-version: 7
@ -162,7 +70,42 @@ jobs:
&& sudo apt-get install -y git
- name: Checkout
uses: actions/checkout@v4
# - name: Release space from worker
# run: |
# echo "Listing top largest packages"
# pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
# head -n 30 <<< "${pkgs}"
# echo
# df -h
# echo
# sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
# sudo apt-get remove --auto-remove android-sdk-platform-tools || true
# sudo apt-get purge --auto-remove android-sdk-platform-tools || true
# sudo rm -rf /usr/local/lib/android
# sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
# sudo rm -rf /usr/share/dotnet
# sudo apt-get remove -y '^mono-.*' || true
# sudo apt-get remove -y '^ghc-.*' || true
# sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
# sudo apt-get remove -y 'php.*' || true
# sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
# sudo apt-get remove -y '^google-.*' || true
# sudo apt-get remove -y azure-cli || true
# sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
# sudo apt-get remove -y '^gfortran-.*' || true
# sudo apt-get remove -y microsoft-edge-stable || true
# sudo apt-get remove -y firefox || true
# sudo apt-get remove -y powershell || true
# sudo apt-get remove -y r-base-core || true
# sudo apt-get autoremove -y
# sudo apt-get clean
# echo
# echo "Listing top largest packages"
# pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
# head -n 30 <<< "${pkgs}"
# echo
# sudo rm -rfv build || true
# df -h
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
@ -192,6 +135,7 @@ jobs:
registry: quay.io
username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
password: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
- name: Build and push
uses: docker/build-push-action@v5
with:
@ -207,7 +151,3 @@ jobs:
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Release space from worker ♻
if: always()
run: |
docker system prune -f -a --volumes || true


@ -14,7 +14,7 @@ concurrency:
cancel-in-progress: true
jobs:
ubuntu-latest:
tests-linux:
runs-on: ubuntu-latest
strategy:
matrix:
@ -67,11 +67,18 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
sudo apt-get update && \
sudo apt-get install -y conda
sudo apt-get install -y ca-certificates cmake curl patch
sudo apt-get install -y libopencv-dev && sudo ln -s /usr/include/opencv4/opencv2 /usr/include/opencv2
sudo pip install -r extra/requirements.txt
sudo rm -rfv /usr/bin/conda || true
PATH=$PATH:/opt/conda/bin make -C extra/grpc/huggingface
# Pre-build stable diffusion before we install a newer version of abseil (not compatible with stablediffusion-ncnn)
GO_TAGS="tts stablediffusion" GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build
@ -96,12 +103,11 @@ jobs:
cd grpc && mkdir -p cmake/build && cd cmake/build && cmake -DgRPC_INSTALL=ON \
-DgRPC_BUILD_TESTS=OFF \
../.. && sudo make -j12 install
- name: Test
run: |
ESPEAK_DATA="/build/lib/Linux-$(uname -m)/piper_phonemize/lib/espeak-ng-data" GO_TAGS="tts stablediffusion" make test
macOS-latest:
tests-apple:
runs-on: macOS-latest
strategy:
matrix:


@ -14,7 +14,7 @@ ARG TARGETARCH
ARG TARGETVARIANT
ENV BUILD_TYPE=${BUILD_TYPE}
ENV EXTERNAL_GRPC_BACKENDS="huggingface-embeddings:/build/extra/grpc/huggingface/huggingface.py,autogptq:/build/extra/grpc/autogptq/autogptq.py,bark:/build/extra/grpc/bark/ttsbark.py,diffusers:/build/extra/grpc/diffusers/backend_diffusers.py,exllama:/build/extra/grpc/exllama/exllama.py,vall-e-x:/build/extra/grpc/vall-e-x/ttsvalle.py,vllm:/build/extra/grpc/vllm/backend_vllm.py"
ENV EXTERNAL_GRPC_BACKENDS="huggingface-embeddings:/build/extra/grpc/huggingface/run.sh,autogptq:/build/extra/grpc/autogptq/run.sh,bark:/build/extra/grpc/bark/run.sh,diffusers:/build/extra/grpc/diffusers/run.sh,exllama:/build/extra/grpc/exllama/run.sh,vall-e-x:/build/extra/grpc/vall-e-x/run.sh,vllm:/build/extra/grpc/vllm/run.sh"
ENV GALLERIES='[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]'
ARG GO_TAGS="stablediffusion tts"
@ -77,17 +77,25 @@ RUN curl -L "https://github.com/gabime/spdlog/archive/refs/tags/v${SPDLOG_VERSIO
# Extras requirements
FROM requirements-core as requirements-extras
RUN curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list && \
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list && \
apt-get update && \
apt-get install -y conda
COPY extra/requirements.txt /build/extra/requirements.txt
ENV PATH="/root/.cargo/bin:${PATH}"
RUN pip install --upgrade pip
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
RUN if [ "${TARGETARCH}" = "amd64" ]; then \
pip install git+https://github.com/suno-ai/bark.git diffusers invisible_watermark transformers accelerate safetensors;\
fi
RUN if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "amd64" ]; then \
pip install torch vllm && pip install auto-gptq https://github.com/jllllll/exllama/releases/download/0.0.10/exllama-0.0.10+cu${CUDA_MAJOR_VERSION}${CUDA_MINOR_VERSION}-cp39-cp39-linux_x86_64.whl;\
fi
RUN pip install -r /build/extra/requirements.txt && rm -rf /build/extra/requirements.txt
#RUN if [ "${TARGETARCH}" = "amd64" ]; then \
# pip install git+https://github.com/suno-ai/bark.git diffusers invisible_watermark transformers accelerate safetensors;\
# fi
#RUN if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "amd64" ]; then \
# pip install torch vllm && pip install auto-gptq https://github.com/jllllll/exllama/releases/download/0.0.10/exllama-0.0.10+cu${CUDA_MAJOR_VERSION}${CUDA_MINOR_VERSION}-cp39-cp39-linux_x86_64.whl;\
# fi
#RUN pip install -r /build/extra/requirements.txt && rm -rf /build/extra/requirements.txt
# Vall-e-X
RUN git clone https://github.com/Plachtaa/VALL-E-X.git /usr/lib/vall-e-x && cd /usr/lib/vall-e-x && pip install -r requirements.txt
@ -139,6 +147,7 @@ FROM requirements-${IMAGE_TYPE}
ARG FFMPEG
ARG BUILD_TYPE
ARG TARGETARCH
ARG IMAGE_TYPE=extras
ENV BUILD_TYPE=${BUILD_TYPE}
ENV REBUILD=false
@ -169,6 +178,10 @@ COPY --from=builder /build/local-ai ./
# do not let stablediffusion rebuild (requires an older version of absl)
COPY --from=builder /build/backend-assets/grpc/stablediffusion ./backend-assets/grpc/stablediffusion
RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
PATH=$PATH:/opt/conda/bin make prepare-extra-conda-environments \
; fi
# Copy VALLE-X as it's not a real "lib"
RUN if [ -d /usr/lib/vall-e-x ]; then \
cp -rfv /usr/lib/vall-e-x/* ./ ; \


@ -290,12 +290,12 @@ run: prepare ## run local-ai
test-models/testmodel:
mkdir test-models
mkdir test-dir
wget https://huggingface.co/nnakasato/ggml-model-test/resolve/main/ggml-model-q4.bin -O test-models/testmodel
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin -O test-models/whisper-en
wget https://huggingface.co/mudler/all-MiniLM-L6-v2/resolve/main/ggml-model-q4_0.bin -O test-models/bert
wget https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav -O test-dir/audio.wav
wget https://huggingface.co/mudler/rwkv-4-raven-1.5B-ggml/resolve/main/RWKV-4-Raven-1B5-v11-Eng99%2525-Other1%2525-20230425-ctx4096_Q4_0.bin -O test-models/rwkv
wget https://raw.githubusercontent.com/saharNooby/rwkv.cpp/5eb8f09c146ea8124633ab041d9ea0b1f1db4459/rwkv/20B_tokenizer.json -O test-models/rwkv.tokenizer.json
wget -q https://huggingface.co/nnakasato/ggml-model-test/resolve/main/ggml-model-q4.bin -O test-models/testmodel
wget -q https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin -O test-models/whisper-en
wget -q https://huggingface.co/mudler/all-MiniLM-L6-v2/resolve/main/ggml-model-q4_0.bin -O test-models/bert
wget -q https://cdn.openai.com/whisper/draft-20220913a/micro-machines.wav -O test-dir/audio.wav
wget -q https://huggingface.co/mudler/rwkv-4-raven-1.5B-ggml/resolve/main/RWKV-4-Raven-1B5-v11-Eng99%2525-Other1%2525-20230425-ctx4096_Q4_0.bin -O test-models/rwkv
wget -q https://raw.githubusercontent.com/saharNooby/rwkv.cpp/5eb8f09c146ea8124633ab041d9ea0b1f1db4459/rwkv/20B_tokenizer.json -O test-models/rwkv.tokenizer.json
cp tests/models_fixtures/* test-models
prepare-test: grpcs
@ -306,8 +306,8 @@ test: prepare test-models/testmodel grpcs
@echo 'Running tests'
export GO_TAGS="tts stablediffusion"
$(MAKE) prepare-test
HUGGINGFACE_GRPC=$(abspath ./)/extra/grpc/huggingface/huggingface.py TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!gpt4all && !llama && !llama-gguf" --flake-attempts 5 -v -r ./api ./pkg
HUGGINGFACE_GRPC=$(abspath ./)/extra/grpc/huggingface/run.sh TEST_DIR=$(abspath ./)/test-dir/ FIXTURES=$(abspath ./)/tests/fixtures CONFIG_FILE=$(abspath ./)/test-models/config.yaml MODELS_PATH=$(abspath ./)/test-models \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --label-filter="!gpt4all && !llama && !llama-gguf" --fail-fast -v -r ./api ./pkg
$(MAKE) test-gpt4all
$(MAKE) test-llama
$(MAKE) test-llama-gguf
@ -387,6 +387,16 @@ protogen-python:
## GRPC
prepare-extra-conda-environments:
$(MAKE) -C extra/grpc/autogptq
$(MAKE) -C extra/grpc/bark
$(MAKE) -C extra/grpc/diffusers
$(MAKE) -C extra/grpc/vllm
$(MAKE) -C extra/grpc/huggingface
$(MAKE) -C extra/grpc/vall-e-x
$(MAKE) -C extra/grpc/exllama
backend-assets/grpc:
mkdir -p backend-assets/grpc

extra/grpc/README.md Normal file

@ -0,0 +1,38 @@
# Common conda environment commands
## Create a new empty conda environment
```
conda create --name <env-name> python=<your version> -y
conda create --name autogptq python=3.11 -y
```
## To activate the environment
As of conda 4.4
```
conda activate autogptq
```
For conda versions older than 4.4
```
source activate autogptq
```
## Install the packages to your environment
Sometimes you need to install packages from the conda-forge channel.
By using `conda`:
```
conda install <your-package-name>
conda install -c conda-forge <your package-name>
```
Or by using `pip`
```
pip install <your-package-name>
```


@ -0,0 +1,5 @@
.PHONY: autogptq
autogptq:
@echo "Creating virtual environment..."
@conda env create --name autogptq --file autogptq.yml
@echo "Virtual environment created."


@ -0,0 +1,5 @@
# Creating a separate environment for the autogptq project
```
make autogptq
```


@ -1,15 +1,15 @@
#!/usr/bin/env python3
import grpc
from concurrent import futures
import time
import backend_pb2
import backend_pb2_grpc
import argparse
import signal
import sys
import os
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from pathlib import Path
import time
import grpc
import backend_pb2
import backend_pb2_grpc
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
from transformers import TextGenerationPipeline


@ -0,0 +1,86 @@
name: autogptq
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.23.0
- aiohttp==3.8.5
- aiosignal==1.3.1
- async-timeout==4.0.3
- attrs==23.1.0
- auto-gptq==0.4.2
- certifi==2023.7.22
- charset-normalizer==3.3.0
- datasets==2.14.5
- dill==0.3.7
- filelock==3.12.4
- frozenlist==1.4.0
- fsspec==2023.6.0
- grpcio==1.59.0
- huggingface-hub==0.16.4
- idna==3.4
- jinja2==3.1.2
- markupsafe==2.1.3
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx==3.1
- numpy==1.26.0
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- packaging==23.2
- pandas==2.1.1
- peft==0.5.0
- protobuf==4.24.4
- psutil==5.9.5
- pyarrow==13.0.0
- python-dateutil==2.8.2
- pytz==2023.3.post1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- rouge==1.0.1
- safetensors==0.3.3
- six==1.16.0
- sympy==1.12
- tokenizers==0.14.0
- torch==2.1.0
- tqdm==4.66.1
- transformers==4.34.0
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==2.0.6
- xxhash==3.4.1
- yarl==1.9.2

extra/grpc/autogptq/run.sh Executable file

@ -0,0 +1,14 @@
#!/bin/bash
##
## A bash script wrapper that runs the autogptq server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate autogptq
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/autogptq.py $@

extra/grpc/bark/Makefile Normal file

@ -0,0 +1,5 @@
.PHONY: ttsbark
ttsbark:
@echo "Creating virtual environment..."
@conda env create --name ttsbark --file ttsbark.yml
@echo "Virtual environment created."

extra/grpc/bark/README.md Normal file

@ -0,0 +1,16 @@
# Creating a separate environment for ttsbark project
```
make ttsbark
```
# Testing the gRPC server
```
<The path of your python interpreter> -m unittest test_ttsbark.py
```
For example
```
/opt/conda/envs/bark/bin/python -m unittest extra/grpc/bark/test_ttsbark.py
```

extra/grpc/bark/run.sh Executable file

@ -0,0 +1,14 @@
#!/bin/bash
##
## A bash script wrapper that runs the ttsbark server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate ttsbark
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/ttsbark.py $@


@ -0,0 +1,32 @@
import unittest
import subprocess
import time
import backend_pb2
import backend_pb2_grpc
import grpc
class TestBackendServicer(unittest.TestCase):
"""
TestBackendServicer is the class that tests the gRPC service
"""
def setUp(self):
self.service = subprocess.Popen(["python3", "ttsbark.py", "--addr", "localhost:50051"])
def tearDown(self) -> None:
self.service.terminate()
self.service.wait()
def test_server_startup(self):
time.sleep(2)
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.Health(backend_pb2.HealthMessage())
self.assertEqual(response.message, b'OK')
except Exception as err:
print(err)
self.fail("Server failed to start")
finally:
self.tearDown()


@ -1,18 +1,23 @@
"""
This is the extra gRPC server of LocalAI
"""
#!/usr/bin/env python3
import grpc
from concurrent import futures
import time
import backend_pb2
import backend_pb2_grpc
import argparse
import signal
import sys
import os
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from pathlib import Path
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
import backend_pb2
import backend_pb2_grpc
from bark import SAMPLE_RATE, generate_audio, preload_models
import grpc
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
# If MAX_WORKERS are specified in the environment use it, otherwise default to 1
@ -20,6 +25,9 @@ MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
BackendServicer is the class that implements the gRPC service
"""
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):


@ -0,0 +1,96 @@
name: bark
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.23.0
- aiohttp==3.8.5
- aiosignal==1.3.1
- async-timeout==4.0.3
- attrs==23.1.0
- bark==0.1.5
- boto3==1.28.61
- botocore==1.31.61
- certifi==2023.7.22
- charset-normalizer==3.3.0
- datasets==2.14.5
- dill==0.3.7
- einops==0.7.0
- encodec==0.1.1
- filelock==3.12.4
- frozenlist==1.4.0
- fsspec==2023.6.0
- funcy==2.0
- grpcio==1.59.0
- huggingface-hub==0.16.4
- idna==3.4
- jinja2==3.1.2
- jmespath==1.0.1
- markupsafe==2.1.3
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx==3.1
- numpy==1.26.0
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- packaging==23.2
- pandas==2.1.1
- peft==0.5.0
- protobuf==4.24.4
- psutil==5.9.5
- pyarrow==13.0.0
- python-dateutil==2.8.2
- pytz==2023.3.post1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- rouge==1.0.1
- s3transfer==0.7.0
- safetensors==0.3.3
- scipy==1.11.3
- six==1.16.0
- sympy==1.12
- tokenizers==0.14.0
- torch==2.1.0
- torchaudio==2.1.0
- tqdm==4.66.1
- transformers==4.34.0
- triton==2.1.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==1.26.17
- xxhash==3.4.1
- yarl==1.9.2
prefix: /opt/conda/envs/bark


@ -0,0 +1,11 @@
.PHONY: diffusers
diffusers:
@echo "Creating virtual environment..."
@conda env create --name diffusers --file diffusers.yml
@echo "Virtual environment created."
.PHONY: run
run:
@echo "Running diffusers..."
bash run.sh
@echo "Diffusers run."


@ -0,0 +1,5 @@
# Creating a separate environment for the diffusers project
```
make diffusers
```


@ -1,27 +1,32 @@
#!/usr/bin/env python3
import grpc
from concurrent import futures
import time
import backend_pb2
import backend_pb2_grpc
import argparse
from collections import defaultdict
from enum import Enum
import signal
import sys
import time
import os
# import diffusers
import torch
from torch import autocast
from diffusers import StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, EulerAncestralDiscreteScheduler
from diffusers.pipelines.stable_diffusion import safety_checker
from compel import Compel
from PIL import Image
from io import BytesIO
import torch
import backend_pb2
import backend_pb2_grpc
import grpc
from diffusers import StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, EulerAncestralDiscreteScheduler
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.pipelines.stable_diffusion import safety_checker
from compel import Compel
from transformers import CLIPTextModel
from enum import Enum
from collections import defaultdict
from safetensors.torch import load_file
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
COMPEL=os.environ.get("COMPEL", "1") == "1"
CLIPSKIP=os.environ.get("CLIPSKIP", "1") == "1"


@ -0,0 +1,74 @@
name: diffusers
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- accelerate==0.23.0
- certifi==2023.7.22
- charset-normalizer==3.3.0
- compel==2.0.2
- diffusers==0.21.4
- filelock==3.12.4
- fsspec==2023.9.2
- grpcio==1.59.0
- huggingface-hub==0.17.3
- idna==3.4
- importlib-metadata==6.8.0
- jinja2==3.1.2
- markupsafe==2.1.3
- mpmath==1.3.0
- networkx==3.1
- numpy==1.26.0
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- packaging==23.2
- pillow==10.0.1
- protobuf==4.24.4
- psutil==5.9.5
- pyparsing==3.1.1
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- safetensors==0.4.0
- sympy==1.12
- tokenizers==0.14.1
- torch==2.1.0
- tqdm==4.66.1
- transformers==4.34.0
- triton==2.1.0
- typing-extensions==4.8.0
- urllib3==2.0.6
- zipp==3.17.0
prefix: /opt/conda/envs/diffusers

extra/grpc/diffusers/run.sh Executable file

@ -0,0 +1,14 @@
#!/bin/bash
##
## A bash script wrapper that runs the diffusers server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate diffusers
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/backend_diffusers.py $@


@ -0,0 +1,11 @@
.PHONY: exllama
exllama:
@echo "Creating virtual environment..."
@conda env create --name exllama --file exllama.yml
@echo "Virtual environment created."
.PHONY: run
run:
@echo "Running exllama..."
bash run.sh
@echo "exllama run."


@ -0,0 +1,5 @@
# Creating a separate environment for the exllama project
```
make exllama
```


@ -0,0 +1,55 @@
name: exllama
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- filelock==3.12.4
- fsspec==2023.9.2
- grpcio==1.59.0
- jinja2==3.1.2
- markupsafe==2.1.3
- mpmath==1.3.0
- networkx==3.1
- ninja==1.11.1
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- protobuf==4.24.4
- safetensors==0.3.2
- sentencepiece==0.1.99
- sympy==1.12
- torch==2.1.0
- triton==2.1.0
- typing-extensions==4.8.0
prefix: /opt/conda/envs/exllama

extra/grpc/exllama/run.sh Executable file

@ -0,0 +1,14 @@
#!/bin/bash
##
## A bash script wrapper that runs the exllama server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate exllama
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/exllama.py $@


@ -0,0 +1,18 @@
.PHONY: huggingface
huggingface:
@echo "Creating virtual environment..."
@conda env create --name huggingface --file huggingface.yml
@echo "Virtual environment created."
.PHONY: run
run:
@echo "Running huggingface..."
bash run.sh
@echo "huggingface run."
# It does not work well from the command line; it only works with an IDE like VSCode.
.PHONY: test
test:
@echo "Testing huggingface..."
bash test.sh
@echo "huggingface tested."


@ -0,0 +1,5 @@
# Creating a separate environment for the huggingface project
```
make huggingface
```


@ -1,13 +1,20 @@
"""
Extra gRPC server for HuggingFace SentenceTransformer models.
"""
#!/usr/bin/env python3
import grpc
from concurrent import futures
import time
import backend_pb2
import backend_pb2_grpc
import argparse
import signal
import sys
import os
import time
import backend_pb2
import backend_pb2_grpc
import grpc
from sentence_transformers import SentenceTransformer
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
@ -17,18 +24,56 @@ MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
A gRPC servicer for the backend service.
This class implements the gRPC methods for the backend service, including Health, LoadModel, and Embedding.
"""
def Health(self, request, context):
"""
A gRPC method that returns the health status of the backend service.
Args:
request: A HealthRequest object that contains the request parameters.
context: A grpc.ServicerContext object that provides information about the RPC.
Returns:
A Reply object that contains the health status of the backend service.
"""
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
"""
A gRPC method that loads a model into memory.
Args:
request: A LoadModelRequest object that contains the request parameters.
context: A grpc.ServicerContext object that provides information about the RPC.
Returns:
A Result object that contains the result of the LoadModel operation.
"""
model_name = request.Model
try:
self.model = SentenceTransformer(model_name)
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
# Implement your logic here for the LoadModel service
# Replace this with your desired response
return backend_pb2.Result(message="Model loaded successfully", success=True)
def Embedding(self, request, context):
"""
A gRPC method that calculates embeddings for a given sentence.
Args:
request: An EmbeddingRequest object that contains the request parameters.
context: A grpc.ServicerContext object that provides information about the RPC.
Returns:
An EmbeddingResult object that contains the calculated embeddings.
"""
# Implement your logic here for the Embedding service
# Replace this with your desired response
print("Calculated embeddings for: " + request.Embeddings, file=sys.stderr)


@ -0,0 +1,77 @@
name: huggingface
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- certifi==2023.7.22
- charset-normalizer==3.3.0
- click==8.1.7
- filelock==3.12.4
- fsspec==2023.9.2
- grpcio==1.59.0
- huggingface-hub==0.17.3
- idna==3.4
- install==1.3.5
- jinja2==3.1.2
- joblib==1.3.2
- markupsafe==2.1.3
- mpmath==1.3.0
- networkx==3.1
- nltk==3.8.1
- numpy==1.26.0
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- packaging==23.2
- pillow==10.0.1
- protobuf==4.24.4
- pyyaml==6.0.1
- regex==2023.10.3
- requests==2.31.0
- safetensors==0.4.0
- scikit-learn==1.3.1
- scipy==1.11.3
- sentence-transformers==2.2.2
- sentencepiece==0.1.99
- sympy==1.12
- threadpoolctl==3.2.0
- tokenizers==0.14.1
- torch==2.1.0
- torchvision==0.16.0
- tqdm==4.66.1
- transformers==4.34.0
- triton==2.1.0
- typing-extensions==4.8.0
- urllib3==2.0.6
prefix: /opt/conda/envs/huggingface

extra/grpc/huggingface/run.sh Executable file

@ -0,0 +1,14 @@
#!/bin/bash
##
## A bash script wrapper that runs the huggingface server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate huggingface
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/huggingface.py $@


@ -0,0 +1,11 @@
#!/bin/bash
##
## A bash script wrapper that runs the huggingface server with conda
# Activate conda environment
source activate huggingface
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python -m unittest $DIR/test_huggingface.py


@ -0,0 +1,81 @@
"""
A test script to test the gRPC service
"""
import unittest
import subprocess
import time
import backend_pb2
import backend_pb2_grpc
import grpc
class TestBackendServicer(unittest.TestCase):
"""
TestBackendServicer is the class that tests the gRPC service
"""
def setUp(self):
"""
This method sets up the gRPC service by starting the server
"""
self.service = subprocess.Popen(["python3", "huggingface.py", "--addr", "localhost:50051"])
def tearDown(self) -> None:
"""
This method tears down the gRPC service by terminating the server
"""
self.service.terminate()
self.service.wait()
def test_server_startup(self):
"""
This method tests if the server starts up successfully
"""
time.sleep(2)
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.Health(backend_pb2.HealthMessage())
self.assertEqual(response.message, b'OK')
except Exception as err:
print(err)
self.fail("Server failed to start")
finally:
self.tearDown()
def test_load_model(self):
"""
This method tests if the model is loaded successfully
"""
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens"))
self.assertTrue(response.success)
self.assertEqual(response.message, "Model loaded successfully")
except Exception as err:
print(err)
self.fail("LoadModel service failed")
finally:
self.tearDown()
def test_embedding(self):
"""
This method tests if the embeddings are generated successfully
"""
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.LoadModel(backend_pb2.ModelOptions(Model="bert-base-nli-mean-tokens"))
self.assertTrue(response.success)
embedding_request = backend_pb2.PredictOptions(Embeddings="This is a test sentence.")
embedding_response = stub.Embedding(embedding_request)
self.assertIsNotNone(embedding_response.embeddings)
except Exception as err:
print(err)
self.fail("Embedding service failed")
finally:
self.tearDown()


@ -0,0 +1,11 @@
.PHONY: ttsvalle
ttsvalle:
@echo "Creating virtual environment..."
@conda env create --name ttsvalle --file ttsvalle.yml
@echo "Virtual environment created."
.PHONY: run
run:
@echo "Running ttsvalle..."
bash run.sh
@echo "ttsvalle run."


@ -0,0 +1,5 @@
# Creating a separate environment for the ttsvalle project
```
make ttsvalle
```

extra/grpc/vall-e-x/run.sh Executable file

@ -0,0 +1,13 @@
#!/bin/bash
##
## A bash script wrapper that runs the ttsvalle server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate ttsvalle
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/ttsvalle.py $@


@ -1,14 +1,15 @@
#!/usr/bin/env python3
import grpc
from concurrent import futures
import time
import backend_pb2
import backend_pb2_grpc
import argparse
import signal
import sys
import os
from pathlib import Path
import time
import backend_pb2
import backend_pb2_grpc
import grpc
from utils.generation import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
@ -21,9 +22,34 @@ MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
gRPC servicer for backend services.
"""
def Health(self, request, context):
"""
Health check service.
Args:
request: A backend_pb2.HealthRequest instance.
context: A grpc.ServicerContext instance.
Returns:
A backend_pb2.Reply instance with message "OK".
"""
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
"""
Load model service.
Args:
request: A backend_pb2.LoadModelRequest instance.
context: A grpc.ServicerContext instance.
Returns:
A backend_pb2.Result instance with message "Model loaded successfully" and success=True if successful.
A backend_pb2.Result instance with success=False and error message if unsuccessful.
"""
model_name = request.Model
try:
print("Preparing models, please wait", file=sys.stderr)
@ -49,6 +75,17 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(message="Model loaded successfully", success=True)
def TTS(self, request, context):
"""
Text-to-speech service.
Args:
request: A backend_pb2.TTSRequest instance.
context: A grpc.ServicerContext instance.
Returns:
A backend_pb2.Result instance with success=True if successful.
A backend_pb2.Result instance with success=False and error message if unsuccessful.
"""
model = request.model
print(request, file=sys.stderr)
try:


@ -0,0 +1,101 @@
name: ttsvalle
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py310h06a4308_0
- python=3.10.13=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py310h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- tzdata=2023c=h04d1e81_0
- wheel=0.41.2=py310h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- aiofiles==23.2.1
- altair==5.1.2
- annotated-types==0.6.0
- anyio==3.7.1
- click==8.1.7
- cn2an==0.5.22
- cython==3.0.3
- einops==0.7.0
- encodec==0.1.1
- eng-to-ipa==0.0.2
- fastapi==0.103.2
- ffmpeg-python==0.2.0
- ffmpy==0.3.1
- fsspec==2023.9.2
- future==0.18.3
- gradio==3.47.1
- gradio-client==0.6.0
- grpcio==1.59.0
- h11==0.14.0
- httpcore==0.18.0
- httpx==0.25.0
- huggingface-hub==0.17.3
- importlib-resources==6.1.0
- inflect==7.0.0
- jieba==0.42.1
- langid==1.1.6
- llvmlite==0.41.0
- more-itertools==10.1.0
- nltk==3.8.1
- numba==0.58.0
- numpy==1.25.2
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.2.140
- nvidia-nvtx-cu12==12.1.105
- openai-whisper==20230306
- orjson==3.9.7
- proces==0.1.7
- protobuf==4.24.4
- pydantic==2.4.2
- pydantic-core==2.10.1
- pydub==0.25.1
- pyopenjtalk-prebuilt==0.3.0
- pypinyin==0.49.0
- python-multipart==0.0.6
- regex==2023.10.3
- safetensors==0.4.0
- semantic-version==2.10.0
- soundfile==0.12.1
- starlette==0.27.0
- sudachidict-core==20230927
- sudachipy==0.6.7
- tokenizers==0.14.1
- toolz==0.12.0
- torch==2.1.0
- torchaudio==2.1.0
- torchvision==0.16.0
- tqdm==4.66.1
- transformers==4.34.0
- triton==2.1.0
- unidecode==1.3.7
- uvicorn==0.23.2
- vocos==0.0.3
- websockets==11.0.3
- wget==3.2
prefix: /opt/conda/envs/ttsvalle

extra/grpc/vllm/Makefile Normal file

@ -0,0 +1,11 @@
.PHONY: vllm
vllm:
@echo "Creating virtual environment..."
@conda env create --name vllm --file vllm.yml
@echo "Virtual environment created."
.PHONY: run
run:
@echo "Running vllm..."
bash run.sh
@echo "vllm run."


@ -0,0 +1,5 @@
# Creating a separate environment for the vllm project
```
make vllm
```


@ -1,15 +1,15 @@
#!/usr/bin/env python3
import grpc
from concurrent import futures
import time
import backend_pb2
import backend_pb2_grpc
import argparse
import signal
import sys
import os, glob
import os
from pathlib import Path
import backend_pb2
import backend_pb2_grpc
import grpc
from vllm import LLM, SamplingParams
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
@ -19,7 +19,20 @@ MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
# Implement the BackendServicer class with the service methods
class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
A gRPC servicer that implements the Backend service defined in backend.proto.
"""
def generate(self,prompt, max_new_tokens):
"""
Generates text based on the given prompt and maximum number of new tokens.
Args:
prompt (str): The prompt to generate text from.
max_new_tokens (int): The maximum number of new tokens to generate.
Returns:
str: The generated text.
"""
self.generator.end_beam_search()
# Tokenizing the input
@ -41,9 +54,31 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
if token.item() == self.generator.tokenizer.eos_token_id:
break
return decoded_text
def Health(self, request, context):
"""
Returns a health check message.
Args:
request: The health check request.
context: The gRPC context.
Returns:
backend_pb2.Reply: The health check reply.
"""
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
"""
Loads a language model.
Args:
request: The load model request.
context: The gRPC context.
Returns:
backend_pb2.Result: The load model result.
"""
try:
if request.Quantization != "":
self.llm = LLM(model=request.Model, quantization=request.Quantization)
@ -54,6 +89,16 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(message="Model loaded successfully", success=True)
def Predict(self, request, context):
"""
Generates text based on the given prompt and sampling parameters.
Args:
request: The predict request.
context: The gRPC context.
Returns:
backend_pb2.Result: The predict result.
"""
if request.TopP == 0:
request.TopP = 0.9
@ -68,6 +113,16 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
return backend_pb2.Result(message=bytes(generated_text, encoding='utf-8'))
def PredictStream(self, request, context):
"""
Generates text based on the given prompt and sampling parameters, and streams the results.
Args:
request: The predict stream request.
context: The gRPC context.
Returns:
backend_pb2.Result: The predict stream result.
"""
# Implement PredictStream RPC
#for reply in some_data_generator():
# yield reply

extra/grpc/vllm/run.sh Executable file

@ -0,0 +1,14 @@
#!/bin/bash
##
## A bash script wrapper that runs the vllm server with conda
export PATH=$PATH:/opt/conda/bin
# Activate conda environment
source activate vllm
# get the directory where the bash script is located
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python $DIR/backend_vllm.py $@


@ -0,0 +1,41 @@
import unittest
import subprocess
import time
import backend_pb2
import backend_pb2_grpc
import grpc
class TestBackendServicer(unittest.TestCase):
"""
TestBackendServicer is the class that tests the gRPC service.
This class contains methods to test the startup and shutdown of the gRPC service.
"""
def setUp(self):
self.service = subprocess.Popen(["python", "backend_vllm.py", "--addr", "localhost:50051"])
def tearDown(self) -> None:
self.service.terminate()
self.service.wait()
def test_server_startup(self):
time.sleep(2)
try:
self.setUp()
with grpc.insecure_channel("localhost:50051") as channel:
stub = backend_pb2_grpc.BackendStub(channel)
response = stub.Health(backend_pb2.HealthMessage())
self.assertEqual(response.message, b'OK')
except Exception as err:
print(err)
self.fail("Server failed to start")
finally:
self.tearDown()

extra/grpc/vllm/vllm.yml Normal file

@ -0,0 +1,99 @@
name: vllm
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h7b6447c_0
- ca-certificates=2023.08.22=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.11=h7f8727e_2
- pip=23.2.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py311h06a4308_0
- sqlite=3.41.2=h5eee18b_0
- tk=8.6.12=h1ccaba5_0
- wheel=0.41.2=py311h06a4308_0
- xz=5.4.2=h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- pip:
- aiosignal==1.3.1
- anyio==3.7.1
- attrs==23.1.0
- certifi==2023.7.22
- charset-normalizer==3.3.0
- click==8.1.7
- cmake==3.27.6
- fastapi==0.103.2
- filelock==3.12.4
- frozenlist==1.4.0
- fsspec==2023.9.2
- grpcio==1.59.0
- h11==0.14.0
- httptools==0.6.0
- huggingface-hub==0.17.3
- idna==3.4
- jinja2==3.1.2
- jsonschema==4.19.1
- jsonschema-specifications==2023.7.1
- lit==17.0.2
- markupsafe==2.1.3
- mpmath==1.3.0
- msgpack==1.0.7
- networkx==3.1
- ninja==1.11.1
- numpy==1.26.0
- nvidia-cublas-cu11==11.10.3.66
- nvidia-cuda-cupti-cu11==11.7.101
- nvidia-cuda-nvrtc-cu11==11.7.99
- nvidia-cuda-runtime-cu11==11.7.99
- nvidia-cudnn-cu11==8.5.0.96
- nvidia-cufft-cu11==10.9.0.58
- nvidia-curand-cu11==10.2.10.91
- nvidia-cusolver-cu11==11.4.0.1
- nvidia-cusparse-cu11==11.7.4.91
- nvidia-nccl-cu11==2.14.3
- nvidia-nvtx-cu11==11.7.91
- packaging==23.2
- pandas==2.1.1
- protobuf==4.24.4
- psutil==5.9.5
- pyarrow==13.0.0
- pydantic==1.10.13
- python-dateutil==2.8.2
- python-dotenv==1.0.0
- pytz==2023.3.post1
- pyyaml==6.0.1
- ray==2.7.0
- referencing==0.30.2
- regex==2023.10.3
- requests==2.31.0
- rpds-py==0.10.4
- safetensors==0.4.0
- sentencepiece==0.1.99
- six==1.16.0
- sniffio==1.3.0
- starlette==0.27.0
- sympy==1.12
- tokenizers==0.14.1
- torch==2.0.1
- tqdm==4.66.1
- transformers==4.34.0
- triton==2.0.0
- typing-extensions==4.8.0
- tzdata==2023.3
- urllib3==2.0.6
- uvicorn==0.23.2
- uvloop==0.17.0
- vllm==0.2.0
- watchfiles==0.20.0
- websockets==11.0.3
- xformers==0.0.22
prefix: /opt/conda/envs/vllm