Data query example

This example uses Llama-Index to enable question answering over a set of documents.

It loosely follows the Llama-Index quickstart.

Summary of the steps:

  • prepare the dataset (and store it in the data directory)
  • prepare a vector index database to run queries on
  • run queries
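
To make the three steps concrete, here is a minimal, self-contained sketch of the same idea using only the Python standard library. It is not what Llama-Index does internally: the bag-of-words "embedding" and the function names (embed, build_index, query) are illustrative assumptions standing in for a real embedding model and vector store.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(documents):
    """Step 2: pair every document with its vector."""
    return [(doc, embed(doc)) for doc in documents]


def query(index, question, top_k=1):
    """Step 3: return the top_k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


# Step 1: the dataset (hypothetical sample documents).
documents = [
    "LocalAI exposes an OpenAI compatible API",
    "The vector index is stored on disk",
]
index = build_index(documents)
print(query(index, "where is the index stored"))
```

In the real example, the retrieved text is then passed to the LLM as context for answering the question; the sketch stops at retrieval.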

Requirements

You will need a set of documents to use as the dataset. Copy it into the data directory.

Setup

Start the API:

# Clone LocalAI
git clone https://github.com/go-skynet/LocalAI

cd LocalAI/examples/query_data

wget https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin -O models/bert
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# start with docker-compose
docker-compose up -d --build
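
Before moving on, it can help to confirm that the API is reachable. The sketch below is one way to do that from Python, assuming the server listens on http://localhost:8080 and that its /v1/models endpoint returns the OpenAI-style shape {"data": [{"id": ...}]}. The helper names models_url and list_models are hypothetical.

```python
import json
import urllib.request


def models_url(api_base):
    """Build the models endpoint URL from the API base URL."""
    return api_base.rstrip("/") + "/models"


def list_models(api_base="http://localhost:8080/v1", timeout=5):
    """Return the model ids reported by the API, or None if it is unreachable.

    Assumes an OpenAI-style response body: {"data": [{"id": "..."}]}.
    """
    try:
        with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None
    return [m.get("id") for m in payload.get("data", [])]


# Usage: print(list_models())  # None means the API is not up yet
```

If the call returns None right after docker-compose up, the container may still be building or loading; retrying after a short wait is usually enough.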

Create a storage

In this step we create a local vector database from our document set, so that we can later ask the LLM questions about it.

Note: LocalAI does not require OPENAI_API_KEY. However, the client library might fail if no API key is set, so any arbitrary string can be used.

export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python store.py

When it finishes, a "storage" directory containing the vector index database will have been created.
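
A quick way to check the result is to list what was written. The following is a small sketch, assuming the directory is named storage and sits in the current working directory. The helper name storage_files is hypothetical, and the exact file names inside the directory depend on the Llama-Index version.

```python
from pathlib import Path


def storage_files(storage_dir="storage"):
    """Return the sorted file names in storage_dir, or [] if it does not exist."""
    path = Path(storage_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


# Usage: an empty list means store.py has not produced an index yet.
print(storage_files())
```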

Query

We can now query the dataset.

export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python query.py

Update

To update our vector database, run update.py:

export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python update.py
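
Each step above exports the same two variables. As an alternative, a small wrapper could set them once and run any of the scripts. This is only a sketch: the helper names localai_env and run_step are hypothetical, and the defaults mirror the values used in this README.

```python
import os
import subprocess
import sys


def localai_env(base="http://localhost:8080/v1", key="sk-"):
    """Copy the current environment, filling in the two variables if unset."""
    env = dict(os.environ)
    env.setdefault("OPENAI_API_BASE", base)
    env.setdefault("OPENAI_API_KEY", key)  # arbitrary placeholder value
    return env


def run_step(script):
    """Run one of the example scripts with the environment prepared above."""
    result = subprocess.run([sys.executable, script], env=localai_env(), check=False)
    return result.returncode


# Usage: run_step("store.py"), then run_step("query.py") or run_step("update.py")
```

Using setdefault means values already exported in the shell take precedence over the defaults.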