RAG-Assisted Abstract Question Answering

This repository contains a retrieval-augmented generation (RAG) program built with Pinecone, Groq, and Streamlit. It parses documents, splits them into chunks, converts the chunks into vectors, and stores them in a Pinecone vector database, so that a search query retrieves the most similar chunks together with their metadata. A Streamlit frontend lets users interact with the database.

Showcase

Features

- Document Parsing and Chunking: A Jupyter Notebook parses documents and splits them into manageable chunks.
- Vector Conversion and Storage: The chunks are converted into vectors with the all-mpnet-base-v2 model and stored in a Pinecone vector database.
- Efficient Querying: A search query retrieves the most similar chunks along with their metadata.
- Streamlit Frontend: A Streamlit frontend lets users query the database and view the sources of the retrieved information.
- LLM Integration: The retrieved chunks can be passed as context to an LLM (Llama 3 8B, Groq model ID llama3-8b-8192), which answers the user's question from that context.
- Source Filtering: Optionally, retrieval can be restricted to a single source (e.g., a book or article title).
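
As an illustration of the chunking feature, here is a minimal sketch. It is not the notebook's actual code. It splits text on whitespace and lets consecutive chunks overlap so that sentences cut at a boundary still appear whole in one chunk. The function names, the word-based sizes, and the record layout (an id plus source and text metadata) are assumptions made for this example.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_records(source: str, text: str, **kwargs) -> list:
    """Attach an id and source metadata to each chunk (hypothetical layout)."""
    return [
        {"id": f"{source}-{i}", "metadata": {"source": source, "text": chunk}}
        for i, chunk in enumerate(split_into_chunks(text, **kwargs))
    ]


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for record in build_records("Book A", sample, chunk_size=4, overlap=1):
        print(record["id"], "->", record["metadata"]["text"])
```

Keeping the source title in each record's metadata is what makes showing sources and filtering by source possible later on.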

How It Works

Document Parsing:
- Documents are parsed and split into smaller chunks using a Jupyter Notebook.
Vector Conversion:
- The chunks are converted into vectors using the all-mpnet-base-v2 model.
Vector Storage:
- The vectors are stored in a Pinecone vector database.
Querying the Database:
- Users can query the database to retrieve similar matches and their metadata.
Streamlit Frontend:
- The Streamlit frontend allows users to interact with the database and view the results.
LLM Integration:
- The retrieved chunks are passed as context to an LLM (Llama 3 8B, Groq model ID llama3-8b-8192), which answers the question based on that context.
Source Filtering:
- Users can filter the information to be retrieved by its source (e.g., book or article title).
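
The query, source-filter, and LLM steps above can be sketched without any external services. In this toy version a bag-of-words counter stands in for the all-mpnet-base-v2 embedding and a Python list stands in for the Pinecone index, so the scores are only illustrative. All function names, the record layout, and the prompt wording are assumptions, not the repository's code.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query(index: list, question: str, top_k: int = 2,
          source: Optional[str] = None) -> list:
    """Return the top_k most similar records, optionally from one source only."""
    q = embed(question)
    candidates = [r for r in index
                  if source is None or r["metadata"]["source"] == source]
    scored = [{**r, "score": cosine(q, r["values"])} for r in candidates]
    return sorted(scored, key=lambda m: m["score"], reverse=True)[:top_k]


def build_prompt(question: str, matches: list) -> str:
    """Assemble the retrieved chunks into a context block for the LLM."""
    context = "\n".join(
        f"[{m['metadata']['source']}] {m['metadata']['text']}" for m in matches
    )
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


# In-memory stand-in for the vector database (hypothetical sample data).
docs = [
    ("Book A", "pinecone stores vectors for similarity search"),
    ("Book A", "streamlit builds simple web frontends"),
    ("Article B", "similarity search finds the nearest vectors"),
]
index = [
    {"id": f"doc-{i}", "values": embed(text),
     "metadata": {"source": src, "text": text}}
    for i, (src, text) in enumerate(docs)
]

if __name__ == "__main__":
    question = "how does similarity search work"
    matches = query(index, question, top_k=1, source="Article B")
    print(build_prompt(question, matches))
```

With Pinecone itself, the source restriction would typically be expressed as a metadata filter passed with the query, for example {"source": {"$eq": "Book A"}}, assuming the source title was stored under a metadata key named source.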

Installation

To get started, clone this repository and install the necessary dependencies:

git clone https://github.com/louisprp/rag-vector-db.git
cd rag-vector-db
pip install -r requirements.txt

Usage

Update API Keys:
- Copy .env.example to .env and fill in your API keys.
Run the Jupyter Notebook (vector_db_from_files.ipynb):
- Place your documents in the articles folder.
- Parse and split your documents into chunks.
- Convert the chunks into vectors and store them in the Pinecone database.
Run the Streamlit Frontend:
- Start the Streamlit application to query the database and view the results.

streamlit run app.py
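
Before running the notebook or the app, it can help to confirm that the keys from .env are visible to Python. The sketch below assumes the values have already been loaded into the process environment (for example with python-dotenv's load_dotenv) and that the variables are named PINECONE_API_KEY and GROQ_API_KEY. Both names are assumptions, so check .env.example for the real ones.

```python
import os


def load_api_keys(names=("PINECONE_API_KEY", "GROQ_API_KEY"), env=None) -> dict:
    """Return the requested keys, failing early with one clear error.

    The default variable names are assumptions for illustration.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing API keys: " + ", ".join(missing) + " (check your .env file)"
        )
    return {name: env[name] for name in names}


if __name__ == "__main__":
    try:
        keys = load_api_keys()
        print("Found keys:", ", ".join(sorted(keys)))
    except RuntimeError as err:
        print(err)
```

Failing at startup with the list of missing names is usually easier to debug than an authentication error raised later by one of the services.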

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

Pinecone for the vector database.
Hugging Face for the all-mpnet-base-v2 model.
Streamlit for the frontend framework.
Groq for fast LLM inference.

Happy querying!
