DocuQuery is a Streamlit-based document assistant that allows users to upload PDF documents, receive concise summaries, and interactively query the content using a Retrieval-Augmented Generation (RAG) pipeline powered by LangChain and Groq's LLMs.
- Upload one or more PDF files
- Summarize content using an LLM (LLaMA3 via Groq)
- Ask natural-language questions about the uploaded documents
- View source document excerpts for transparency
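
To illustrate how questions and source excerpts might fit together, here is a minimal sketch of assembling a grounded prompt and the matching source lines. The `Excerpt` class, `build_prompt`, and `format_sources` are hypothetical names for illustration only; they are not taken from `rag_pipeline.py`, and the prompt wording is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Excerpt:
    """A retrieved chunk plus where it came from (hypothetical structure)."""
    source: str   # e.g. the uploaded PDF's file name
    page: int     # 1-based page number within that PDF
    text: str     # the chunk of text handed to the LLM


def build_prompt(question: str, excerpts: list[Excerpt]) -> str:
    """Assemble a grounded prompt: numbered excerpts first, then the question."""
    context = "\n\n".join(
        f"[{i}] ({e.source}, p. {e.page})\n{e.text}"
        for i, e in enumerate(excerpts, start=1)
    )
    return (
        "Answer using only the excerpts below. "
        "Cite excerpt numbers in square brackets.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def format_sources(excerpts: list[Excerpt], max_chars: int = 80) -> list[str]:
    """Short, human-readable source lines to display next to the answer."""
    lines = []
    for i, e in enumerate(excerpts, start=1):
        # Truncate long chunks so the source list stays scannable.
        if len(e.text) <= max_chars:
            snippet = e.text
        else:
            snippet = e.text[:max_chars].rstrip() + "..."
        lines.append(f"[{i}] {e.source}, p. {e.page}: {snippet}")
    return lines
```

Numbering the excerpts the same way in the prompt and in the displayed source list lets a reader match a citation such as `[1]` in the answer back to the passage it came from.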
- Streamlit
- LangChain
- Groq API
- HuggingFace Embeddings
- FAISS for vector storage
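
For readers new to vector storage, the sketch below shows the embed-then-search idea in plain Python. It is a toy stand-in, not the project's code: word counts replace the HuggingFace embedding model, a Python list replaces the FAISS index, and `ToyVectorStore`, `embed`, and `cosine` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory index: stores (chunk, vector) pairs and ranks them per query."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        # Embed each chunk once at insert time, as a real index would.
        self._items.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query: str, k: int = 2) -> list[tuple[float, str]]:
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), chunk) for chunk, vec in self._items]
        # Highest similarity first; keep only the top k chunks.
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add([
        "FAISS stores dense vectors for similarity search.",
        "Streamlit renders the web interface.",
    ])
    for score, chunk in store.search("Which component stores vectors?", k=1):
        print(f"{score:.2f}  {chunk}")
```

A real embedding model captures meaning rather than exact word overlap, and FAISS avoids scanning every vector, but the flow is the same: embed the chunks once, embed each query, and return the nearest chunks.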
- Clone the repository
git clone https://github.com/yourusername/docuquery.git
cd docuquery
- Set up a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Create a .env file
GROQ_API_KEY=your_groq_api_key_here
- Run the app
streamlit run main.py
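
The `.env` file from the steps above has to be read at startup. How `main.py` does this is not shown here; many projects call `load_dotenv()` from the python-dotenv package. The sketch below is a standard-library-only alternative, and `load_env_file` and `get_groq_api_key` are hypothetical helper names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_groq_api_key(path: str = ".env") -> str:
    """Prefer the real environment, fall back to the .env file, else fail loudly."""
    key = os.environ.get("GROQ_API_KEY") or load_env_file(path).get("GROQ_API_KEY", "")
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is missing; set it in .env or the environment.")
    return key
```

Failing early with a clear message when the key is absent, or still set to the placeholder, tends to be easier to debug than an authentication error from the first API call.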
- **Project Structure**
├── main.py             # Streamlit app
├── rag_pipeline.py     # RAG pipeline logic (load, embed, retrieve, summarize)
├── .env                # Environment variable for API key (DO NOT SHARE)
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation