This repository contains a Streamlit-based Document Chat Assistant, enabling users to interact with and query content from various uploaded documents. Users can upload multiple text formats (PDF, TXT, CSV, Excel) and even a ZIP archive containing multiple documents. The application leverages LangChain, langchain-community extensions, langchain-openai, and OpenAI embeddings and chat models to provide semantic search and Q&A functionalities over the uploaded documents.
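To make the idea of semantic search concrete, here is a minimal sketch of ranking document chunks by cosine similarity to a query. It is not the application's actual retrieval code: the function names, the chunk labels, and the three-dimensional toy vectors are all assumptions for illustration, standing in for real OpenAI embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, k=2):
    """Rank chunk ids by similarity to the query and return the best k."""
    scored = [
        (cosine_similarity(query_vector, vector), chunk_id)
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy vectors standing in for real embeddings (hypothetical values).
chunks = {
    "report.pdf p.2": [0.9, 0.1, 0.0],
    "notes.txt": [0.1, 0.9, 0.0],
    "sales.csv": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
best = top_k_chunks(query, chunks, k=2)
print(best)
```

In a real pipeline the best-matching chunks would then be passed to the chat model as context for the answer.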
Key Features:
- Document Upload: Upload multiple documents or a ZIP file containing many files.
- Semantic Search & Q&A: Ask questions in natural language; get contextually relevant answers sourced from your documents.
- Summarization: Generate a global summary of all uploaded documents.
- Metadata Awareness: Display metadata (e.g., PDF page numbers, CSV row counts) for each source reference.
- Model Selection: Switch between OpenAI models (e.g., gpt-4, gpt-3.5-turbo).
- History Management: Clear conversation history, clear all documents, and download your chat history.
Language Interface:
The UI and messages are presented in French.
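As an illustration of the metadata-aware source references listed above, the following sketch turns a metadata dictionary and an excerpt into a one-line reference. The helper name `format_source` and the keys `source`, `page`, and `rows` are hypothetical; the application may store and display its metadata differently.

```python
def format_source(metadata, excerpt, max_chars=80):
    """Build a one-line reference from a metadata dict and a text excerpt."""
    label = metadata.get("source", "unknown source")
    details = []
    if "page" in metadata:  # hypothetical key for PDF page numbers
        details.append(f"page {metadata['page']}")
    if "rows" in metadata:  # hypothetical key for CSV row counts
        details.append(f"{metadata['rows']} rows")
    if details:
        label += " (" + ", ".join(details) + ")"
    # Collapse runs of whitespace so the excerpt fits on one line.
    snippet = " ".join(excerpt.split())
    if len(snippet) > max_chars:
        snippet = snippet[:max_chars].rstrip() + "..."
    return f"{label}: {snippet}"


print(format_source({"source": "report.pdf", "page": 3}, "Revenue  grew\nin Q2."))
```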

Clone the repository and navigate to the project directory:
git clone https://github.com/sanlamamba/Document-Talk.git
cd Document-Talk
Make sure you have Python 3.11+ installed. We recommend using a virtual environment:
python3.11 -m venv rag_talk
source rag_talk/bin/activate
If you are using conda, you can create a new environment instead:
conda create -n rag_talk python=3.11
conda activate rag_talk
Install the required dependencies from requirements.txt:
pip install --upgrade pip
pip install -r requirements.txt
Required Packages:
streamlit>=1.14.0
langchain>=0.0.9
langchain-community==0.3.18
langchain-openai==0.3.6
openai==1.63.2
pandas==2.2.3
PyPDF2==3.0.1
Get your OpenAI API key from the [OpenAI API keys page](https://platform.openai.com/api-keys).
To learn more about OpenAI API keys, see the [OpenAI Quickstart](https://platform.openai.com/docs/quickstart).
Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY="sk-..."
Once all dependencies are installed and OPENAI_API_KEY is set, run:
streamlit run app.py
You should see output similar to:
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://<your_machine_ip>:8501
Open the URL in your browser to access the Document Chat Assistant interface.
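If the app starts but requests fail with authentication errors, it can help to confirm that OPENAI_API_KEY is visible to the Python process. The sketch below is one way to check this without printing the key itself; the helper name `api_key_status` is an assumption and is not part of the application.

```python
import os


def api_key_status(env=None):
    """Report whether OPENAI_API_KEY is set, without revealing its value."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return "set" if key else "missing"


print(f"OPENAI_API_KEY is {api_key_status()}")
```

Run it in the same shell and environment from which you launch Streamlit, since a variable exported in one terminal is not visible in another.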
- Upload Documents: In the sidebar, upload one or multiple files (PDF, TXT, CSV, Excel) or a ZIP file containing multiple documents.
- Ask Questions: Type your questions in the main interface. The assistant will respond with an answer and provide a list of source documents (with metadata and excerpts) used to generate the response.
- Summarize Documents: Use the "Résumer tous les documents" button in the sidebar to get a global summary.
- Manage Documents & History:
  - Clear all uploaded documents using the "Effacer tous les documents" button in the sidebar (you will need to refresh the page afterward).
  - Clear chat history using the "Effacer l'historique de conversation" button in the sidebar.
  - Download your chat history for record-keeping.
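The ZIP upload mentioned in the steps above can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the application's own code: the helper name `supported_members` and the exact set of accepted extensions (in particular `.xlsx` and `.xls` for Excel) are guesses.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed set of accepted extensions; the app's actual list may differ.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".csv", ".xlsx", ".xls"}


def supported_members(zip_bytes):
    """Return {member name: raw bytes} for supported files in a ZIP archive."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix in SUPPORTED_SUFFIXES:
                found[info.filename] = archive.read(info)
    return found


# Build a small in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/notes.txt", "hello")
    archive.writestr("docs/image.png", b"\x89PNG")
    archive.writestr("data.CSV", "a,b\n1,2\n")

files = supported_members(buffer.getvalue())
print(sorted(files))
```

Reading members into memory avoids writing untrusted archive paths to disk, which is a reasonable default for user uploads.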
Use the dropdown in the sidebar to choose between gpt-4 and gpt-3.5-turbo. This selection affects the LLM used for Q&A and summarization.
- The application is in French, but can handle documents in various languages.
- Ensure your OpenAI API key has appropriate billing and model access.
- If you encounter version warnings or deprecation warnings, consider upgrading the respective libraries.
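When chasing version or deprecation warnings, it is useful to know which versions are actually installed. The sketch below uses the standard `importlib.metadata` module to look up the packages listed in this README; the helper name `installed_versions` is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the Required Packages list.
PACKAGES = [
    "streamlit",
    "langchain",
    "langchain-community",
    "langchain-openai",
    "openai",
    "pandas",
    "PyPDF2",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, found in installed_versions(PACKAGES).items():
    print(f"{name}: {found or 'not installed'}")
```

Comparing this output against requirements.txt shows which library a warning most likely refers to.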
This project is provided under an MIT License. See LICENSE for details.