This project combines a Retrieval-Augmented Generation (RAG) Conversational AI system with a robust Evaluation Framework. The RAG system integrates Cohere's LLM, MongoDB for storage, and LangChain for intelligent query processing, while the Evaluation Framework enables benchmarking with synthetic test sets, performance metrics, and detailed visualizations.
- Document Ingestion: Processes JSONL files, generates embeddings using Cohere, and stores them in MongoDB.
- Dynamic Query Routing: Automatically decides whether to use RAG or a pure chat model based on query relevance.
- Context Management: Maintains conversation history and handles token-aware context truncation.
- Error Handling: Implements retry mechanisms for MongoDB connections and logs errors comprehensively.
- Synthetic Testset Generation: Creates test queries with ground truths using GPT-based models.
- Performance Metrics: Evaluates precision, recall, faithfulness, relevancy, and response times using RAGAS.
- Visualization: Generates radar charts, histograms, confusion matrices, and execution time distributions.
- PDF Reporting: Produces detailed performance reports with improvement suggestions.
- Python 3.8 or higher
- MongoDB Atlas account
- Cohere API key
```bash
git clone <your_github_repo_url>
cd RAG-Conversational-AI
pip install -r requirements.txt
```
Create a .env file in the root directory of your project, following the setup in .env.example.
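For reference, a minimal .env might look like the following; these variable names are placeholders, so check .env.example for the exact keys:

```env
# Placeholder keys; the authoritative names live in .env.example
COHERE_API_KEY=your_cohere_api_key
MONGODB_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/
```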
```bash
# Ingest documents into MongoDB
python ingest_docs.py

# Start the conversational AI
python main.py
```
```bash
python synthetic_testset.py
```
Uses the synthetic_testset.py script to create test queries with ground truths.
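As a rough sketch of what such a script can do with RAGAS (the API below is from ragas 0.1.x and may differ in other versions; the sample document and distribution values are illustrative, not the project's actual configuration):

```python
# Hedged sketch of synthetic testset generation with RAGAS (0.1.x API).
from langchain_core.documents import Document
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# RAGAS groups source documents by a "filename" metadata key.
documents = [Document(page_content="Paris is the capital of France.",
                      metadata={"filename": "geo.txt"})]

generator = TestsetGenerator.with_openai()  # GPT-based generator and critic
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
print(testset.to_pandas())
```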
```bash
python synthetic_eval_script.py
```
Evaluates the system's performance using metrics such as precision, recall, faithfulness, relevancy, cost, and response time.
```bash
python evaluation_report.py
```
Creates a detailed performance report with visualizations including radar charts, histograms, confusion matrices, and improvement suggestions.
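To give a feel for the PDF side, here is a minimal reportlab sketch; the file name, text, and placeholder score are illustrative, not the actual layout produced by evaluation_report.py:

```python
# Minimal reportlab sketch; the real report adds charts, tables, and suggestions.
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("evaluation_report.pdf", pagesize=letter)
c.setFont("Helvetica-Bold", 16)
c.drawString(72, 720, "RAG Evaluation Report")
c.setFont("Helvetica", 11)
c.drawString(72, 690, "Median faithfulness: 0.90 (placeholder value)")
c.showPage()
c.save()
```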
- Ask Questions: Enter your query when prompted. The system will:
  - Reshape the question if needed.
  - Decide whether to use RAG (retrieval-based) or Chat (LLM-only) mode.
  - Generate a response based on the selected route.
- Run Evaluations: Select the evaluation option from the menu to test performance metrics.
- Exit: Choose the exit option when you're done.
- Reads JSONL documents.
- Generates embeddings using Cohere's API.
- Stores content and embeddings in MongoDB.
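A minimal sketch of this ingestion pipeline, assuming a docs.jsonl file with a text field per record (the field, model, database, and collection names here are illustrative, not necessarily what ingest_docs.py uses):

```python
import json
import os

import cohere
from pymongo import MongoClient

co = cohere.Client(os.environ["COHERE_API_KEY"])
collection = MongoClient(os.environ["MONGODB_URI"])["rag_db"]["documents"]

with open("docs.jsonl") as f:
    texts = [json.loads(line)["text"] for line in f]

# input_type="search_document" marks corpus text for Cohere's embed-v3 models;
# large corpora should be embedded in batches to respect API limits.
response = co.embed(texts=texts, model="embed-english-v3.0",
                    input_type="search_document")

collection.insert_many(
    [{"content": t, "embedding": e} for t, e in zip(texts, response.embeddings)]
)
```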
- Question Reshaping Decision: Determines if the query needs additional context from conversation history.
- Standalone Question Generation: Reformulates queries into standalone questions if necessary.
- Inner Router Decision: Uses cosine similarity to decide between RAG or Chat mode.
- Response Generation:
  - RAG Mode: Retrieves relevant documents and generates responses based on them.
  - Chat Mode: Generates responses directly using Cohere's LLM.
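A hedged sketch of the inner router: embed the standalone question, compare it against stored document embeddings, and fall back to pure chat when the best match is weak. The function names and the 0.3 threshold are illustrative assumptions:

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query_embedding, doc_embeddings, threshold=0.3):
    """Return "rag" when a stored document is similar enough, else "chat"."""
    if not doc_embeddings:
        return "chat"
    best = max(cosine_similarity(query_embedding, d) for d in doc_embeddings)
    return "rag" if best >= threshold else "chat"
```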
The evaluation framework uses the following metrics:
- Context Precision: Measures how much of the retrieved context is relevant.
- Context Recall: Measures how much relevant information is retrieved.
- Faithfulness: Measures how well the answer aligns with the provided context.
- Context Relevancy: Measures how relevant the retrieved context is to the question.
- Answer Relevancy: Measures how relevant the generated answer is to the question.
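A sketch of how these metrics are typically computed with RAGAS (ragas 0.1.x API; the relevancy metrics available vary by version, and the single-row dataset here is only for illustration):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

data = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris"],
})

# evaluate() calls an LLM judge under the hood (OpenAI by default), so the
# corresponding API key must be available in the environment.
result = evaluate(data, metrics=[context_precision, context_recall,
                                 faithfulness, answer_relevancy])
print(result)
```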
The evaluation framework provides:
- Radar charts for median scores across key metrics (see the sketch after this list).
- Histograms for metric distributions (e.g., precision, recall).
- Confusion matrices showing retrieval and answer accuracy rates.
- Execution time distribution histograms for performance analysis.
- PDF reports summarizing all results with actionable insights.
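As an illustration of the radar chart, median scores can be plotted with matplotlib roughly like this (the scores are placeholders, not real results):

```python
import numpy as np
import matplotlib.pyplot as plt

metrics = ["precision", "recall", "faithfulness", "ctx. relevancy", "ans. relevancy"]
scores = [0.82, 0.75, 0.90, 0.70, 0.85]  # placeholder medians

angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
ax = plt.subplot(polar=True)
ax.plot(angles + angles[:1], scores + scores[:1])   # close the polygon
ax.fill(angles + angles[:1], scores + scores[:1], alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(metrics)
ax.set_ylim(0, 1)
plt.savefig("radar.png", bbox_inches="tight")
```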
- Stores both documents and chat logs in separate collections.
- Implements retry mechanisms with exponential backoff for connection stability.
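A sketch of the retry idea, assuming pymongo; the attempt count, base delay, and timeout are illustrative:

```python
import time

from pymongo import MongoClient
from pymongo.errors import PyMongoError

def connect_with_retry(uri, attempts=5, base_delay=1.0):
    for attempt in range(attempts):
        try:
            client = MongoClient(uri, serverSelectionTimeoutMS=5000)
            client.admin.command("ping")  # force a round trip to verify the link
            return client
        except PyMongoError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...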
- Token-aware context truncation ensures queries stay within token limits (see the sketch after this list).
- Conversation history is maintained for up to five previous interactions.
- Comprehensive logging system tracks all operations.
- Graceful degradation ensures smooth operation even during failures.
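A minimal sketch of token-aware truncation, keeping the most recent turns that fit a budget; the tokenizer choice and budget are assumptions, not necessarily what main.py uses:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def truncate_history(turns, max_tokens=1500):
    """Keep the newest conversation turns whose total token count fits the budget."""
    kept, total = [], 0
    for turn in reversed(turns):  # walk newest-first
        n = len(tokenizer.encode(turn))
        if total + n > max_tokens:
            break
        kept.append(turn)
        total += n
    return list(reversed(kept))  # restore chronological order
```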
The following Python libraries are required:
- cohere: For embeddings and language model interaction.
- pymongo: For MongoDB integration.
- transformers: For tokenization and context management.
- ragas: For evaluation metrics and testset generation.
- reportlab: For generating PDF reports.
- matplotlib and pandas: For data visualization and analysis.
Install them via pip install -r requirements.txt.
- User enters a query: "What is the capital of France?"
- The system checks if reshaping is needed (e.g., based on prior context).
- If relevant documents are found in MongoDB, it uses RAG mode to generate an answer like "The capital of France is Paris."
- If no relevant documents are found, it switches to Chat mode and generates an answer using Cohere's LLM.
This project is licensed under the MIT License - see the LICENSE file for details.