A AI-powered database querying system that transforms natural language questions into SQL queries and provides insights with visualizations.
- Python 3.12+
- Ollama with Code Llama 7B model
-
Start Ollama and pull the model:
ollama pull codellama:7b-code
-
Launch the Streamlit app:
streamlit run app.py
-
Open your browser to
http://localhost:8501
LLM agent/
├── agent_main.py # Main LLM agent
├── test_correction_direct.py # Self-correction testing script
├── test_self_correction.py # Additional test suite
├── sales_data.db # SQLite database
└── README.md # This documentation
- Agent Control Panel: Interactive sidebar with database selection and agent management
- Database Selection: Switch between data sources with automatic schema reloading
- Schema Discovery: Automatically reads and understands database structure
- Natural Language Queries: Uses Code Llama to convert questions to SQL
- Self-Correction Loop: Automatically retries and learns from SQL errors
- Conversational Memory: Remembers previous questions for context-aware follow-ups
- Disambiguation Routing: Identifies ambiguous questions and asks for clarification
- Secure Execution: Validates and safely executes SQL queries
- Intelligent Responses: Provides natural language answers
- Visualization: Generates matplotlib code for data visualization
- Interactive Web Interface: Professional Streamlit application with real-time updates
- Error Recovery: Handles LLM hallucinations and schema misunderstandings
pip install requests pandas matplotlib# Install Ollama (if not already installed)
# Download from: https://ollama.ai/
# Pull Code Llama model
ollama pull codellama:7b-code
# Start Ollama server (keep running in background)
ollama servepython sales_database.pypython agent_main.pyStarting LLM Database Agent
==================================================
Agent initialized with database: sales_data.db
Database schema loaded: ['Users', 'Products', 'Orders'] tables
Available Tables:
• Users: user_id, username, join_date
• Products: product_id, product_name, category, price
• Orders: order_id, user_id, product_id, quantity, order_date
Example Question: How many orders does user with ID 1 have?
Calling Code Llama to generate SQL...
Generated SQL: SELECT COUNT(*) FROM Orders WHERE user_id = 1;
Executing SQL: SELECT COUNT(*) FROM Orders WHERE user_id = 1;
Execution successful!
[AGENT NATURAL LANGUAGE RESPONSE]
Based on your question 'How many orders does user with ID 1 have?', I found 1 result(s).
The answer is: 2
[AGENT VISUALIZATION CODE]
import matplotlib.pyplot as plt
import pandas as pd
# ... (visualization code generated)
user_id(INTEGER PRIMARY KEY)username(TEXT)join_date(TEXT)
product_id(INTEGER PRIMARY KEY)product_name(TEXT)category(TEXT)price(REAL)
order_id(INTEGER PRIMARY KEY)user_id(INTEGER, FK → Users.user_id)product_id(INTEGER, FK → Products.product_id)quantity(INTEGER)order_date(TEXT)
- SQL Injection Protection: Validates queries for dangerous keywords
- Read-Only Operations: Blocks INSERT, UPDATE, DELETE, DROP operations
- Safe Execution: Uses pandas for secure query execution
- Error Handling: Comprehensive exception handling
- Model:
codellama:7b-code - Endpoint:
http://localhost:11434/api/generate - Temperature: 0.1 (for consistent SQL generation)
- File:
sales_data.db - Type: SQLite 3
- Connection: Local file-based
-
"LLM connection failed"
- Ensure Ollama is running:
ollama serve - Check if Code Llama is installed:
ollama list
- Ensure Ollama is running:
-
"Database not found"
- Run the database setup:
python sales_database.py
- Run the database setup:
-
"Package import errors"
- Install dependencies:
pip install requests pandas matplotlib
- Install dependencies:
- Query Response Time: ~2-5 seconds (depending on LLM)
- Database Size: ~50KB (sample data)
- Memory Usage: ~100MB (including LLM client)
- Schema Discovery: Read database structure and relationships
- Query Processing: Convert natural language to SQL using LLM with chat history context
- Self-Correction Loop:
- Try to execute SQL
- If error occurs, pass error context to LLM for correction
- Retry up to 3 times with improved prompts
- Learn from previous failures
- Security Validation: Check for dangerous SQL operations
- Execution: Run query safely and capture results
- Natural Language Synthesis: Use LLM to generate conversational response
- Intelligent Visualization: Use LLM to create appropriate chart code
- Final Output: Provide complete answer with visualization
- Chat History Management: Store successful interactions for context
- Context-Aware Processing: Use conversation history in future queries
- Follow-up Understanding: Handle implicit references to previous topics
User: "What are the total orders for user 1?"
→ SQL: SELECT COUNT(*) FROM Orders WHERE user_id = 1;
→ Result: 2 orders
→ Saved to chat history
User: "What about user 2?" (Follow-up question)
→ Context: Previous question was about total orders for users
→ SQL: SELECT COUNT(*) FROM Orders WHERE user_id = 2;
→ Result: 1 order
→ Agent understands "user 2" refers to same query type
- DatabaseAgent: Main agent class handling all operations
- Schema Reader: Extracts table structures and relationships
- LLM Interface: Communicates with Code Llama via REST API
- Security Validator: Prevents harmful SQL execution
- Response Synthesizer: Converts results to natural language
- Visualization Generator: Creates plot code based on data types
- Connection timeouts and retries
- SQL syntax error recovery
- Graceful LLM failure handling
- Database lock prevention