Production-ready observability and monitoring platform for AI agents. Features real-time metrics tracking, cost monitoring, automatic alerting, LangSmith integration, and comprehensive dashboards. Built with FastAPI, LangChain, and modern observability best practices.
This platform provides comprehensive observability for AI agent systems:
- Real-Time Metrics: Track request rates, latency, success rates, and error rates
- Cost Tracking: Monitor API costs with daily breakdowns and optimization suggestions
- Alerting System: Automatic alerts for cost thresholds, error rates, and latency issues
- LangSmith Integration: Seamless integration with LangSmith for detailed tracing
- Performance Analytics: Percentile-based latency analysis (p50, p95, p99)
- Rate Limiting: Built-in rate limiting to protect your API usage
- Health Monitoring: System health checks and status endpoints
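The percentile-based latency analysis mentioned above (p50, p95, p99) can be computed from a window of recorded request durations. A minimal nearest-rank sketch, illustrative only and not the platform's actual implementation:

```python
from typing import List

def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value covering p% of the samples.
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies for ten recent requests (ms).
latencies_ms = [120.0, 95.0, 310.0, 88.0, 140.0, 102.0, 99.0, 450.0, 130.0, 125.0]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

With the sample data above, p50 is 120.0 ms while p95 and p99 both land on the 450.0 ms outlier, which is exactly why tail percentiles matter more than averages.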
- Request tracing with LangSmith
- Real-time metrics collection
- Historical data analysis
- Performance percentile tracking
- Error tracking and analysis
- Real-time cost tracking per request
- Daily and monthly cost projections
- Cost optimization suggestions
- Token usage monitoring
- Model-specific cost calculations
- Configurable alert thresholds
- Multiple alert types (cost, error rate, latency)
- Alert severity levels
- Alert resolution tracking
- Real-time alert notifications
- Rate limiting per agent
- Request queuing
- Health check endpoints
- Error handling and recovery
- Scalable architecture
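The per-agent rate limiting listed above is in-memory (see the limitations section below). A sliding-window sketch of the idea, assuming the real logic lives in backend/core/rate_limiter.py and may differ:

```python
import time
from collections import defaultdict, deque

class SlidingWindowRateLimiter:
    """Per-agent in-memory rate limiter (illustrative sketch only)."""

    def __init__(self, limit_per_minute: int):
        self.limit = limit_per_minute
        # Maps agent name -> deque of request timestamps within the window.
        self.requests = defaultdict(deque)

    def allow(self, agent_name: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.requests[agent_name]
        # Evict timestamps older than the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```

A distributed deployment would replace the in-process `defaultdict` with Redis, as the production notes below suggest.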
- Python 3.8 or higher
- Gemini API key
- (Optional) LangSmith API key for enhanced tracing
- Navigate to the project directory:
```bash
cd agent_observability_platform
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
cp .env.example .env
```

Edit .env and add your API keys:

```
GEMINI_API_KEY=your_gemini_api_key_here
LANGSMITH_API_KEY=your_langsmith_api_key_here  # Optional
LANGSMITH_PROJECT=agent-observability-platform
LANGSMITH_TRACING=true
```
- Create necessary directories:

```bash
mkdir -p data/metrics
```

Run the application:

```bash
python main.py
```

The server will start on http://localhost:8000 by default.
- Open your browser and navigate to http://localhost:8000
- View real-time metrics and statistics
- Monitor active alerts
- Review recent request metrics
- Track costs and performance
```
POST /api/agents/execute
Content-Type: application/json

{
  "agent_name": "my-agent",
  "prompt": "What is the capital of France?",
  "model": "gpt-3.5-turbo",
  "temperature": 0.7
}
```

- GET /api/metrics/summary?agent_name=my-agent&hours=24
- GET /api/metrics/recent?agent_name=my-agent&limit=100
- GET /api/cost/daily?agent_name=my-agent
- GET /api/alerts
- POST /api/alerts/{alert_id}/resolve
- GET /api/health

```
agent_observability_platform/
├── backend/
│   ├── api/
│   │   └── main.py               # FastAPI application
│   ├── core/
│   │   ├── database.py           # Database management
│   │   ├── langsmith_client.py   # LangSmith integration
│   │   └── rate_limiter.py       # Rate limiting
│   ├── monitoring/
│   │   ├── metrics_collector.py  # Metrics collection
│   │   ├── alert_manager.py      # Alert management
│   │   └── cost_tracker.py       # Cost tracking
│   └── models/
│       ├── metrics.py            # Pydantic models
│       └── database.py           # SQLAlchemy models
├── frontend/
│   └── templates/
│       └── index.html            # Monitoring dashboard
├── data/
│   └── metrics/                  # Metrics storage
├── tests/
│   ├── unit/                     # Unit tests
│   └── integration/              # Integration tests
├── main.py                       # Application entry point
├── requirements.txt              # Python dependencies
└── README.md                     # This file
```
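The POST /api/agents/execute endpoint shown above can be exercised with a small client. A standard-library sketch, assuming the server is running on localhost:8000 (`execute_agent` is defined but not called here, since it needs a live server):

```python
import json
import urllib.request

def build_payload(agent_name: str, prompt: str,
                  model: str = "gpt-3.5-turbo",
                  temperature: float = 0.7) -> bytes:
    """Encode the JSON body expected by POST /api/agents/execute."""
    return json.dumps({
        "agent_name": agent_name,
        "prompt": prompt,
        "model": model,
        "temperature": temperature,
    }).encode("utf-8")

def execute_agent(prompt: str, agent_name: str = "my-agent",
                  base_url: str = "http://localhost:8000") -> dict:
    """POST a prompt to the platform; requires the server to be running."""
    req = urllib.request.Request(
        base_url + "/api/agents/execute",
        data=build_payload(agent_name, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```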
Environment variables in .env:
- GEMINI_API_KEY: Required - Your Gemini API key
- LANGSMITH_API_KEY: Optional - LangSmith API key for tracing
- LANGSMITH_PROJECT: LangSmith project name
- LANGSMITH_TRACING: Enable/disable LangSmith tracing (true/false)
- DATABASE_URL: Database connection string
- RATE_LIMIT_ENABLED: Enable rate limiting (true/false)
- RATE_LIMIT_PER_MINUTE: Default rate limit per minute
- ALERT_COST_THRESHOLD: Daily cost threshold for alerts (USD)
- ALERT_ERROR_RATE_THRESHOLD: Error rate threshold (0.0-1.0)
- ALERT_LATENCY_THRESHOLD_MS: Latency threshold in milliseconds
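One way these variables might be read into a typed settings object; a sketch only, where the variable names match the .env keys above but the default values are illustrative assumptions, not the platform's actual defaults:

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class Settings:
    gemini_api_key: str
    langsmith_api_key: Optional[str]
    rate_limit_enabled: bool
    rate_limit_per_minute: int
    alert_cost_threshold: float

def load_settings(env=os.environ) -> Settings:
    """Read configuration from the environment. Defaults are illustrative."""
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],            # required; KeyError if absent
        langsmith_api_key=env.get("LANGSMITH_API_KEY"),  # optional
        rate_limit_enabled=env.get("RATE_LIMIT_ENABLED", "true").lower() == "true",
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        alert_cost_threshold=float(env.get("ALERT_COST_THRESHOLD", "10.0")),
    )
```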
The platform integrates with LangSmith for enhanced observability:
- Set LANGSMITH_API_KEY in your .env file
- Set LANGSMITH_TRACING=true
- All agent requests will be automatically traced
- View detailed traces in the LangSmith dashboard
- Trace IDs are stored with metrics for correlation
The platform automatically monitors:
- Cost Alerts: Triggered when daily cost exceeds threshold
- Error Rate Alerts: Triggered when error rate exceeds threshold
- Latency Alerts: Triggered when average latency exceeds threshold
Alerts can be resolved through the API or dashboard.
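The three threshold checks described above reduce to simple predicates. A sketch of how such an evaluation could look, where the default thresholds mirror the ALERT_* variables but the values and severity labels are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Alert:
    kind: str       # "cost" | "error_rate" | "latency"
    severity: str   # e.g. "warning" or "critical" (labels are assumptions)
    message: str

def evaluate_alerts(daily_cost_usd: float, error_rate: float,
                    avg_latency_ms: float,
                    cost_threshold: float = 10.0,
                    error_rate_threshold: float = 0.05,
                    latency_threshold_ms: float = 2000.0) -> List[Alert]:
    """Return one alert per breached threshold. Defaults are illustrative."""
    alerts = []
    if daily_cost_usd > cost_threshold:
        alerts.append(Alert("cost", "critical",
                            "daily cost $%.2f exceeds $%.2f" % (daily_cost_usd, cost_threshold)))
    if error_rate > error_rate_threshold:
        alerts.append(Alert("error_rate", "warning",
                            "error rate %.1f%% exceeds %.1f%%" % (error_rate * 100, error_rate_threshold * 100)))
    if avg_latency_ms > latency_threshold_ms:
        alerts.append(Alert("latency", "warning",
                            "avg latency %.0f ms exceeds %.0f ms" % (avg_latency_ms, latency_threshold_ms)))
    return alerts
```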
The platform tracks costs based on:
- Model pricing (GPT-4, GPT-3.5-turbo, etc.)
- Input and output token usage
- Per-request cost calculation
- Daily and monthly projections
Cost optimization suggestions are provided based on usage patterns.
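The per-request calculation described above combines token counts with per-model pricing. A sketch with placeholder prices; real model prices change over time, so do not treat these figures as current rates:

```python
# Illustrative per-1K-token prices in USD -- placeholders, not current rates.
PRICING = {
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
    "gpt-4":         {"input": 0.03,   "output": 0.06},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: (tokens / 1000) * per-1K price, per direction."""
    p = PRICING[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

def monthly_projection(daily_cost_usd: float, days: int = 30) -> float:
    """Naive projection: extrapolate one day's spend over a 30-day month."""
    return daily_cost_usd * days
```

For example, a gpt-3.5-turbo request with 1,000 input and 500 output tokens costs 0.0005 + 0.00075 = $0.00125 under these placeholder rates.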
For production deployment:
- Use PostgreSQL instead of SQLite
- Set up Redis for distributed rate limiting
- Configure proper CORS origins
- Set up monitoring and logging
- Use environment-specific configurations
- Enable HTTPS
- Set up backup strategies for metrics data
- Rate limiting is in-memory (use Redis for distributed systems)
- SQLite database (upgrade to PostgreSQL for production)
- Basic alerting (extend for email/Slack notifications)
- Email/Slack alert notifications
- Advanced analytics and reporting
- Multi-tenant support
- Custom dashboards
- Export metrics to external systems
- Machine learning-based anomaly detection
- A/B testing framework
- Performance benchmarking
- FastAPI: Modern async web framework
- LangChain: Agent framework integration
- LangSmith: Tracing and observability
- SQLite/PostgreSQL: Metrics storage
- Redis: Rate limiting and caching
- Python 3.8+: Core language
- Production Monitoring: Real-time monitoring of agent systems
- Cost Management: Track and optimize API costs
- Performance Analytics: Identify bottlenecks and optimize
- Alert Management: Proactive issue detection
- Observability: Comprehensive system visibility
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.