Agent Observability Platform

Production-ready observability and monitoring platform for AI agents. Features real-time metrics tracking, cost monitoring, automatic alerting, LangSmith integration, and comprehensive dashboards. Built with FastAPI, LangChain, and modern observability best practices.

What This Platform Does

This platform provides comprehensive observability for AI agent systems:

Real-Time Metrics: Track request rates, latency, success rates, and error rates
Cost Tracking: Monitor API costs with daily breakdowns and optimization suggestions
Alerting System: Automatic alerts for cost thresholds, error rates, and latency issues
LangSmith Integration: Seamless integration with LangSmith for detailed tracing
Performance Analytics: Percentile-based latency analysis (p50, p95, p99)
Rate Limiting: Built-in rate limiting to protect your API usage
Health Monitoring: System health checks and status endpoints

Key Features

Observability

Request tracing with LangSmith
Real-time metrics collection
Historical data analysis
Performance percentile tracking
Error tracking and analysis
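
As an illustration of the percentile tracking listed above, the sketch below computes p50, p95, and p99 from a list of latency samples using the nearest-rank method. The function names are hypothetical, and the platform's own calculation may use a different method (for example, interpolation).

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so very small pct values stay in range.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Return the three percentiles named in this README (hypothetical helper)."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

With only a handful of samples, p95 and p99 both collapse to the slowest request, so the tail percentiles become meaningful only once enough requests have been recorded.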

Cost Management

Real-time cost tracking per request
Daily and monthly cost projections
Cost optimization suggestions
Token usage monitoring
Model-specific cost calculations

Alerting

Configurable alert thresholds
Multiple alert types (cost, error rate, latency)
Alert severity levels
Alert resolution tracking
Real-time alert notifications

Production Features

Rate limiting per agent
Request queuing
Health check endpoints
Error handling and recovery
Scalable architecture

Installation

Prerequisites

Python 3.8 or higher
Gemini API key
(Optional) LangSmith API key for enhanced tracing

Setup

Navigate to the project directory:

cd agent_observability_platform

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up environment variables:

cp .env.example .env

Edit .env and add your API keys:

GEMINI_API_KEY=your_gemini_api_key_here
LANGSMITH_API_KEY=your_langsmith_api_key_here  # Optional
LANGSMITH_PROJECT=agent-observability-platform
LANGSMITH_TRACING=true

Create necessary directories:

mkdir -p data/metrics

Usage

Starting the Server

Run the application:

python main.py

The server will start on http://localhost:8000 by default.

Using the Web Dashboard

Open your browser and navigate to http://localhost:8000
View real-time metrics and statistics
Monitor active alerts
Review recent request metrics
Track costs and performance

API Endpoints

Execute Agent Request

POST /api/agents/execute
Content-Type: application/json

{
  "agent_name": "my-agent",
  "prompt": "What is the capital of France?",
  "model": "gpt-3.5-turbo",
  "temperature": 0.7
}
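
A minimal sketch of calling this endpoint from Python with only the standard library. It assumes the server is running at the default address; the helper names are hypothetical, and because the response schema is not documented here, the decoded JSON is returned unchanged.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from "Starting the Server"


def build_execute_request(agent_name, prompt, model="gpt-3.5-turbo",
                          temperature=0.7, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/agents/execute."""
    payload = {
        "agent_name": agent_name,
        "prompt": prompt,
        "model": model,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/agents/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_agent(agent_name, prompt, **kwargs):
    """Send the request and return the decoded JSON response body.

    Assumption: the endpoint replies with JSON; its fields are not
    described in this README, so nothing is parsed beyond decoding.
    """
    request = build_execute_request(agent_name, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling execute_agent("my-agent", "What is the capital of France?") sends the same body as the example request above.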

Get Metrics Summary

GET /api/metrics/summary?agent_name=my-agent&hours=24

Get Recent Metrics

GET /api/metrics/recent?agent_name=my-agent&limit=100

Get Daily Cost

GET /api/cost/daily?agent_name=my-agent

Get Active Alerts

GET /api/alerts

Resolve Alert

POST /api/alerts/{alert_id}/resolve

Health Check

GET /api/health
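
The read-only endpoints above differ only in path and query parameters, so a small URL builder covers all of them. This is a sketch with hypothetical helper names; it only constructs URLs and assumes the default server address.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # default address from "Starting the Server"


def build_url(path, base_url=BASE_URL, **params):
    """Join base URL, path, and any query parameters that are not None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"


def resolve_alert_path(alert_id):
    """Path for POST /api/alerts/{alert_id}/resolve, with the ID URL-escaped."""
    return f"/api/alerts/{quote(str(alert_id), safe='')}/resolve"


summary_url = build_url("/api/metrics/summary", agent_name="my-agent", hours=24)
recent_url = build_url("/api/metrics/recent", agent_name="my-agent", limit=100)
daily_cost_url = build_url("/api/cost/daily", agent_name="my-agent")
health_url = build_url("/api/health")
```

Each URL can then be fetched with urllib.request.urlopen or any HTTP client.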

Project Structure

agent_observability_platform/
├── backend/
│   ├── api/
│   │   └── main.py              # FastAPI application
│   ├── core/
│   │   ├── database.py          # Database management
│   │   ├── langsmith_client.py  # LangSmith integration
│   │   └── rate_limiter.py      # Rate limiting
│   ├── monitoring/
│   │   ├── metrics_collector.py # Metrics collection
│   │   ├── alert_manager.py     # Alert management
│   │   └── cost_tracker.py      # Cost tracking
│   └── models/
│       ├── metrics.py           # Pydantic models
│       └── database.py          # SQLAlchemy models
├── frontend/
│   └── templates/
│       └── index.html           # Monitoring dashboard
├── data/
│   └── metrics/                 # Metrics storage
├── tests/
│   ├── unit/                    # Unit tests
│   └── integration/             # Integration tests
├── main.py                      # Application entry point
├── requirements.txt             # Python dependencies
└── README.md                    # This file

Configuration

Environment variables in .env:

GEMINI_API_KEY: Required - Your Gemini API key
LANGSMITH_API_KEY: Optional - LangSmith API key for tracing
LANGSMITH_PROJECT: LangSmith project name
LANGSMITH_TRACING: Enable/disable LangSmith tracing (true/false)
DATABASE_URL: Database connection string
RATE_LIMIT_ENABLED: Enable rate limiting (true/false)
RATE_LIMIT_PER_MINUTE: Default rate limit per minute
ALERT_COST_THRESHOLD: Daily cost threshold for alerts (USD)
ALERT_ERROR_RATE_THRESHOLD: Error rate threshold (0.0-1.0)
ALERT_LATENCY_THRESHOLD_MS: Latency threshold in milliseconds
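
To show how these variables might be read and validated at startup, here is a minimal sketch covering a subset of them. The Settings class, the load_settings function, and every default value are assumptions made for illustration; the project's actual loader and defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    langsmith_api_key: Optional[str]
    langsmith_tracing: bool
    rate_limit_enabled: bool
    rate_limit_per_minute: int
    alert_cost_threshold: float
    alert_error_rate_threshold: float
    alert_latency_threshold_ms: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables.

    All fallback values below are illustrative assumptions, not the
    platform's documented defaults.
    """
    if not env.get("GEMINI_API_KEY"):
        raise RuntimeError("GEMINI_API_KEY is required")
    error_rate = float(env.get("ALERT_ERROR_RATE_THRESHOLD", "0.1"))
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("ALERT_ERROR_RATE_THRESHOLD must be between 0.0 and 1.0")
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        langsmith_api_key=env.get("LANGSMITH_API_KEY") or None,
        langsmith_tracing=_as_bool(env.get("LANGSMITH_TRACING", "false")),
        rate_limit_enabled=_as_bool(env.get("RATE_LIMIT_ENABLED", "true")),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        alert_cost_threshold=float(env.get("ALERT_COST_THRESHOLD", "10.0")),
        alert_error_rate_threshold=error_rate,
        alert_latency_threshold_ms=float(env.get("ALERT_LATENCY_THRESHOLD_MS", "5000")),
    )
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.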

LangSmith Integration

The platform integrates with LangSmith for enhanced observability:

Set LANGSMITH_API_KEY in your .env file
Set LANGSMITH_TRACING=true
All agent requests will be automatically traced
View detailed traces in the LangSmith dashboard
Trace IDs are stored with metrics for correlation

Alerting

The platform automatically monitors:

Cost Alerts: Triggered when daily cost exceeds threshold
Error Rate Alerts: Triggered when error rate exceeds threshold
Latency Alerts: Triggered when average latency exceeds threshold

Alerts can be resolved through the API or dashboard.
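
The three checks described in this section reduce to simple threshold comparisons. The sketch below shows one way to express them; the Thresholds and Snapshot types and the evaluate_alerts function are hypothetical and do not reflect the actual code in alert_manager.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    daily_cost_usd: float      # corresponds to ALERT_COST_THRESHOLD
    error_rate: float          # corresponds to ALERT_ERROR_RATE_THRESHOLD (0.0-1.0)
    avg_latency_ms: float      # corresponds to ALERT_LATENCY_THRESHOLD_MS


@dataclass(frozen=True)
class Snapshot:
    daily_cost_usd: float
    total_requests: int
    failed_requests: int
    avg_latency_ms: float


def evaluate_alerts(snapshot: Snapshot, thresholds: Thresholds) -> List[str]:
    """Return the alert types whose threshold is strictly exceeded."""
    alerts = []
    if snapshot.daily_cost_usd > thresholds.daily_cost_usd:
        alerts.append("cost")
    # Avoid dividing by zero when no requests have been recorded yet.
    error_rate = (snapshot.failed_requests / snapshot.total_requests
                  if snapshot.total_requests else 0.0)
    if error_rate > thresholds.error_rate:
        alerts.append("error_rate")
    if snapshot.avg_latency_ms > thresholds.avg_latency_ms:
        alerts.append("latency")
    return alerts
```

A value exactly equal to its threshold does not raise an alert here, matching the "exceeds threshold" wording above.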

Cost Tracking

The platform tracks costs based on:

Model pricing (GPT-4, GPT-3.5-turbo, etc.)
Input and output token usage
Per-request cost calculation
Daily and monthly projections

Cost optimization suggestions are provided based on usage patterns.
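
A per-request cost is the token counts multiplied by the model's input and output prices. The sketch below illustrates that arithmetic and a naive monthly projection. The model names and prices are placeholders invented for the example, not real vendor pricing, and the helper names are hypothetical rather than taken from cost_tracker.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_1k: float   # USD per 1,000 input tokens
    output_per_1k: float  # USD per 1,000 output tokens


# Placeholder prices for illustration only; NOT real vendor pricing.
PRICES = {
    "example-large": ModelPrice(input_per_1k=0.01, output_per_1k=0.03),
    "example-small": ModelPrice(input_per_1k=0.001, output_per_1k=0.002),
}


def request_cost(model, input_tokens, output_tokens, prices=PRICES):
    """Cost of one request in USD; raises KeyError for an unknown model."""
    price = prices[model]
    return (input_tokens / 1000 * price.input_per_1k
            + output_tokens / 1000 * price.output_per_1k)


def project_monthly(daily_cost_usd, days=30):
    """Naive projection that assumes every day costs the same (an assumption)."""
    return daily_cost_usd * days
```

Raising on an unknown model, rather than silently charging zero, keeps missing price entries from hiding real spend.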

Production Deployment

For production deployment:

Use PostgreSQL instead of SQLite
Set up Redis for distributed rate limiting
Configure proper CORS origins
Set up monitoring and logging
Use environment-specific configurations
Enable HTTPS
Set up backup strategies for metrics data

Limitations

Rate limiting is in-memory (use Redis for distributed systems)
SQLite database (upgrade to PostgreSQL for production)
Basic alerting (extend for email/Slack notifications)
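
To make the first limitation concrete, here is a sketch of a per-agent, in-memory sliding-window limiter. The class name and design are assumptions for illustration and are not taken from rate_limiter.py. Because the request history lives in one process's memory, each worker process would keep its own independent counts, which is why a shared store such as Redis is needed for distributed deployments.

```python
import time
from collections import defaultdict, deque


class InMemoryRateLimiter:
    """Allow at most limit_per_minute requests per agent in any 60-second window."""

    def __init__(self, limit_per_minute, clock=time.monotonic):
        self.limit = limit_per_minute
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # agent name -> timestamps of allowed requests

    def allow(self, agent_name):
        now = self.clock()
        hits = self._hits[agent_name]
        # Drop timestamps that have fallen out of the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

This version is not thread-safe; guarding allow() with a lock would be needed if it were called from multiple threads.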

Future Enhancements

Email/Slack alert notifications
Advanced analytics and reporting
Multi-tenant support
Custom dashboards
Export metrics to external systems
Machine learning-based anomaly detection
A/B testing framework
Performance benchmarking

Tech Stack

FastAPI: Modern async web framework
LangChain: Agent framework integration
LangSmith: Tracing and observability
SQLite/PostgreSQL: Metrics storage
Redis: Distributed rate limiting and caching (for production deployments; the default rate limiter is in-memory)
Python 3.8+: Core language

Use Cases

Production Monitoring: Real-time monitoring of agent systems
Cost Management: Track and optimize API costs
Performance Analytics: Identify bottlenecks and optimize
Alert Management: Proactive issue detection
Observability: Comprehensive system visibility

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.
