ldkhang1201/monitoring-system

Monitoring Tool - Distributed System Monitoring

A modular monitoring system with plugin architecture that collects metrics from multiple agents, forwards them through a gRPC server to Kafka, and enables real-time analysis. Configuration is managed dynamically via etcd.

🎯 Quick Start

1. Start Services (Kafka, Zookeeper, etcd)

docker compose up -d

2. Install Dependencies

pip install -r requirements.txt

Note: If you encounter protobuf compatibility issues with etcd3, set this environment variable:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
# Add to ~/.zshrc or ~/.bashrc for persistence

3. Set Up etcd Configuration (Optional)

# Option 1: Manually set up configuration for an agent
python setup_etcd_config.py --hostname agent-001 --interval 5

# Option 2: Let the agent auto-initialize with defaults
# The agent will automatically store default configuration to etcd if none exists

4. Run the System

# Terminal 1: Start gRPC Server
python3 run_server.py
# Or with environment variables:
# export GRPC_SERVER_PORT=50051
# export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
# python3 run_server.py

# Terminal 2: Start Analysis App
python3 run_analysis.py get-metrics
# Or with environment variables:
# export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
# python3 run_analysis.py get-metrics

# Terminal 3: Start Agent
python3 run_agent.py --hostname agent-001 --etcd-host localhost --etcd-port 2379
# Or with environment variables:
# export GRPC_SERVER_HOST=localhost
# export GRPC_SERVER_PORT=50051
# export ETCD_HOST=localhost
# export ETCD_PORT=2379
# python3 run_agent.py --hostname agent-001
# Note: Agent will auto-create default config in etcd if none exists

🏗️ Architecture

┌─────────────────┐                ┌─────────────────┐                ┌────────────────────┐
│ Monitor Agent   │──── gRPC ─────►│  gRPC Server    │──── Kafka ────►│  Analysis App      │
│                 │   (Stream)     │  (Broker)       │                │                    │
│ • Collects data │                │ • Forwards data │                │ • Analyzes data    │
│ • Plugin system │                │                 │                │ • Prints to stdout │
│ • etcd config   │                │                 │                │                    │
└─────────────────┘                └─────────────────┘                └────────────────────┘
         │
         ▼
┌─────────────────┐
│      etcd       │
│  Configuration  │
│  Management     │
└─────────────────┘

Key Features:

  • Plugin Architecture: Extensible plugin system for data processing
  • Dynamic Configuration: Real-time configuration updates via etcd
  • Unidirectional Streaming: Agent sends metrics to server via gRPC

📦 Module Structure

lab_ds/
├── agent/                    # Monitoring agent module
│   ├── agent.py              # Main agent orchestrator
│   ├── collect.py            # Metric collection module
│   ├── grpc.py               # gRPC communication module
│   ├── etcd_config.py        # etcd configuration manager
│   ├── plugin_manager.py     # Plugin loading and management
│   └── plugins/              # Plugin implementations
│       ├── base.py           # Base plugin class
│       └── deduplication.py  # Example deduplication plugin
├── grpc_server/              # gRPC server + Kafka producer
│   ├── server.py             # gRPC server implementation
│   └── kafka_producer.py     # Kafka producer service
├── analysis_app/             # Kafka consumer + analysis
│   └── consumer.py           # Analysis application
├── shared/                   # Protocol definitions & config
│   ├── monitoring.proto      # gRPC protocol definition
│   ├── config.py             # Kafka topics configuration
│   └── monitoring_pb2*.py    # Generated protobuf files
├── setup_etcd_config.py      # Helper script for etcd config
├── run_agent.py              # ⭐ Run agent
├── run_server.py             # ⭐ Run server
└── run_analysis.py           # ⭐ Run analysis app

🚀 Usage

Agent Options

python3 run_agent.py \
    --hostname agent-001 \
    --server localhost:50051 \
    --etcd-host localhost \
    --etcd-port 2379 \
    --config-key /monitor/config/agent-001  # Optional, defaults to /monitor/config/<hostname>

Or use environment variables:

export GRPC_SERVER_HOST=localhost
export GRPC_SERVER_PORT=50051
export ETCD_HOST=localhost
export ETCD_PORT=2379
python3 run_agent.py --hostname agent-001

Note: If no configuration exists in etcd for the agent, it will automatically initialize with default values and store them to etcd.

Server Options

python3 run_server.py

Or use environment variables:

export GRPC_SERVER_PORT=50051
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
python3 run_server.py

Analysis App Options

python3 run_analysis.py get-metrics \
    --hostname <hostname> \
    --timeout 10

Or use environment variables:

export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
python3 run_analysis.py get-metrics --group-id my-team --timeout 10

Setting up etcd Configuration

Option 1: Manual Setup (Recommended for production)

python setup_etcd_config.py \
    --hostname <hostname> \
    --interval 5 \
    --metrics cpu memory "disk read" "disk write" "net in" "net out" \
    --plugins agent.plugins.deduplication.DeduplicationPlugin

Option 2: Auto-initialization (Agent creates defaults automatically)

# Just start the agent - it will create default config in etcd if none exists
python3 run_agent.py --hostname agent-001

Or point the setup script at etcd via environment variables:

export ETCD_HOST=localhost
export ETCD_PORT=2379
python setup_etcd_config.py --hostname agent-001 --interval 5

🔌 Plugin Architecture

The agent supports a plugin architecture for extensible data processing. Plugins can:

  • Filter metrics
  • Transform data
  • Drop duplicate data (deduplication plugin example)
  • Add custom processing logic
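The processing model can be pictured as a chain in which each plugin either returns a (possibly modified) request or returns None to drop it, as the plugin template in this README suggests. The following is only a sketch under that assumption: `run_pipeline`, `RoundValues`, `DropDuplicates`, and the dict-shaped request are hypothetical stand-ins, and the real agent passes protobuf messages rather than dicts.

```python
from typing import Optional

# Hypothetical request shape: the real agent passes a protobuf message;
# a plain dict is used here only to keep the sketch self-contained.
Request = dict


class RoundValues:
    """Transform step: round every float metric to one decimal place."""

    def run(self, request: Request) -> Optional[Request]:
        return {k: round(v, 1) if isinstance(v, float) else v
                for k, v in request.items()}


class DropDuplicates:
    """Deduplication step: drop a request identical to the previous one."""

    def __init__(self) -> None:
        self._last: Optional[Request] = None

    def run(self, request: Request) -> Optional[Request]:
        if request == self._last:
            return None  # signal "drop" to the pipeline
        self._last = dict(request)
        return request


def run_pipeline(plugins, request: Request) -> Optional[Request]:
    """Pass the request through each plugin in order; stop if one drops it."""
    for plugin in plugins:
        request = plugin.run(request)
        if request is None:
            return None
    return request
```

Plugin order matters in this sketch: rounding before deduplication means two readings that differ only by noise are treated as duplicates.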

Creating a Plugin

  1. Create a new plugin class in agent/plugins/:
from agent.plugins.base import BasePlugin
from shared import monitoring_pb2

class MyPlugin(BasePlugin):
    def initialize(self, config=None):
        # Initialize plugin
        pass
    
    def run(self, metrics_request):
        # Process metrics_request
        # Return modified request or None to drop
        return metrics_request
    
    def finalize(self):
        # Cleanup
        pass
  2. Add plugin to etcd configuration:
{
    "interval": 5,
    "metrics": ["cpu", "memory"],
    "plugins": ["agent.plugins.my_plugin.MyPlugin"]
}
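Plugins are referenced in the configuration by dotted path, so something has to turn a string such as "agent.plugins.my_plugin.MyPlugin" into a class. One common way to do that in Python is with `importlib`; the sketch below shows the idea. `load_class` is a hypothetical helper, and the project's `plugin_manager.py` may resolve paths differently.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the sketch runs anywhere;
# in the project the path would be e.g. "agent.plugins.my_plugin.MyPlugin".
ordered_dict_cls = load_class("collections.OrderedDict")
```

A path that names a missing module raises `ImportError`, and a missing class raises `AttributeError`, so a loader built this way can report which configured plugin failed.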

⚙️ Configuration

etcd Configuration Format

Configuration is stored in etcd at /monitor/config/<hostname>:

{
    "interval": 5,
    "metrics": [
        "cpu",
        "memory",
        "disk read",
        "disk write",
        "net in",
        "net out"
    ],
    "plugins": [
        "agent.plugins.deduplication.DeduplicationPlugin"
    ]
}
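Because the value stored in etcd is plain JSON, it can be useful to check it before applying it. The sketch below parses a configuration string and verifies the three fields shown above. `parse_config` and its validation rules (positive interval, lists of strings) are assumptions made for illustration, not behavior documented by the project.

```python
import json


def parse_config(raw: str) -> dict:
    """Parse a JSON config string and check the three documented fields."""
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")

    interval = config.get("interval")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(interval, bool)
            or not isinstance(interval, (int, float))
            or interval <= 0):
        raise ValueError("'interval' must be a positive number of seconds")

    for key in ("metrics", "plugins"):
        values = config.get(key, [])  # assumption: a missing list means empty
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"'{key}' must be a list of strings")
        config[key] = values
    return config
```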

Automatic Configuration Initialization

When an agent starts, the EtcdConfigManager automatically:

  1. Checks for existing configuration in etcd at /monitor/config/<hostname>
  2. If config exists: Loads and uses it
  3. If config doesn't exist:
    • Uses default configuration values
    • Automatically stores the default config to etcd (so it's visible and can be modified later)

This means:

  • ✅ You can start agents without pre-configuring etcd
  • ✅ Default configs are automatically persisted to etcd
  • ✅ You can modify the auto-created config later via setup_etcd_config.py or etcdctl
  • ✅ The agent's store_config() method can be used programmatically to update config

Default Configuration Values:

  • interval: 5 seconds
  • metrics: ["cpu", "memory", "disk read", "disk write", "net in", "net out"]
  • plugins: [] (empty by default)
  • thresholds: Predefined thresholds for alerts
  • min_cpu: 5.0%
  • min_memory: 5.0%
  • window_size: 5
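The check-then-initialize steps and the defaults above can be sketched as follows. A plain dict stands in for etcd so the example runs without any services; `load_or_init_config` and `DEFAULT_CONFIG` are hypothetical names, and the real EtcdConfigManager may be structured differently.

```python
import json

# Defaults as listed in this README (threshold settings omitted for brevity).
DEFAULT_CONFIG = {
    "interval": 5,
    "metrics": ["cpu", "memory", "disk read", "disk write", "net in", "net out"],
    "plugins": [],
}


def load_or_init_config(store: dict, hostname: str) -> dict:
    """Return the stored config, writing the defaults first if none exists.

    `store` is a plain dict standing in for etcd's key-value API.
    """
    key = f"/monitor/config/{hostname}"
    raw = store.get(key)
    if raw is None:
        raw = json.dumps(DEFAULT_CONFIG)
        store[key] = raw  # persist so the config is visible and editable
    return json.loads(raw)
```

Calling the function twice for the same hostname returns the same configuration: the first call writes the defaults, the second reads them back.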

Dynamic Configuration Updates

Configuration changes in etcd are automatically detected and applied:

  • Interval: Updated in real-time
  • Metrics: Updated immediately
  • Plugins: Reloaded dynamically
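When a watch fires, the agent only needs to act on the fields that actually changed. A minimal sketch of that comparison is shown below; `changed_fields` is a hypothetical helper, and how the real agent reacts to each field (rescheduling, reloading plugins) is not shown.

```python
WATCHED_FIELDS = ("interval", "metrics", "plugins")


def changed_fields(old: dict, new: dict) -> dict:
    """Return {field: new_value} for each watched field whose value changed."""
    return {
        field: new.get(field)
        for field in WATCHED_FIELDS
        if old.get(field) != new.get(field)
    }


old = {"interval": 5, "metrics": ["cpu", "memory"], "plugins": []}
new = {"interval": 10, "metrics": ["cpu", "memory"], "plugins": []}
changes = changed_fields(old, new)  # only "interval" differs here
```

With a result like this, an agent could update its collection interval without touching the plugin list when only the interval was edited.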

Setting Configuration

Automatic initialization: as described under Automatic Configuration Initialization above, an agent that finds no configuration in etcd starts with the default values and stores them to etcd, so no pre-configuration is required.

Manual Configuration Setup:

# Using the setup script
python setup_etcd_config.py --hostname agent-001 --interval 10

# Or directly with etcdctl (if etcd is running in Docker)
docker exec -it etcd etcdctl put /monitor/config/agent-001 '{"interval": 10, "metrics": ["cpu", "memory"], "plugins": []}'

Note: The agent's EtcdConfigManager provides a store_config() method for storing configuration to etcd programmatically.

📊 Data Models

System Metrics

  • CPU usage (%)
  • Memory usage (%)
  • Memory used/total (MB)
  • Disk read/write (MB/s)
  • Network in/out (MB/s)
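Disk and network throughput are usually derived rather than read directly: psutil exposes cumulative byte counters, and a rate comes from the difference between two samples. The sketch below shows that arithmetic. `rate_mb_per_s` is a hypothetical helper, and treating 1 MB as 1024 × 1024 bytes is an assumption; the project may use a different convention.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def rate_mb_per_s(prev_bytes: int, curr_bytes: int, elapsed_s: float) -> float:
    """Convert two cumulative byte counters into an average MB/s rate."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # A counter can go backwards after a reset; report zero rather than
    # a negative rate in that case.
    delta = max(curr_bytes - prev_bytes, 0)
    return delta / BYTES_PER_MB / elapsed_s


# 10 MB transferred over a 5-second collection interval
example_rate = rate_mb_per_s(0, 10 * BYTES_PER_MB, 5.0)
```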

Kafka Topics

  • monitoring-data - Agent metrics → Analysis app (via gRPC server)

🔧 Requirements

System Requirements

  • Python 3.7+
  • Docker (for Kafka, Zookeeper, and etcd)
  • psutil (for real metrics mode)

Services

  • Kafka: localhost:9092
  • Kafka UI: http://localhost:8080
  • etcd: localhost:2379

Environment Variables

Protobuf Compatibility

For protobuf compatibility with etcd3:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

Configuration via Environment Variables

All ports and hosts can be configured via environment variables, making it easy to deploy in different environments:

gRPC Server Configuration:

  • GRPC_SERVER_PORT - Port for the gRPC server (default: 50051)
  • GRPC_SERVER_HOST - Host for the gRPC server (default: localhost)

Kafka Configuration:

  • KAFKA_BOOTSTRAP_SERVERS - Kafka bootstrap servers address (default: localhost:9092)

etcd Configuration:

  • ETCD_HOST - etcd server hostname (default: localhost)
  • ETCD_PORT - etcd server port (default: 2379)

Example Usage:

# Set environment variables
export GRPC_SERVER_PORT=50051
export GRPC_SERVER_HOST=localhost
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
export ETCD_HOST=localhost
export ETCD_PORT=2379

# Run services (they will use the environment variables)
python3 run_server.py
python3 run_agent.py --hostname agent-001
python3 run_analysis.py get-metrics

Note: Command-line arguments will override environment variables if both are provided. Environment variables provide defaults when command-line arguments are not specified.
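One simple way to get this precedence with `argparse` is to use the environment variable as each option's default, so an explicit flag always wins. The sketch below covers the two etcd options only; `build_parser` is a hypothetical helper, and the actual run scripts may wire this up differently.

```python
import argparse
import os


def build_parser(env=os.environ) -> argparse.ArgumentParser:
    """Build a parser whose defaults come from environment variables."""
    parser = argparse.ArgumentParser(description="agent options (sketch)")
    parser.add_argument(
        "--etcd-host",
        default=env.get("ETCD_HOST", "localhost"),
        help="etcd server hostname",
    )
    parser.add_argument(
        "--etcd-port",
        type=int,
        default=int(env.get("ETCD_PORT", "2379")),
        help="etcd server port",
    )
    return parser


# The environment supplies the default; an explicit flag overrides it.
from_env = build_parser({"ETCD_PORT": "2390"}).parse_args([])
from_cli = build_parser({"ETCD_PORT": "2390"}).parse_args(["--etcd-port", "2400"])
```

Passing the environment in as a parameter keeps the function easy to exercise without modifying the real process environment.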

🎓 Examples

Basic Usage

# Terminal 1: Start gRPC Server
python3 run_server.py

# Terminal 2: Start Analysis App
python3 run_analysis.py get-metrics

# Terminal 3: Start agent (config will be auto-created if missing)
python3 run_agent.py --hostname agent-001
# Or manually set up config first:
# python setup_etcd_config.py --hostname agent-001
# python3 run_agent.py --hostname agent-001

Multiple Agents

# Option 1: Let agents auto-initialize with defaults
python3 run_agent.py --hostname agent-001 &
python3 run_agent.py --hostname agent-002 &
python3 run_agent.py --hostname agent-003 &

# Option 2: Manually set up configurations first
python setup_etcd_config.py --hostname agent-001 --interval 5
python setup_etcd_config.py --hostname agent-002 --interval 10
python setup_etcd_config.py --hostname agent-003 --interval 15
python3 run_agent.py --hostname agent-001 &
python3 run_agent.py --hostname agent-002 &
python3 run_agent.py --hostname agent-003 &

Dynamic Configuration Update

# Update configuration while agent is running
python setup_etcd_config.py --hostname agent-001 --interval 10 --metrics cpu memory

# Agent will automatically detect and apply the change

Custom Plugin Example

# Add custom plugin to configuration
python setup_etcd_config.py \
    --hostname agent-001 \
    --plugins agent.plugins.deduplication.DeduplicationPlugin agent.plugins.my_plugin.MyPlugin

🔍 Monitoring

View Kafka Messages

open http://localhost:8080

Check etcd Configuration

# Using etcdctl (if etcd is in Docker)
docker exec -it etcd etcdctl get /monitor/config/agent-001

# Or watch for changes
docker exec -it etcd etcdctl watch /monitor/config/agent-001

Agent Logs

The agent logs show:

  • Configuration loading from etcd
  • Plugin loading
  • Configuration updates
  • Metrics collection

🛠️ Development

Generate gRPC Code

To generate the Python protobuf files from the .proto definition, use the provided script:

./generate_protobuf.sh

Or manually run:

python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. shared/monitoring.proto

This will generate:

  • shared/monitoring_pb2.py - Message classes
  • shared/monitoring_pb2_grpc.py - gRPC service classes

Project Structure

  • agent/: Modular agent with collect, grpc, and plugins modules
  • grpc_server/: gRPC server that forwards metrics to Kafka
  • analysis_app/: Kafka consumer that displays metrics
  • shared/: Protocol definitions and shared configuration

Adding New Plugins

  1. Create plugin class extending BasePlugin
  2. Implement initialize(), run(), and finalize() methods
  3. Add plugin path to etcd configuration
  4. Plugin will be loaded automatically

🐛 Troubleshooting

etcd Connection Issues

  • Ensure etcd is running: docker ps | grep etcd
  • Check etcd port: localhost:2379
  • Verify network connectivity
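A quick way to verify connectivity from the machine running the agent is to attempt a TCP connection to the etcd port. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper and checks reachability only, not whether etcd itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `port_is_open("localhost", 2379)` returning False suggests the etcd container is not running or the port is not published to the host.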

Protobuf Compatibility

If you see protobuf errors with etcd3:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

Configuration Not Updating

  • Verify etcd watch is active (check agent logs)
  • Ensure configuration key matches hostname (default: /monitor/config/<hostname>)
  • Check etcd connection
