A real-time stream processing application that analyzes user login events using Apache Kafka, with metrics monitoring through Prometheus and Grafana.
This project implements a streaming data pipeline that:
- Processes user login events in real-time
- Performs risk assessment and analysis
- Provides metrics monitoring and visualization
- Supports geographic and temporal analysis
- Includes comprehensive testing and monitoring
The system is composed of the following services:

- Kafka Broker: Message broker for handling streaming data
- Zookeeper: Coordination service for Kafka
- Python Producer: Generates simulated login events
- Python Consumer: Processes and analyzes login events
- Kafdrop: Web UI for monitoring Kafka topics
- Prometheus: Metrics collection and storage
- Grafana: Metrics visualization and dashboards
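A minimal `docker-compose.yml` wiring these services together might look like the sketch below. The image tags, internal ports, and environment variables are assumptions rather than this repository's actual configuration; only the published UI ports (9000, 9090, 3000) come from this README.

```yaml
# Sketch only; images, versions, and environment are assumptions.
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
  kafdrop:
    image: obsidiandynamics/kafdrop
    ports: ["9000:9000"]
    environment:
      KAFKA_BROKERCONNECT: kafka:9092
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]
```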
Data flows through the pipeline as follows:

- Producer generates login events → `user-login` topic
- Consumer processes events and performs analysis
- Processed data → `processed-logins` topic
- Metrics exposed via Prometheus endpoint
- Grafana visualizes metrics and provides dashboards
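To make the flow concrete, here is a rough sketch of the producer/consumer logic: generating a simulated login event and computing a toy risk score. The field names and scoring weights are illustrative assumptions; in the actual pipeline the producer serializes events like these to the `user-login` topic via a Kafka client, and the consumer publishes its results to `processed-logins`.

```python
import json
import random
import time

def make_login_event():
    """Build a simulated login event. Field names are illustrative,
    not necessarily those used by this project's producer."""
    return {
        "user_id": f"user_{random.randint(1, 1000)}",
        "ip": f"203.0.113.{random.randint(1, 254)}",
        "device_type": random.choice(["android", "ios", "web"]),
        "timestamp": int(time.time()),
    }

def risk_score(event):
    """Toy risk assessment; the real consumer's scoring logic may differ."""
    score = 0.0
    hour = time.gmtime(event["timestamp"]).tm_hour
    if hour < 6:                       # treat late-night logins as riskier
        score += 0.5
    if event["device_type"] == "web":  # example device-based weighting
        score += 0.2
    return min(score, 1.0)

event = make_login_event()
print(json.dumps(event), risk_score(event))
```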
Prerequisites:

- Python 3.9 or later
- Docker and Docker Compose
- Make utility
To get started:

1. Clone the repository:
   `git clone https://github.com/PrateekKumar1709/kafka-stream-processor.git`
   `cd kafka-stream-processor`
2. Initialize the project: `make init`
3. Start all services: `make run`
Available make targets:

- `make init`: Initialize the project and install dependencies
- `make build`: Build Docker images
- `make run`: Start all services
- `make stop`: Stop all services
- `make clean`: Clean up all artifacts and containers
- `make test`: Run unit and integration tests
- `make lint`: Run code linting
- `make logs`: View application logs
- `make health`: Check system health
- `make monitor`: Open the Kafdrop UI
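For orientation, a few of these targets could be defined roughly as follows. The recipes shown are plausible sketches, not the repository's actual Makefile (recipes must be indented with tabs):

```makefile
init:
	pip install -r requirements.txt

run:
	docker-compose up -d

test:
	pytest --cov=. tests/

logs:
	docker-compose logs -f
```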
Once the stack is running, the following UIs are available:

- Kafdrop UI: http://localhost:9000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (default login: admin/admin)
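Prometheus finds the consumer's metrics endpoint through its scrape configuration. A sketch of the relevant `prometheus.yml` fragment; the job name and the consumer's metrics port (8000 here) are assumptions, not taken from this repository:

```yaml
scrape_configs:
  - job_name: "login-consumer"
    scrape_interval: 15s
    static_configs:
      - targets: ["consumer:8000"]
```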
Key metrics include:

- Message processing rate
- Processing time distribution
- Risk score distribution
- Error rates by type
- System health metrics
- Run all tests with coverage: `make test`
- Run linting: `make lint`
The test suite includes:

- Unit tests for individual components
- Integration tests for Kafka interactions
- Mocked external dependencies
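The mocking approach can be as simple as replacing the Kafka consumer with a plain iterable of fake message objects, so tests run without a broker. A minimal sketch, in which `process_message` is a hypothetical stand-in for the real consumer's processing step:

```python
import json
from unittest.mock import MagicMock

def process_message(raw_value):
    """Illustrative stand-in for the consumer's processing logic."""
    event = json.loads(raw_value)
    return {"user_id": event["user_id"], "risk": 0.0}

def test_consumer_processes_all_messages():
    # Mock a Kafka message; a list of them plays the role of the consumer,
    # which is normally iterated the same way.
    msg = MagicMock()
    msg.value = json.dumps({"user_id": "u1"}).encode()
    consumer = [msg, msg]
    results = [process_message(m.value) for m in consumer]
    assert len(results) == 2
    assert results[0]["user_id"] == "u1"

test_consumer_processes_all_messages()
print("ok")
```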
Common issues and fixes:

- Connection refused errors:
  - Ensure all services are running (`docker-compose ps`)
  - Check the logs (`make logs`)
- Missing metrics:
  - Verify Prometheus target status
  - Check the consumer's metrics endpoint
- Topic creation issues:
  - Use the Kafdrop UI to verify the topic exists
  - Check the Kafka broker logs
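When diagnosing connection problems, a quick TCP check against each service port tells you whether the service is listening at all. A small stdlib-only sketch; the ports come from the endpoints listed above, except Kafka's 9092, which is the conventional default and an assumption here:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in [("kafka", 9092), ("kafdrop", 9000),
                   ("prometheus", 9090), ("grafana", 3000)]:
    print(f"{name:10s} {'up' if port_open('localhost', port) else 'DOWN'}")
```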
To run this in production, consider the following changes:

- Use Kubernetes for container orchestration instead of Docker Compose
- Implement a CI/CD pipeline using GitHub Actions/Jenkins with:
  - Automated testing
  - Security scanning
  - Image versioning
  - Deployment stages (dev, staging, prod)
- Use proper secret management (HashiCorp Vault/AWS Secrets Manager)
- Configure proper networking and security groups
- Implement proper logging aggregation (ELK Stack/Splunk)
- Security:
  - SSL/TLS encryption for Kafka
  - Authentication/authorization (SASL)
  - API gateway for external access
  - Network segmentation
- Monitoring & alerting:
  - Alertmanager for Prometheus
  - PagerDuty integration
  - SLO/SLA monitoring
  - Enhanced logging with correlation IDs
- Data management:
  - Schema Registry for message validation
  - Dead-letter queue for failed messages
  - Data backup and recovery solution
  - Data retention policies
- High availability:
  - Multi-AZ deployment
  - Load balancers
  - Service mesh (Istio)
  - Circuit breakers
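Of these, the circuit breaker is easy to illustrate in isolation: after a run of consecutive failures it rejects calls outright for a cool-down period instead of hammering a failing dependency. A minimal sketch, with arbitrary thresholds; a production setup would more likely use a library such as pybreaker:

```python
import time

class CircuitBreaker:
    """Open the circuit after `max_failures` consecutive errors; reject
    calls until `reset_after` seconds pass, then allow a trial call."""
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call rejected")
            # Half-open: permit one trial call; one more failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```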
To scale under higher load:

- Kafka scaling:
  - Increase the partition count for topics
  - Add more Kafka brokers
  - Use consumer groups for parallel processing
- Consumer scaling:
  - Horizontal scaling of consumer instances
  - Back-pressure handling
  - A caching layer (Redis)
  - Optimized batch processing
- Infrastructure scaling:
  - Auto-scaling groups
  - Resource quotas and limits
  - Performance monitoring and tuning
  - Database sharding if needed
- Monitoring scaling:
  - Distributed tracing (Jaeger/Zipkin)
  - Metrics aggregation
  - Log rotation and archival
  - Capacity planning
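The first point above, partition count, is what bounds consumer parallelism: Kafka assigns each partition to exactly one consumer within a group, so consumers beyond the partition count sit idle. A toy sketch of that relationship; Kafka's real default partitioner uses murmur2 hashing and its group coordinator uses richer assignment strategies, so `hash()` and the round-robin below are stand-ins:

```python
def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in for Kafka's key -> partition mapping."""
    return hash(key) % num_partitions

def assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partition assignment: with more partitions than
    consumers, every consumer gets work; extras idle."""
    plan = {c: [] for c in consumers}
    for p in range(partitions):
        plan[consumers[p % len(consumers)]].append(p)
    return plan

print(assign(6, ["c1", "c2", "c3"]))  # two partitions per consumer
print(assign(2, ["c1", "c2", "c3"]))  # c3 idles; add partitions to scale
```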