Kafka Stream Processor

A real-time stream processing application that analyzes user login events using Apache Kafka, with metrics monitoring through Prometheus and Grafana.

Project Overview

This project implements a streaming data pipeline that:

Processes user login events in real-time
Performs risk assessment and analysis
Provides metrics monitoring and visualization
Supports geographic and temporal analysis
Includes comprehensive testing and monitoring

Architecture

Components

Kafka Broker: Message broker for handling streaming data
Zookeeper: Coordination service for Kafka
Python Producer: Generates simulated login events
Python Consumer: Processes and analyzes login events
Kafdrop: Web UI for monitoring Kafka topics
Prometheus: Metrics collection and storage
Grafana: Metrics visualization and dashboards

Data Flow

Producer generates login events → user-login topic
Consumer processes events and performs analysis
Processed data → processed-logins topic
Metrics exposed via Prometheus endpoint
Grafana visualizes metrics and provides dashboards

Prerequisites

Python 3.9 or later
Docker and Docker Compose
Make utility

Installation & Setup

Clone the repository:
- git clone https://github.com/PrateekKumar1709/kafka-stream-processor.git
- cd kafka-stream-processor
Initialize the project:
- make init
Start all services:
- make start

Available Make Commands

make init: Initialize project and install dependencies
make build: Build Docker images
make run: Start all services
make stop: Stop all services
make clean: Clean up all artifacts and containers
make test: Run unit and integration tests
make lint: Run code linting
make logs: View application logs
make health: Check system health
make monitor: Open Kafdrop UI

Monitoring & Management

Access Points

Kafdrop UI: http://localhost:9000
Prometheus: http://localhost:9090
Grafana: http://localhost:3000 (User/Pass: admin/admin)

Available Metrics

Message processing rate
Processing time distribution
Risk score distribution
Error rates by type
System health metrics

Testing

Running Tests

Run all tests with coverage
- make test
Run linting
- make lint

Test Structure

Unit tests for individual components
Integration tests for Kafka interactions
Mocked external dependencies

Troubleshooting Common Issues

Connection refused errors: Ensure all services are running (docker-compose ps)
- Check logs (make logs)
Missing metrics:
- Verify Prometheus target status
- Check consumer metrics endpoint
Topic creation issues:
- Use Kafdrop UI to verify topic existence
- Check Kafka broker logs

Next Steps:

Production Deployment Strategy:

Use Kubernetes for container orchestration instead of Docker Compose
Implement CI/CD pipeline using GitHub Actions/Jenkins with:
- Automated testing
- Security scanning
- Image versioning
- Deployment stages (dev, staging, prod)
Use proper secret management (HashiCorp Vault/AWS Secrets Manager)
Configure proper networking and security groups
Implement proper logging aggregation (ELK Stack/Splunk)

Additional Production Components:

Security:
- SSL/TLS encryption for Kafka
- Authentication/Authorization (SASL)
- API Gateway for external access
- Network segmentation
Monitoring & Alerting:
- Alert manager for Prometheus
- PagerDuty integration
- SLO/SLA monitoring
- Enhanced logging with correlation IDs
Data Management:
- Schema Registry for message validation
- Dead Letter Queue for failed messages
- Data backup and recovery solution
- Data retention policies
High Availability:
- Multi-AZ deployment
- Load balancers
- Service mesh (Istio)
- Circuit breakers

Scaling Strategies:

Kafka Scaling:
- Increase partition count for topics
- Add more Kafka brokers
- Implement consumer groups for parallel processing
Consumer Scaling:
- Horizontal scaling of consumer instances
- Implement back-pressure handling
- Add caching layer (Redis)
- Optimize batch processing
Infrastructure Scaling:
- Auto-scaling groups
- Resource quotas and limits
- Performance monitoring and tuning
- Database sharding if needed
Monitoring Scaling:
- Distributed tracing (Jaeger/Zipkin)
- Metrics aggregation
- Log rotation and archival
- Capacity planning

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github/workflows		.github/workflows
configs		configs
docker/consumer		docker/consumer
scripts		scripts
src/consumer		src/consumer
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
.gitignore copy		.gitignore copy
.pylintrc		.pylintrc
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kafka Stream Processor

Project Overview

Architecture

Components

Data Flow

Prerequisites

Installation & Setup

Available Make Commands

Monitoring & Management

Access Points

Available Metrics

Testing

Running Tests

Test Structure

Troubleshooting Common Issues

Next Steps:

Production Deployment Strategy:

Additional Production Components:

Scaling Strategies:

About

Uh oh!

Releases

Packages

Uh oh!

Languages

PrateekKumar1709/kafka-stream-processor

Folders and files

Latest commit

History

Repository files navigation

Kafka Stream Processor

Project Overview

Architecture

Components

Data Flow

Prerequisites

Installation & Setup

Available Make Commands

Monitoring & Management

Access Points

Available Metrics

Testing

Running Tests

Test Structure

Troubleshooting Common Issues

Next Steps:

Production Deployment Strategy:

Additional Production Components:

Scaling Strategies:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages