⚙️ Production MLOps Pipeline. Designed and deployed a fully automated CI/CD/CT pipeline on AWS/GCP (using Docker/Kubernetes) for a domain-specific LLM.

aayostem/driftguard

Enterprise-Grade MLOps Pipeline: Model Drift Detection System

mlops-platform/
├── 📁 infrastructure/
│   ├── 📁 terraform/
│   │   ├── 📄 main.tf  # Primary infrastructure configuration
│   │   ├── 📄 variables.tf  # Terraform variables
│   │   ├── 📄 outputs.tf  # Terraform outputs
│   │   └── 📄 providers.tf  # Cloud provider configurations
│   ├── 📁 kubernetes/
│   │   ├── 📄 cluster-config.yaml  # K8s cluster configuration
│   │   ├── 📄 networking.yaml  # Network policies & service mesh
│   │   └── 📄 storage.yaml  # Persistent volume claims
│   └── 📁 helm-charts/
│       ├── 📁 mlops-monitoring/  # Custom monitoring stack
│       ├── 📁 mlops-pipelines/  # ML pipeline components
│       └── 📁 mlops-remediation/  # Auto-remediation services
├── 📁 src/
│   ├── 📁 api/
│   │   ├── 📄 app.py  # FastAPI main application
│   │   ├── 📁 routes/
│   │   │   ├── 📄 monitoring.py  # Monitoring endpoints
│   │   │   ├── 📄 remediation.py  # Remediation endpoints
│   │   │   └── 📄 analysis.py  # RCA endpoints
│   │   └── 📁 middleware/
│   │       ├── 📄 auth.py  # Authentication middleware
│   │       └── 📄 logging.py  # Request logging
│   ├── 📁 core/
│   │   ├── 📄 config.py  # Application configuration
│   │   ├── 📄 database.py  # Database connections
│   │   └── 📄 security.py  # Security utilities
│   ├── 📁 services/
│   │   ├── 📁 monitoring/
│   │   │   ├── 📄 anomaly_detector.py  # Multivariate anomaly detection
│   │   │   ├── 📄 drift_detector.py  # Data & concept drift detection
│   │   │   └── 📄 performance_tracker.py  # Model performance tracking
│   │   ├── 📁 remediation/
│   │   │   ├── 📄 engine.py  # Remediation execution engine
│   │   │   ├── 📄 actions.py  # Available remediation actions
│   │   │   └── 📄 safety_controller.py  # Safety checks & rollback
│   │   ├── 📁 analysis/
│   │   │   ├── 📄 rca_engine.py  # Root cause analysis engine
│   │   │   ├── 📄 correlation_analyzer.py  # Signal correlation
│   │   │   └── 📄 pattern_detector.py  # Temporal pattern detection
│   │   └── 📁 alerting/
│   │       ├── 📄 manager.py  # Alert management & routing
│   │       ├── 📄 notifier.py  # Multi-channel notifications
│   │       └── 📄 prioritizer.py  # Intelligent priority assignment
│   ├── 📁 models/
│   │   ├── 📄 schemas.py  # Pydantic models for data validation
│   │   └── 📄 database_models.py  # SQLAlchemy database models
│   ├── 📁 utils/
│   │   ├── 📄 logging_utils.py  # Logging configuration
│   │   ├── 📄 metrics_utils.py  # Metrics collection utilities
│   │   └── 📄 data_processor.py  # Data processing helpers
│   └── 📁 ml/
│       ├── 📁 pipelines/
│       │   ├── 📄 training_pipeline.py  # End-to-end training pipeline
│       │   ├── 📄 inference_pipeline.py  # Real-time inference pipeline
│       │   └── 📄 preprocessing.py  # Data preprocessing
│       ├── 📁 feature_store/
│       │   ├── 📄 manager.py  # Feature store management
│       │   └── 📄 online_store.py  # Online feature serving
│       └── 📁 model_registry/
│           ├── 📄 manager.py  # Model version management
│           └── 📄 deployment.py  # Model deployment orchestration
├── 📁 tests/
│   ├── 📁 unit/
│   │   ├── 📄 test_monitoring.py  # Monitoring service tests
│   │   ├── 📄 test_remediation.py  # Remediation engine tests
│   │   └── 📄 test_analysis.py  # RCA engine tests
│   ├── 📁 integration/
│   │   ├── 📄 test_api_endpoints.py  # API integration tests
│   │   └── 📄 test_workflows.py  # End-to-end workflow tests
│   ├── 📁 performance/
│   │   └── 📄 load_testing.py  # Performance & load testing
│   └── 📁 fixtures/
│       └── 📄 test_data.py  # Test data generators
├── 📁 deployments/
│   ├── 📄 Dockerfile  # Application Dockerfile
│   ├── 📄 docker-compose.yml  # Local development setup
│   ├── 📄 kubernetes.yaml  # K8s deployment manifests
│   └── 📄 helm-values.yaml  # Helm configuration values
├── 📁 docs/
│   ├── 📄 architecture.md  # System architecture
│   ├── 📄 api_reference.md  # API documentation
│   ├── 📄 deployment_guide.md  # Deployment instructions
│   └── 📄 troubleshooting.md  # Troubleshooting guide
├── 📁 scripts/
│   ├── 📄 setup_environment.sh  # Environment setup script
│   ├── 📄 deploy_cluster.sh  # Cluster deployment script
│   ├── 📄 run_migrations.sh  # Database migration script
│   └── 📄 backup_restore.sh  # Backup & restore utilities
├── 📁 monitoring/
│   ├── 📄 prometheus.yml  # Prometheus configuration
│   ├── 📄 alert_rules.yml  # Alerting rules
│   ├── 📁 grafana_dashboards/  # Grafana dashboard JSON files
│   └── 📁 custom_metrics/  # Custom metric definitions
├── 📁 database/
│   ├── 📁 migrations/  # Database migration scripts
│   ├── 📁 schemas/  # Database schema definitions
│   └── 📁 seeds/  # Initial data seeds
├── 📄 requirements.txt  # Python dependencies
├── 📄 requirements-dev.txt  # Development dependencies
├── 📄 Makefile  # Build & deployment commands
├── 📄 .env.example  # Environment variables template
├── 📄 .gitignore  # Git ignore rules
├── 📄 README.md  # Project overview & setup
├── 📄 LICENSE  # Project license
└── 📄 pyproject.toml  # Python project configuration
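
The layout lists `drift_detector.py` for data and concept drift detection, but this README does not show how it works. As a rough illustration of what a data drift check on a single numeric feature might look like, here is a minimal sketch using the Population Stability Index (PSI). The function name `population_stability_index`, its parameters, and the choice of PSI are all assumptions made for this example; the project may use a different method.

```python
import math
import random
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Hypothetical helper: PSI between a reference and a current sample."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to width 1.0
    # when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range go into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    rng = random.Random(0)
    training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    live_ok = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    live_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]
    print("same distribution:", population_stability_index(training, live_ok))
    print("shifted mean:", population_stability_index(training, live_shifted))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Those cut-offs are conventions, not values taken from this project, and any real threshold should be tuned per feature.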
