- Data Engineer at Celebal Technologies | Building enterprise data platforms
- Specialized in Databricks, Azure Data Factory, Delta Lake | Handling billion-scale workloads
- Currently building AI-powered code migration solutions with MCP servers and GenAI
- Databricks Certified Data Engineer Professional | GenAI Engineer Certified
- Seeking to collaborate on innovative data engineering and AI automation projects
- Connect: manuddn15@gmail.com | Portfolio
I architect and optimize enterprise-grade data platforms that process massive data volumes reliably and efficiently, with a focus on scalable solutions to complex data engineering challenges.
Core Expertise:
- Designing metadata-driven data pipelines and governance frameworks
- Optimizing large-scale data operations and reducing pipeline latency
- Building AI-powered automation tools for data engineering workflows
- Implementing enterprise data governance, security, and observability
Led end-to-end design and optimization of enterprise data platforms handling massive data volumes from multiple sources.
Key Deliverables:
- Cut pipeline runtime from 2 hours to 4 minutes (97% reduction) through advanced ADF parallel execution patterns
- Engineered a data ingestion framework that reduced Sentinel extraction from 3 weeks to 2 days with zero data loss
- Implemented comprehensive data governance with Microsoft Purview, RLS/CLS, and data lineage tracking
- Identified a critical data duplication bug affecting data quality and escalated it to the Microsoft product team
- Designed cost-optimized cluster configurations that reduced infrastructure spend while maintaining SLA
Technical Stack: Databricks, Azure Data Factory, PySpark, Delta Lake, Microsoft Sentinel, Microsoft Purview, Apache Spark
Business Impact: Reduced operational costs, improved data quality to 100% accuracy, enabled enterprise-scale analytics
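
The parallel execution idea behind the runtime reduction can be illustrated with a minimal sketch. It is not the ADF pipeline itself: `copy_table`, the table names, and the `batch_count` parameter are hypothetical stand-ins, and a thread pool is used here only to show how independent copy steps can fan out with a bounded degree of concurrency instead of running one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def copy_table(table_name: str) -> str:
    """Hypothetical stand-in for one independent copy step (sleep mimics I/O wait)."""
    time.sleep(0.05)
    return f"{table_name}: copied"


def run_sequential(tables: list[str]) -> list[str]:
    """Baseline: each copy waits for the previous one to finish."""
    return [copy_table(t) for t in tables]


def run_parallel(tables: list[str], batch_count: int = 4) -> list[str]:
    """Fan out independent copies, running at most batch_count at a time."""
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        # pool.map returns results in input order, regardless of completion order
        return list(pool.map(copy_table, tables))


# Assumed example inputs; real pipelines would read these from configuration
tables = [f"table_{i}" for i in range(8)]
results = run_parallel(tables, batch_count=4)
```

Both functions return the same results in the same order; only the wall-clock time differs, which is why this pattern applies only when the steps have no dependencies on each other.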
Architected comprehensive governance solutions enabling secure, traceable, and observable data flows across the enterprise.
Key Deliverables:
- Custom data lineage implementation with Atlas APIs and Microsoft Purview integration
- Built observability dashboards with table-level monitoring for Lakehouse environments
- Designed Delta Sharing framework for secure external data exchange without direct access
- Engineered metadata-driven ingestion pipelines enabling dynamic, scalable configurations
Technical Stack: Microsoft Purview, Delta Lake, Databricks, Atlas APIs, Delta Live Tables
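
A metadata-driven ingestion pipeline can be sketched roughly as follows. This is an illustrative assumption, not the framework described above: `SourceConfig`, `build_plan`, the field names, and the paths are all hypothetical. The point is that each source is described by a configuration row, and the pipeline logic derives its behavior from that row, so onboarding a new source means adding metadata rather than writing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    """One row of (hypothetical) ingestion metadata."""
    name: str
    source_format: str
    path: str
    target_table: str
    load_type: str = "full"  # "full" or "incremental"


def build_plan(config: SourceConfig) -> dict:
    """Translate a metadata row into a generic read/write plan."""
    if config.load_type not in {"full", "incremental"}:
        raise ValueError(f"Unsupported load_type: {config.load_type}")
    # Full loads replace the target; incremental loads add to it
    write_mode = "overwrite" if config.load_type == "full" else "append"
    return {
        "read_format": config.source_format,
        "read_path": config.path,
        "write_mode": write_mode,
        "target": config.target_table,
    }


# Assumed example metadata; in practice this might live in a control table or JSON file
METADATA = [
    {"name": "orders", "source_format": "parquet",
     "path": "/landing/orders", "target_table": "bronze.orders"},
    {"name": "events", "source_format": "json",
     "path": "/landing/events", "target_table": "bronze.events",
     "load_type": "incremental"},
]

plans = [build_plan(SourceConfig(**row)) for row in METADATA]
```

In a Spark environment, each plan could then drive a generic read and write step; that part is omitted here to keep the sketch runnable without a cluster.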
Building an intelligent automation platform to accelerate cloud migration by converting legacy ETL code to modern cloud-native pipelines.
Current Work:
- Architecting an MCP server-based framework for AI-driven code transformation
- Automating migration from Qlik to Databricks PySpark, reducing manual effort by 80%+
- Enabling faster cloud adoption and modernization for enterprise clients
Technical Stack: MCP Servers, Generative AI, PySpark, Databricks, Python
Data Platforms: Databricks • Azure Data Factory • Delta Lake • Microsoft Fabric • Apache Spark
Languages & Frameworks: Python • PySpark • SQL • Java
Cloud Architecture: Microsoft Azure (Data Factory, Databricks, Sentinel, Purview, ADLS, Fabric)
Specializations: Medallion Architecture • Metadata-Driven Pipelines • Data Governance • ETL/ELT Design • Cost Optimization • AI Automation
Tools & DevOps: Git • Docker • Databricks Asset Bundles • Power BI
- Databricks Certified Data Engineer Professional: Advanced certification demonstrating enterprise expertise
- Databricks Certified Generative AI Engineer Associate: Expertise in AI/ML automation
- 5-Star Performance Rating: Consistent excellence across all project deliverables
- Identified Microsoft Product Bug: Discovered a critical data issue, confirmed by the Microsoft engineering team
- Mentor: Guiding the next generation of data engineers within the organization
| Impact | Scale |
|---|---|
| Records Managed | 10+ Billion |
| Largest Single Table | 5.8 Billion records |
| Pipeline Speed Improvement | 97% reduction |
| Data Ingestion Speed | 90% faster (3 weeks → 2 days) |
| Data Sources Integrated | 8+ enterprise platforms |
| Enterprise Clients | ADNOC, ONEOK, Hayya, QE, Sidra |



