manuarya1610/README.md


Hi 👋, I'm Manu Arya

Data Engineer | Enterprise Data Platforms | Databricks Certified Professional




  • 💼 Data Engineer at Celebal Technologies | Building enterprise data platforms
  • 🚀 Specialized in Databricks, Azure Data Factory, Delta Lake | Handling billion-scale workloads
  • 🔭 Currently building AI-powered code migration solutions with MCP servers and GenAI
  • 🏆 Databricks Certified Data Engineer Professional | GenAI Engineer Certified
  • 👯 Seeking to collaborate on innovative data engineering and AI automation projects
  • 📫 Connect: manuddn15@gmail.com | Portfolio

🚀 Professional Overview

I architect and optimize enterprise-grade data platforms that process massive data volumes reliably and efficiently. My focus is on scalable solutions to complex data engineering challenges.

Core Expertise:

  • 🏗️ Designing metadata-driven data pipelines and governance frameworks
  • ⚡ Optimizing large-scale data operations and reducing pipeline latency
  • 🤖 Building AI-powered automation tools for data engineering workflows
  • 🔐 Implementing enterprise data governance, security, and observability

💡 Key Projects & Impact

Enterprise Data Platform Architecture (10+ Billion Records)

Led end-to-end design and optimization of enterprise data platforms handling massive data volumes from multiple sources.

Key Deliverables:

  • ⚡ Optimized pipeline runtime from 2 hours → 4 minutes (97% reduction) through advanced ADF parallel execution patterns
  • 🚀 Engineered a data ingestion framework that cut Sentinel extraction time from 3 weeks → 2 days with zero data loss
  • 🔐 Implemented comprehensive data governance with Microsoft Purview, RLS/CLS, and data lineage tracking
  • 🐛 Identified a critical data duplication bug affecting data quality and escalated it to the Microsoft product team
  • 💰 Designed cost-optimized cluster configurations, reducing infrastructure spend while maintaining SLA

Technical Stack: Databricks, Azure Data Factory, PySpark, Delta Lake, Microsoft Sentinel, Microsoft Purview, Apache Spark

Business Impact: Reduced operational costs, improved data quality to 100% accuracy, enabled enterprise-scale analytics


Data Governance & Observability Platform

Architected comprehensive governance solutions enabling secure, traceable, and observable data flows across the enterprise.

Key Deliverables:

  • 📊 Custom data lineage implementation with Atlas APIs and Microsoft Purview integration
  • 👁️ Built observability dashboards with table-level monitoring for Lakehouse environments
  • 🔄 Designed Delta Sharing framework for secure external data exchange without direct access
  • 🔧 Engineered metadata-driven ingestion pipelines enabling dynamic, scalable configurations

Technical Stack: Microsoft Purview, Delta Lake, Databricks, Atlas APIs, Delta Live Tables
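To illustrate what "metadata-driven" means in the deliverables above, here is a minimal sketch in plain Python. It is not the project's code: the `SourceConfig` fields, the `build_plan` helper, and every path and table name are hypothetical, and a real pipeline would read these rows from a control table and hand each step to Spark or ADF rather than returning dictionaries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceConfig:
    """One row of a hypothetical ingestion-metadata (control) table."""
    name: str
    source_path: str
    target_table: str
    load_type: str = "full"            # "full" or "incremental"
    watermark_column: Optional[str] = None
    enabled: bool = True


def build_plan(configs):
    """Turn metadata rows into ingestion steps without per-source code."""
    plan = []
    for cfg in configs:
        if not cfg.enabled:
            continue  # disabled sources are skipped purely via metadata
        if cfg.load_type == "incremental":
            if cfg.watermark_column is None:
                raise ValueError(f"{cfg.name}: incremental load needs a watermark column")
            mode = "merge"
        elif cfg.load_type == "full":
            mode = "overwrite"
        else:
            raise ValueError(f"{cfg.name}: unknown load_type {cfg.load_type!r}")
        plan.append({
            "source": cfg.source_path,
            "target": cfg.target_table,
            "mode": mode,
            "watermark": cfg.watermark_column,
        })
    return plan


# Hypothetical metadata rows; adding a source means adding a row, not code.
CONFIGS = [
    SourceConfig("orders", "/landing/orders", "bronze.sales_orders",
                 load_type="incremental", watermark_column="updated_at"),
    SourceConfig("customers", "/landing/customers", "bronze.customers"),
    SourceConfig("legacy_feed", "/landing/legacy", "bronze.legacy", enabled=False),
]

PLAN = build_plan(CONFIGS)

if __name__ == "__main__":
    for step in PLAN:
        print(step)
```

The design point is that onboarding or disabling a source changes only the metadata, so the same generic pipeline serves every source.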


AI-Powered Code Migration Engine (Current)

Building an intelligent automation platform to accelerate cloud migration by converting legacy ETL code to modern cloud-native pipelines.

Current Work:

  • 🤖 Architecting MCP server-based framework for AI-driven code transformation
  • 🔄 Automating migration from QLIK to Databricks PySpark, reducing manual effort by 80%+
  • 📈 Enabling faster cloud adoption and modernization for enterprise clients

Technical Stack: MCP Servers, Generative AI, PySpark, Databricks, Python


🎯 Technical Expertise

Data Platforms: Databricks β€’ Azure Data Factory β€’ Delta Lake β€’ Microsoft Fabric β€’ Apache Spark

Languages & Frameworks: Python β€’ PySpark β€’ SQL β€’ Java

Cloud Architecture: Microsoft Azure (Data Factory, Databricks, Sentinel, Purview, ADLS, Fabric)

Specializations: Medallion Architecture β€’ Metadata-Driven Pipelines β€’ Data Governance β€’ ETL/ELT Design β€’ Cost Optimization β€’ AI Automation

Tools & DevOps: Git β€’ Docker β€’ Databricks Asset Bundles β€’ Power BI


🏆 Professional Recognition

  • 🥇 Databricks Certified Data Engineer Professional: advanced certification demonstrating enterprise expertise
  • 🤖 Databricks Certified Generative AI Engineer Associate: expertise in AI/ML automation
  • ⭐ 5-Star Performance Rating: consistent excellence across all project deliverables
  • 🔍 Identified Microsoft Product Bug: discovered a critical data issue, confirmed by the Microsoft engineering team
  • 👥 Mentor: guiding the next generation of data engineers within the organization

📈 By The Numbers

| Impact | Scale |
| --- | --- |
| Records Managed | 10+ Billion |
| Largest Single Table | 5.8 Billion records |
| Pipeline Speed Improvement | 97% reduction (2 hours → 4 minutes) |
| Data Ingestion Speed | About 90% less time (3 weeks → 2 days) |
| Data Sources Integrated | 8+ enterprise platforms |
| Enterprise Clients | ADNOC, ONEOK, Hayya, QE, Sidra |
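The two speed figures follow from a plain time-reduction formula, (before − after) / before. A quick check, assuming "3 weeks" means 21 calendar days (the helper name is illustrative only):

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage of time saved going from `before` to `after` (same units)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Pipeline runtime: 2 hours = 120 minutes, down to 4 minutes.
PIPELINE = percent_reduction(120, 4)
# Sentinel extraction: 3 weeks taken as 21 days (assumption), down to 2 days.
INGESTION = percent_reduction(21, 2)

if __name__ == "__main__":
    print(f"Pipeline runtime reduction: {PIPELINE:.1f}%")    # 96.7%
    print(f"Ingestion time reduction:   {INGESTION:.1f}%")   # 90.5%
```

Rounded, these give the 97% and roughly 90% quoted here.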

Connect with me:

https://www.linkedin.com/in/manuaryaddn/ https://leetcode.com

Languages and Tools:

bootstrap c cplusplus css3 git html5 java javascript linux mysql opencv photoshop qt sass



Pinned

  1. Object-Detection-and-Alert-Generation (Public)

    This software detects various objects in a frame and has many other features.

    C++ 1

  2. Portfolio-Website (Public)

    CSS

  3. RTSP-Video-Player (Public)

    C++

  4. manuarya1610 (Public)

  5. Mars-Weather (Public)

    SCSS