- 👨‍💻 I’m currently working as a Database Administrator, building strong foundations in data management and reliability
- 🌱 Transitioning into Data Engineering by designing end‑to‑end batch and streaming pipelines
- 🛠️ Passionate about building scalable, reliable data pipelines that turn raw data into actionable insights
- 👯 Open to collaborating on Data Engineering & Open Source projects
- 📫 Reach me at arjunmpec101@gmail.com
- ⚡ Fun fact: I debug pipelines the way I play games — with persistence and strategy
🧱 Retail Sales SQL Data Warehouse
End‑to‑end SQL data warehouse implementing a Bronze → Silver → Gold layered architecture for retail sales.
- Built entirely in SQL Server/MySQL (no external ETL tool)
- Bronze layer mirrors raw CRM & ERP source tables (customers, products, sales, locations)
- Silver layer applies data quality checks (ID normalization, date validation, gender/marital‑status standardization)
- Gold layer models a star schema with fact_sales, dim_customers, and dim_products using surrogate keys
- Uses window functions (ROW_NUMBER) and joins to integrate history, resolve conflicts, and conform dimensions
- Produces analytics‑ready views/tables suitable for BI tools and downstream reporting
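A minimal sketch of the Silver‑layer dedup pattern described above, using ROW_NUMBER to keep the latest record per customer ID. The table and column names (`bronze_crm_customers`, `cst_id`, `cst_create_date`) are illustrative assumptions, shown here via Python's stdlib sqlite3 rather than SQL Server/MySQL:

```python
import sqlite3

# Hypothetical mini version of the Bronze → Silver dedup step:
# keep only the newest record per customer using ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bronze_crm_customers (
    cst_id INTEGER, cst_name TEXT, cst_create_date TEXT
);
INSERT INTO bronze_crm_customers VALUES
    (1, 'Alice',    '2024-01-01'),
    (1, 'Alice A.', '2024-03-01'),  -- later duplicate wins
    (2, 'Bob',      '2024-02-15');
""")

rows = conn.execute("""
SELECT cst_id, cst_name FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY cst_id
               ORDER BY cst_create_date DESC
           ) AS rn
    FROM bronze_crm_customers
)
WHERE rn = 1
ORDER BY cst_id
""").fetchall()
print(rows)  # [(1, 'Alice A.'), (2, 'Bob')]
```

The same `PARTITION BY … ORDER BY … DESC` pattern generalizes to any "latest version wins" conflict resolution when conforming dimensions.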
🗄️ YouTube Data Engineering Pipeline (Batch Processing)
End‑to‑end batch ETL pipeline implementing the Medallion Architecture (Bronze → Silver → Gold).
- Orchestrated with Apache Airflow (3.x)
- Transformations with Apache Spark
- Data lake layers on local filesystem (Bronze/Silver/Gold)
- Serving layer in Postgres (analytics‑ready tables)
- Interactive Streamlit + Altair dashboard via SQLAlchemy
- Ingests raw YouTube trending data (CSV/JSON), cleans, enriches, and computes derived metrics for BI
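A small sketch of the kind of Silver → Gold enrichment the pipeline performs. The field names (`views`, `likes`, `comment_count`) and the derived engagement metric are assumptions for illustration, not the project's actual schema:

```python
# Hypothetical derived-metric step: engagement rate per trending video.
raw_rows = [
    {"video_id": "a1", "views": 1000, "likes": 100, "comment_count": 20},
    {"video_id": "b2", "views": 0,    "likes": 5,   "comment_count": 1},
]

def enrich(row):
    views = row["views"]
    # Guard against divide-by-zero for videos with no views yet.
    rate = (row["likes"] + row["comment_count"]) / views if views else 0.0
    return {**row, "engagement_rate": round(rate, 3)}

gold = [enrich(r) for r in raw_rows]
print(gold[0]["engagement_rate"])  # 0.12
```

In the real pipeline this kind of transform would run as a Spark job, with the output written to the Gold layer and loaded into Postgres for the dashboard.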
📊 StockPulse (Streaming Pipeline)
Real‑time streaming pipeline simulating stock ticks and processing them end‑to‑end.
- Ingestion via Kafka producer publishing to the stock_ticks topic
- Processing with Spark Structured Streaming (schema enforcement + derived metrics)
- Dual sinks: Postgres (serving layer) + Parquet (partitioned by index/date)
- Interactive Streamlit + Altair dashboard for real‑time visualization
- Fully orchestrated with Apache Airflow
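A hedged sketch of one derived metric such a streaming job might compute: a simple moving average over the last N ticks. The window size and tick shape are assumptions, and the stateful logic is shown in plain Python rather than Spark Structured Streaming:

```python
from collections import deque

def moving_averages(ticks, window=3):
    """Illustrative sliding-window metric over a stream of tick prices."""
    buf = deque(maxlen=window)  # deque drops the oldest tick automatically
    out = []
    for price in ticks:
        buf.append(price)
        out.append(round(sum(buf) / len(buf), 2))
    return out

print(moving_averages([100.0, 102.0, 101.0, 105.0]))
# [100.0, 101.0, 101.0, 102.67]
```

In Spark Structured Streaming the equivalent would be a windowed aggregation keyed by symbol, with results written to both the Postgres and Parquet sinks.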
Note: Top Languages only reflects the languages used in my public repositories and doesn't indicate experience or skill level.
