- 👨‍💻 I’m currently working as a Database Administrator, building strong foundations in data management and reliability
- 🌱 Transitioning into Data Engineering by designing end‑to‑end batch and streaming pipelines
- 🛠️ Passionate about building scalable, reliable data pipelines that turn raw data into actionable insights
- 👯 Open to collaborating on Data Engineering & Open Source projects
- 📫 Reach me at arjunmpec101@gmail.com
- ⚡ Fun fact: I debug pipelines the way I play games — with persistence and strategy
🧱 Retail Sales SQL Data Warehouse
End‑to‑end SQL data warehouse implementing a Bronze → Silver → Gold layered architecture for retail sales.
- Built entirely in SQL Server/MySQL (no external ETL tool)
- Bronze layer mirrors raw CRM & ERP source tables (customers, products, sales, locations)
- Silver layer applies data quality checks (ID normalization, date validation, gender/marital‑status standardization)
- Gold layer models a star schema with fact_sales, dim_customers, and dim_products using surrogate keys
- Uses window functions (ROW_NUMBER) and joins to integrate history, resolve conflicts, and conform dimensions
- Produces analytics‑ready views/tables suitable for BI tools and downstream reporting
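A minimal sketch of the Silver‑layer dedup pattern described above, using ROW_NUMBER to keep the latest record per customer ID. The table and column names (`bronze_crm_customers`, `cst_id`, `cst_create_date`) are illustrative assumptions, shown here via Python's stdlib sqlite3 rather than SQL Server/MySQL:

```python
import sqlite3

# Hypothetical mini version of the Bronze → Silver dedup step:
# keep only the newest record per customer using ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bronze_crm_customers (
    cst_id INTEGER, cst_name TEXT, cst_create_date TEXT
);
INSERT INTO bronze_crm_customers VALUES
    (1, 'Alice',    '2024-01-01'),
    (1, 'Alice A.', '2024-03-01'),  -- later duplicate wins
    (2, 'Bob',      '2024-02-15');
""")

rows = conn.execute("""
SELECT cst_id, cst_name FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY cst_id
               ORDER BY cst_create_date DESC
           ) AS rn
    FROM bronze_crm_customers
)
WHERE rn = 1
ORDER BY cst_id
""").fetchall()
print(rows)  # [(1, 'Alice A.'), (2, 'Bob')]
```

The same `PARTITION BY … ORDER BY … DESC` pattern generalizes to any "latest version wins" conflict resolution when conforming dimensions.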
🗄️ YouTube Data Engineering Pipeline (Batch Processing)
End‑to‑end batch ETL pipeline implementing the Medallion Architecture (Bronze → Silver → Gold).
- Orchestrated with Apache Airflow (3.x)
- Transformations with Apache Spark
- Data lake layers on local filesystem (Bronze/Silver/Gold)
- Serving layer in Postgres (analytics‑ready tables)
- Interactive Streamlit + Altair dashboard via SQLAlchemy
- Ingests raw YouTube trending data (CSV/JSON), cleans, enriches, and computes derived metrics for BI
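A small sketch of the kind of Silver → Gold enrichment the pipeline performs. The field names (`views`, `likes`, `comment_count`) and the derived engagement metric are assumptions for illustration, not the project's actual schema:

```python
# Hypothetical derived-metric step: engagement rate per trending video.
raw_rows = [
    {"video_id": "a1", "views": 1000, "likes": 100, "comment_count": 20},
    {"video_id": "b2", "views": 0,    "likes": 5,   "comment_count": 1},
]

def enrich(row):
    views = row["views"]
    # Guard against divide-by-zero for videos with no views yet.
    rate = (row["likes"] + row["comment_count"]) / views if views else 0.0
    return {**row, "engagement_rate": round(rate, 3)}

gold = [enrich(r) for r in raw_rows]
print(gold[0]["engagement_rate"])  # 0.12
```

In the real pipeline this kind of transform would run as a Spark job, with the output written to the Gold layer and loaded into Postgres for the dashboard.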
📊 StockPulse (Streaming Pipeline)
Real‑time streaming pipeline simulating stock ticks and processing them end‑to‑end.
- Ingestion via Kafka producer publishing to the stock_ticks topic
- Processing with Spark Structured Streaming (schema enforcement + derived metrics)
- Dual sinks: Postgres (serving layer) + Parquet (partitioned by index/date)
- Interactive Streamlit + Altair dashboard for real‑time visualization
- Fully orchestrated with Apache Airflow
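A hedged sketch of one derived metric such a streaming job might compute: a simple moving average over the last N ticks. The window size and tick shape are assumptions, and the stateful logic is shown in plain Python rather than Spark Structured Streaming:

```python
from collections import deque

def moving_averages(ticks, window=3):
    """Illustrative sliding-window metric over a stream of tick prices."""
    buf = deque(maxlen=window)  # deque drops the oldest tick automatically
    out = []
    for price in ticks:
        buf.append(price)
        out.append(round(sum(buf) / len(buf), 2))
    return out

print(moving_averages([100.0, 102.0, 101.0, 105.0]))
# [100.0, 101.0, 101.0, 102.67]
```

In Spark Structured Streaming the equivalent would be a windowed aggregation keyed by symbol, with results written to both the Postgres and Parquet sinks.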
Note: Top Languages only reflects the languages used in my public repositories and doesn't indicate experience or skill level.
