This repository contains Databricks notebooks and pipeline configurations that implement a medallion architecture for structured, scalable, and governed data processing. The project moves data from the Bronze Layer to the Silver Layer using notebooks, and then from Silver to Gold using Declarative Pipelines (Delta Live Tables).
The goal of this project is to establish a standardized data engineering workflow using Databricks, Unity Catalog, and Delta Lake. Key components include:
- External data access from Azure Data Lake using Access Connector
- Transformation notebooks for Bronze → Silver
- Declarative pipelines using Delta Live Tables (DLT) for Silver → Gold
- Governance and lineage tracking using Unity Catalog
- Used Azure Databricks Access Connector to securely access the Bronze Layer stored in Azure Data Lake Storage Gen2.
- Configured External Locations and External Volumes in Unity Catalog for:
  - Simplified data access
  - Fine-grained access control
  - Managed governance
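As a sketch of this setup, the external location and volume can be registered from a Databricks notebook with Unity Catalog enabled. The storage account, credential, catalog, and schema names below are placeholders, not the project's actual names:

```python
# Register the Bronze container (exposed via the Access Connector's
# storage credential) as a Unity Catalog external location.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS bronze_ext_loc
  URL 'abfss://bronze@<storage-account>.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL access_connector_cred)
""")

# Expose a folder inside that location as an external volume
# so notebooks can read files through a governed path.
spark.sql("""
  CREATE EXTERNAL VOLUME IF NOT EXISTS main.bronze.raw_files
  LOCATION 'abfss://bronze@<storage-account>.dfs.core.windows.net/raw/'
""")
```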
- The project is fully governed by Unity Catalog, providing:
  - Centralized permissions
  - Audit and lineage tracking
  - Managed catalogs, schemas, and tables
- Ensures secure and controlled access across the Bronze, Silver, and Gold layers.
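Centralized permissions are granted with standard Unity Catalog SQL. A minimal sketch, where the catalog (`lakehouse`), schema (`silver`), and group names are illustrative placeholders:

```python
# Allow the engineering group to use the catalog, and analysts to
# read the curated Silver schema; grants are audited by Unity Catalog.
spark.sql("GRANT USE CATALOG ON CATALOG lakehouse TO `data_engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA lakehouse.silver TO `analysts`")
spark.sql("GRANT SELECT ON SCHEMA lakehouse.silver TO `analysts`")
```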
Created Databricks notebooks to:
- Read raw Parquet data from Bronze
- Apply cleansing & transformations:
  - Type casting
  - Deduplication
  - Normalization
  - Schema enforcement
- Write curated output to the Silver Layer as Delta Tables
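The Bronze-to-Silver steps above can be sketched as a single PySpark notebook cell. The storage path, table name, and column names (`order_id`, `order_ts`, `amount`, `country`) are assumptions for illustration:

```python
from pyspark.sql import functions as F

# Read raw Parquet data from the Bronze external location.
bronze_df = spark.read.parquet(
    "abfss://bronze@<storage-account>.dfs.core.windows.net/orders/"
)

silver_df = (
    bronze_df
    # Type casting: enforce proper timestamp and decimal types.
    .withColumn("order_ts", F.col("order_ts").cast("timestamp"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    # Deduplication: keep one row per business key.
    .dropDuplicates(["order_id"])
    # Normalization: trim and upper-case free-text values.
    .withColumn("country", F.upper(F.trim(F.col("country"))))
)

# Write curated output to the Silver layer as a Delta table;
# mergeSchema=false keeps the declared schema enforced on writes.
(silver_df.write
    .format("delta")
    .mode("overwrite")
    .option("mergeSchema", "false")
    .saveAsTable("lakehouse.silver.orders"))
```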
Implemented a Delta Live Tables (DLT) pipeline to build the Gold-layer data models. Pipeline features:
- Event-driven or scheduled refresh
- Automatic lineage visualization
- Built-in quality expectations
- Data stored as Managed Delta Tables in the Gold Layer
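A minimal sketch of such a DLT definition, runnable only inside a DLT pipeline; the source table (`silver_orders`), column names, and aggregation are assumptions, not the project's actual model:

```python
import dlt
from pyspark.sql import functions as F


# Built-in quality expectation: drop aggregated rows that violate it.
@dlt.table(comment="Daily revenue aggregated from the Silver orders table")
@dlt.expect_or_drop("non_negative_revenue", "total_revenue >= 0")
def gold_daily_revenue():
    # DLT tracks this read for automatic lineage visualization.
    return (
        dlt.read("silver_orders")
        .groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("total_revenue"))
    )
```

The resulting table is stored as a managed Delta table in the Gold layer, and the pipeline can be triggered on a schedule or run continuously.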