
yashbrid03/Databricks_Repo


Databricks Data Engineering Project – Bronze → Silver → Gold

This repository contains Databricks notebooks and pipeline configurations that implement a medallion architecture for structured, scalable, and governed data processing. The project moves data from the Bronze Layer to the Silver Layer using notebooks, and then from Silver to Gold using Declarative Pipelines (Delta Live Tables).

🚀 Project Overview

The goal of this project is to establish a standardized data engineering workflow using Databricks, Unity Catalog, and Delta Lake. Key components include:

  • External data access to Azure Data Lake Storage Gen2 through an Azure Databricks Access Connector
  • Transformation notebooks for Bronze → Silver
  • Declarative pipelines using Delta Live Tables (DLT) for Silver → Gold
  • Governance and lineage tracking using Unity Catalog

🛠️ Key Features

1️⃣ Access Connector & External Data Path

  • Used an Azure Databricks Access Connector to securely access the Bronze Layer stored in Azure Data Lake Storage Gen2.
  • Configured External Locations and External Volumes in Unity Catalog for:
    • Simplified data access
    • Fine-grained access control
    • Managed governance
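The setup above could be expressed as Unity Catalog SQL. The following is a minimal sketch that assumes the Access Connector's managed identity is already registered in Unity Catalog as a storage credential (for example, through Catalog Explorer). The credential, external location, storage account, container, catalog, and volume names are all hypothetical. The helper only builds SQL strings, so it runs anywhere; in a Databricks notebook each statement could be passed to `spark.sql`.

```python
# Minimal sketch: generate Unity Catalog DDL for the Bronze external location
# and an external volume inside it. All names are hypothetical.


def abfss_url(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


def external_location_ddl(name: str, url: str, credential: str) -> str:
    # Assumes the storage credential wrapping the Access Connector already exists.
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{credential}`)"
    )


def external_volume_ddl(full_name: str, url: str) -> str:
    # full_name is <catalog>.<schema>.<volume>; url must fall under an external location.
    return f"CREATE EXTERNAL VOLUME IF NOT EXISTS {full_name} LOCATION '{url}'"


container_root = abfss_url("bronze", "mydatalake")    # external location covers the container
raw_files = abfss_url("bronze", "mydatalake", "raw")  # volume points at a subfolder

statements = [
    external_location_ddl("bronze_location", container_root, "adls_access_connector_cred"),
    external_volume_ddl("dev_catalog.bronze.raw_files", raw_files),
]

for stmt in statements:
    print(stmt)  # in a notebook, run each with spark.sql(stmt)
```

Pointing the external location at the container root and the volume at a subfolder keeps the volume path inside a governed location, which Unity Catalog requires for external volumes.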

2️⃣ Unity Catalog Integration

  • The project is fully governed under Unity Catalog, providing:
    • Centralized permissions
    • Audit & lineage tracking
    • Managed catalogs, schemas, and tables
  • Ensures secure and controlled access across the Bronze, Silver, and Gold layers.
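One way to lay out the layers and their permissions is sketched below. It assumes a single catalog with one schema per layer and two hypothetical account groups, `data_engineers` and `analysts`; none of these names come from the repository. The helper only generates Unity Catalog SQL. Depending on how the metastore is configured, `CREATE CATALOG` may also need a `MANAGED LOCATION` clause.

```python
# Minimal sketch: one schema per medallion layer plus schema-level grants.
# Catalog, group names, and the privilege split are hypothetical.

ENGINEER = ["USE SCHEMA", "SELECT", "MODIFY", "CREATE TABLE"]
READER = ["USE SCHEMA", "SELECT"]

# layer -> {principal: privileges granted on that layer's schema}
LAYER_GRANTS = {
    "bronze": {"data_engineers": ENGINEER},  # analysts get no Bronze access
    "silver": {"data_engineers": ENGINEER, "analysts": READER},
    "gold": {"data_engineers": ENGINEER, "analysts": READER},
}


def medallion_ddl(catalog: str, layer_grants: dict) -> list:
    stmts = [f"CREATE CATALOG IF NOT EXISTS {catalog}"]
    # Every principal needs USE CATALOG before any schema-level grant is usable.
    principals = sorted({p for grants in layer_grants.values() for p in grants})
    for principal in principals:
        stmts.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`")
    for layer, grants in layer_grants.items():
        schema = f"{catalog}.{layer}"
        stmts.append(f"CREATE SCHEMA IF NOT EXISTS {schema}")
        for principal, privileges in grants.items():
            stmts.append(f"GRANT {', '.join(privileges)} ON SCHEMA {schema} TO `{principal}`")
    return stmts


ddl = medallion_ddl("dev_catalog", LAYER_GRANTS)
for stmt in ddl:
    print(stmt)  # in a notebook, run each with spark.sql(stmt)
```

Because privileges granted on a schema in Unity Catalog are inherited by the tables inside it, granting at schema level keeps analysts out of Bronze without any per-table grants.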

3️⃣ Notebook for Bronze → Silver Transformations

Created Databricks notebooks to:

  • Read raw Parquet data from Bronze
  • Apply cleansing & transformations:
    • Type casting
    • Deduplication
    • Normalization
    • Schema enforcement
  • Write curated output to the Silver Layer as Delta Tables
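To make these cleansing rules concrete and testable without a cluster, the sketch below applies them in plain Python to a list of dicts. The column names, the target schema, and the "latest `updated_at` wins" deduplication rule are assumptions, not taken from the notebooks. In the notebooks themselves, the same steps would typically be PySpark DataFrame operations (for example `cast`, and a window function for latest-record deduplication) followed by a Delta write.

```python
# Plain-Python illustration of the Bronze -> Silver cleansing rules.
# Column names, SILVER_SCHEMA, and the "latest updated_at wins" rule are assumptions.
from __future__ import annotations

from datetime import datetime
from typing import Any

SILVER_SCHEMA = {"order_id": int, "country": str, "amount": float, "updated_at": datetime}


def cast_row(raw: dict[str, Any]) -> dict[str, Any] | None:
    """Type casting + schema enforcement: keep only schema columns and cast them.
    Returns None (row rejected) if a column is missing or cannot be cast."""
    row: dict[str, Any] = {}
    for col, typ in SILVER_SCHEMA.items():
        value = raw.get(col)
        if value is None:
            return None
        try:
            row[col] = datetime.fromisoformat(value) if typ is datetime else typ(value)
        except (TypeError, ValueError):
            return None
    return row


def normalize(row: dict[str, Any]) -> dict[str, Any]:
    # Normalization: trim whitespace and upper-case country codes.
    row["country"] = row["country"].strip().upper()
    return row


def bronze_to_silver(raw_rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    latest: dict[int, dict[str, Any]] = {}
    for raw in raw_rows:
        row = cast_row(raw)
        if row is None:
            continue  # a real pipeline might route rejected rows to a quarantine table
        row = normalize(row)
        key = row["order_id"]
        # Deduplication: keep only the most recently updated version of each order.
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["order_id"])


bronze = [
    {"order_id": "1", "country": " us ", "amount": "19.99", "updated_at": "2024-01-01T10:00:00", "_extra": "x"},
    {"order_id": "1", "country": "US", "amount": "21.50", "updated_at": "2024-01-02T09:00:00"},
    {"order_id": "2", "country": "de", "amount": "n/a", "updated_at": "2024-01-01T08:00:00"},
    {"order_id": "3", "country": "fr", "amount": 5, "updated_at": "2024-01-03T12:00:00"},
]
silver = bronze_to_silver(bronze)
print(silver)
```

Here order 1 collapses to its newer version, order 2 is rejected because `amount` cannot be cast, and the unexpected `_extra` column never reaches Silver.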

4️⃣ Declarative Pipeline for Silver → Gold (DLT)

Implemented a Delta Live Tables (DLT) pipeline to build the Gold layer data models. Pipeline features:

  • Event-driven or scheduled refresh
  • Automatic lineage visualization
  • Built-in quality expectations
  • Data stored as Managed Delta Tables in the Gold Layer
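In DLT's Python API, expectations are attached to tables with the decorators `@dlt.expect` (keep invalid rows but record metrics), `@dlt.expect_or_drop`, and `@dlt.expect_or_fail`. The `dlt` module only runs inside a pipeline, so the sketch below instead models those three actions in plain Python and feeds the surviving rows into a hypothetical Gold aggregate (revenue per country). The rule names, columns, and aggregate are illustrative assumptions.

```python
# Plain-Python model of DLT expectation actions: "warn" (keep row, count it),
# "drop" (remove row), "fail" (stop processing). Rules and columns are hypothetical.
from collections import defaultdict


class ExpectationFailed(Exception):
    pass


def apply_expectations(rows, expectations):
    """expectations: list of (name, predicate, action), action in {"warn", "drop", "fail"}.
    Returns the kept rows and a count of failures per expectation."""
    kept = []
    failures = defaultdict(int)
    for row in rows:
        keep = True
        for name, predicate, action in expectations:
            if predicate(row):
                continue
            failures[name] += 1
            if action == "fail":
                raise ExpectationFailed(f"expectation {name!r} failed for {row}")
            if action == "drop":
                keep = False
        if keep:
            kept.append(row)
    return kept, dict(failures)


def gold_revenue_by_country(rows):
    # Hypothetical Gold model: total amount per country.
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(sorted(totals.items()))


silver_rows = [
    {"order_id": 1, "country": "US", "amount": 21.5},
    {"order_id": 3, "country": "FR", "amount": 5.0},
    {"order_id": 4, "country": "US", "amount": -2.0},
    {"order_id": 5, "country": "", "amount": 7.0},
]
rules = [
    ("positive_amount", lambda r: r["amount"] > 0, "drop"),
    ("has_country", lambda r: bool(r["country"]), "warn"),
]
kept, failures = apply_expectations(silver_rows, rules)
gold = gold_revenue_by_country(kept)
print(failures, gold)
```

The "warn" rule lets order 5 through with an empty country (it lands under the `""` key in the Gold output), which is why teams often reserve "warn" for soft checks and use "drop" or "fail" where bad rows must never reach Gold.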
