
DataBridge

DataBridge is a fully containerized data engineering pipeline that validates raw CSV files using JSON Schema, loads them into a PostgreSQL warehouse, and builds analytics-ready models using dbt. The entire workflow runs end-to-end with one command via Docker Compose.

This project demonstrates a modern, production-grade pattern for data validation → ingestion → warehouse modeling.

Demo Video

(Coming soon...)

Features Implemented

  • JSON Schema–based validation for strong data contracts (a Python sketch follows this list)
  • Automated ingestion pipeline written in Python
  • Fully containerized execution using Docker
  • PostgreSQL warehouse automatically initialized on each run
  • dbt staging models:
    • stg_transactions
    • stg_customers
  • dbt fact model joining customers and transactions:
    • fct_customer_transactions
  • Column-level tests (unique, not_null, accepted_values)
  • One-command workflow:

    docker compose up --build

  • Adminer UI for exploring warehouse tables in the browser
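
As a concrete illustration of the validation step, here is a minimal sketch in Python using the jsonschema library. The file paths are hypothetical and the repo's actual validator code may differ:

    import json
    import sys

    import pandas as pd
    from jsonschema import Draft7Validator

    # Hypothetical paths for illustration; the repo's actual layout may differ.
    SCHEMA_PATH = "schemas/transactions.schema.json"
    CSV_PATH = "data/transactions.csv"

    def validate_csv(schema_path, csv_path):
        """Check every CSV row against the JSON Schema; return True if all pass."""
        with open(schema_path) as f:
            validator = Draft7Validator(json.load(f))

        rows = pd.read_csv(csv_path).to_dict(orient="records")
        ok = True
        for i, row in enumerate(rows):
            for error in validator.iter_errors(row):
                print(f"row {i}: {error.message}", file=sys.stderr)
                ok = False
        return ok

    if __name__ == "__main__":
        # A non-zero exit fails the pipeline before anything is loaded.
        sys.exit(0 if validate_csv(SCHEMA_PATH, CSV_PATH) else 1)

Failing fast here is what gives the data contract its teeth: a CSV that breaks the schema never reaches the warehouse.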

Tech Stack

Languages & Tools

  • Python
  • SQL

Database

  • PostgreSQL

Validation

  • JSON Schema

Modeling

  • dbt (data transformations)

Containerization

  • Docker + Docker Compose

Running the Pipeline

From the root of the project:

docker compose up --build

This will:

  1. Start PostgreSQL and Adminer
  2. Validate both datasets against JSON Schema
  3. Create the staging schema
  4. Load customers and transactions
  5. Run dbt to transform staging data into the fact model

No local Python or dbt installation required — everything happens inside Docker.
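
For reference, the loading step boils down to a bulk COPY into the staging schema. A minimal sketch, assuming psycopg2 and staging tables that already exist (created during warehouse initialization); the actual loader code may differ:

    import psycopg2

    # Connection values mirror the Adminer credentials listed below;
    # "postgres" is the Docker Compose service name.
    conn = psycopg2.connect(
        host="postgres",
        dbname="databridge",
        user="databridge",
        password="databridge",
    )

    def load_csv(csv_path, table):
        """Bulk-load a validated CSV into an existing staging table via COPY."""
        with conn.cursor() as cur, open(csv_path) as f:
            cur.copy_expert(
                f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)", f
            )
        conn.commit()

    load_csv("data/customers.csv", "staging.customers")
    load_csv("data/transactions.csv", "staging.transactions")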

Exploring Data in the Warehouse

Open Adminer:

http://localhost:8080

Log in:

System: PostgreSQL
Server: postgres
User: databridge
Password: databridge
Database: databridge

Tables to explore:

1. staging.transactions: validated transaction data

2. staging.customers: validated customer profiles

3. public.fct_customer_transactions: the final joined fact table created by dbt

These three tables trace the pipeline end to end, from validated input to the finished fact model.
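
The same tables can also be queried from a script rather than the browser. A quick sketch, assuming psycopg2 is installed locally and that the Compose file publishes Postgres on port 5432:

    import psycopg2

    # Same credentials as the Adminer login above; localhost works when
    # port 5432 is published to the host (an assumption about the Compose file).
    conn = psycopg2.connect(
        host="localhost",
        dbname="databridge",
        user="databridge",
        password="databridge",
    )

    with conn.cursor() as cur:
        cur.execute("SELECT * FROM public.fct_customer_transactions LIMIT 5")
        for row in cur.fetchall():
            print(row)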

What This Project Demonstrates

  • Schema-first pipeline design using JSON Schema
  • Data quality enforcement before ingestion
  • Automated loading into a warehouse
  • Proper use of staging + fact modeling patterns (dbt)
  • Reproducible environment using Docker
  • Industry-standard pipeline organization
  • Modern data engineering stack similar to what real teams use


Summary

DataBridge is a containerized, schema-driven data pipeline that validates, ingests, and models data with a single command. It reflects real industry patterns and is built to be clear, reproducible, and easy to extend.

About

Build a working, minimal data-reliability pipeline that catches breaking schema changes automatically and deploys new data versions safely using CI/CD.
