
DataBridge

DataBridge is a fully containerized data engineering pipeline that validates raw CSV files using JSON Schema, loads them into a PostgreSQL warehouse, and builds analytics-ready models using dbt. The entire workflow runs end-to-end with one command via Docker Compose.

This project demonstrates a modern, production-grade pattern for data validation → ingestion → warehouse modeling.

Demo Video

(Coming soon...)

Features Implemented

  • JSON Schema–based validation for strong data contracts (a Python sketch follows this list)
  • Automated ingestion pipeline written in Python
  • Fully containerized execution using Docker
  • PostgreSQL warehouse automatically initialized on each run
  • dbt staging models:
    • stg_transactions
    • stg_customers
  • dbt fact model joining customers and transactions:
    • fct_customer_transactions
  • Column-level tests (unique, not_null, accepted_values)
  • One-command workflow:

    docker compose up --build

  • Adminer UI for exploring warehouse tables in the browser
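
As a concrete illustration of the validation step, here is a minimal sketch in Python using the jsonschema library. The file paths are hypothetical and the repo's actual validator code may differ:

    import json
    import sys

    import pandas as pd
    from jsonschema import Draft7Validator

    # Hypothetical paths for illustration; the repo's actual layout may differ.
    SCHEMA_PATH = "schemas/transactions.schema.json"
    CSV_PATH = "data/transactions.csv"

    def validate_csv(schema_path, csv_path):
        """Check every CSV row against the JSON Schema; return True if all pass."""
        with open(schema_path) as f:
            validator = Draft7Validator(json.load(f))

        rows = pd.read_csv(csv_path).to_dict(orient="records")
        ok = True
        for i, row in enumerate(rows):
            for error in validator.iter_errors(row):
                print(f"row {i}: {error.message}", file=sys.stderr)
                ok = False
        return ok

    if __name__ == "__main__":
        # A non-zero exit fails the pipeline before anything is loaded.
        sys.exit(0 if validate_csv(SCHEMA_PATH, CSV_PATH) else 1)

Failing fast here is what gives the data contract its teeth: a CSV that breaks the schema never reaches the warehouse.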

Tech Stack

Languages & Tools

  • Python
  • SQL

Database

  • PostgreSQL

Validation

  • JSON Schema

Modeling

  • dbt (data transformations)

Containerization

  • Docker + Docker Compose

Running the Pipeline

From the root of the project:

docker compose up --build

This will:

  1. Start PostgreSQL and Adminer
  2. Validate both datasets against JSON Schema
  3. Create the staging schema
  4. Load customers and transactions
  5. Run dbt to transform staging data into the fact model

No local Python or dbt installation required — everything happens inside Docker.
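
For reference, the loading step boils down to a bulk COPY into the staging schema. A minimal sketch, assuming psycopg2 and staging tables that already exist (created during warehouse initialization); the actual loader code may differ:

    import psycopg2

    # Connection values mirror the Adminer credentials listed below;
    # "postgres" is the Docker Compose service name.
    conn = psycopg2.connect(
        host="postgres",
        dbname="databridge",
        user="databridge",
        password="databridge",
    )

    def load_csv(csv_path, table):
        """Bulk-load a validated CSV into an existing staging table via COPY."""
        with conn.cursor() as cur, open(csv_path) as f:
            cur.copy_expert(
                f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)", f
            )
        conn.commit()

    load_csv("data/customers.csv", "staging.customers")
    load_csv("data/transactions.csv", "staging.transactions")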

Exploring Data in the Warehouse

Open Adminer:

http://localhost:8080

Log in:

System: PostgreSQL
Server: postgres
User: databridge
Password: databridge
Database: databridge

Tables to explore:

1. staging.transactions: validated transaction data

2. staging.customers: validated customer profiles

3. public.fct_customer_transactions: the final joined fact table created by dbt

These three tables trace the pipeline end to end, from validated input to the finished fact model.
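
The same tables can also be queried from a script rather than the browser. A quick sketch, assuming psycopg2 is installed locally and that the Compose file publishes Postgres on port 5432:

    import psycopg2

    # Same credentials as the Adminer login above; localhost works when
    # port 5432 is published to the host (an assumption about the Compose file).
    conn = psycopg2.connect(
        host="localhost",
        dbname="databridge",
        user="databridge",
        password="databridge",
    )

    with conn.cursor() as cur:
        cur.execute("SELECT * FROM public.fct_customer_transactions LIMIT 5")
        for row in cur.fetchall():
            print(row)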

What This Project Demonstrates

  • Schema-first pipeline design using JSON Schema
  • Data quality enforcement before ingestion
  • Automated loading into a warehouse
  • Proper use of staging + fact modeling patterns (dbt)
  • Reproducible environment using Docker
  • Industry-standard pipeline organization
  • Modern data engineering stack similar to what real teams use


Summary

DataBridge is a containerized, schema-driven data pipeline that validates, ingests, and models data with a single command. It reflects real industry patterns and is built to be clear, reproducible, and easy to extend.

About

Build a working, minimal data-reliability pipeline that catches breaking schema changes automatically and deploys new data versions safely using CI/CD.
