DataBridge is a fully containerized data engineering pipeline that validates raw CSV files using JSON Schema, loads them into a PostgreSQL warehouse, and builds analytics-ready models using dbt. The entire workflow runs end-to-end with one command via Docker Compose.
This project demonstrates a modern, production-grade pattern for data validation → ingestion → warehouse modeling.
(Coming soon...)
- JSON Schema–based validation for strong data contracts
- Automated ingestion pipeline written in Python
- Fully containerized execution using Docker
- PostgreSQL warehouse automatically initialized on each run
- dbt staging models: `stg_transactions`, `stg_customers`
- dbt fact model joining customers and transactions: `fct_customer_transactions`
- Column-level tests (`unique`, `not_null`, `accepted_values`)
- One-command workflow: `docker compose up --build`
- Adminer UI for exploring warehouse tables in the browser
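The validation step can be sketched in a few lines with the third-party `jsonschema` package. The schema and field names below are hypothetical stand-ins; the real data contracts live alongside the pipeline code.

```python
# Minimal sketch of JSON Schema row validation (requires `pip install jsonschema`).
# CUSTOMER_SCHEMA and its fields are illustrative, not the project's actual contract.
from jsonschema import Draft7Validator

CUSTOMER_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "integer"},
        "email": {"type": "string"},
        "country": {"type": "string", "enum": ["US", "GB", "DE"]},
    },
    "required": ["customer_id", "email"],
}

validator = Draft7Validator(CUSTOMER_SCHEMA)

def validate_row(row: dict) -> list[str]:
    """Return a list of validation error messages (empty if the row is valid)."""
    return [e.message for e in validator.iter_errors(row)]

good = {"customer_id": 1, "email": "a@example.com", "country": "US"}
bad = {"email": 123}  # missing customer_id, and email has the wrong type

assert validate_row(good) == []
assert len(validate_row(bad)) >= 2
```

Collecting every error with `iter_errors` (rather than stopping at the first) lets the pipeline report all problems in a file before rejecting it.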
Languages & Tools

- Python
- SQL

Database

- PostgreSQL

Validation

- JSON Schema

Modeling

- dbt (data transformations)

Containerization

- Docker + Docker Compose
From the root of the project:
```
docker compose up --build
```

This will:

- Start PostgreSQL and Adminer
- Validate both datasets against JSON Schema
- Create the `staging` schema
- Load customers and transactions
- Run dbt to transform staging data into the fact model
No local Python or dbt installation required — everything happens inside Docker.
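The load step boils down to bulk-copying each validated CSV into its staging table. A sketch of how that statement might be built (the `copy_sql` helper and table names are illustrative, not the project's actual API):

```python
# Sketch of how validated CSVs could be bulk-loaded into Postgres.
# Names here (copy_sql, staging.customers) are hypothetical.

def copy_sql(table: str, columns: list[str]) -> str:
    """Build a COPY ... FROM STDIN statement for a CSV file with a header row."""
    cols = ", ".join(columns)
    return (
        f"COPY {table} ({cols}) FROM STDIN "
        f"WITH (FORMAT csv, HEADER true)"
    )

# With psycopg2, the statement would be used roughly like:
#
#   with conn.cursor() as cur, open("data/customers.csv") as f:
#       cur.copy_expert(copy_sql("staging.customers", ["customer_id", "email"]), f)
#   conn.commit()

print(copy_sql("staging.customers", ["customer_id", "email"]))
# -> COPY staging.customers (customer_id, email) FROM STDIN WITH (FORMAT csv, HEADER true)
```

`COPY ... FROM STDIN` is Postgres's fast bulk-load path, which is why pipelines like this prefer it over row-by-row `INSERT`s.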
Open Adminer at http://localhost:8080 and log in with:

- System: PostgreSQL
- Server: `postgres`
- User: `databridge`
- Password: `databridge`
- Database: `databridge`
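The same credentials work for programmatic access from the host, assuming the compose file maps Postgres to the default port 5432 (a hypothetical sketch; the live query is left commented out so it only runs against a started stack):

```python
# Connection settings matching the Adminer login above.
# Assumes Postgres is exposed on localhost:5432 by Docker Compose.
CONN_PARAMS = {
    "host": "localhost",
    "port": 5432,
    "dbname": "databridge",
    "user": "databridge",
    "password": "databridge",
}

# Usage sketch (requires `pip install psycopg2-binary` and a running
# `docker compose up`):
#
#   import psycopg2
#   with psycopg2.connect(**CONN_PARAMS) as conn, conn.cursor() as cur:
#       cur.execute("SELECT count(*) FROM staging.customers")
#       print(cur.fetchone()[0])
```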
1. `staging.transactions`: validated transaction data
2. `staging.customers`: validated customer profiles
3. `public.fct_customer_transactions`: final joined fact table created by dbt
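What the fact model does can be shown in miniature: join customers to transactions on a shared key. The sketch below uses `sqlite3` as an in-memory stand-in for the Postgres warehouse, with hypothetical column names:

```python
import sqlite3

# In-memory stand-in for the warehouse; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customers (customer_id INTEGER, name TEXT);
    CREATE TABLE stg_transactions (txn_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO stg_customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO stg_transactions VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 9.5);
""")

# Equivalent in spirit to the fct_customer_transactions dbt model:
rows = conn.execute("""
    SELECT c.customer_id, c.name, t.txn_id, t.amount
    FROM stg_transactions t
    JOIN stg_customers c ON c.customer_id = t.customer_id
    ORDER BY t.txn_id
""").fetchall()

for row in rows:
    print(row)  # first row: (1, 'Ada', 10, 25.0)
```

In the real pipeline this join lives in a dbt SQL model, so it is versioned, tested, and rebuilt on every run.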
What this project demonstrates:
- Schema-first pipeline design using JSON Schema
- Data quality enforcement before ingestion
- Automated loading into a warehouse
- Proper use of staging + fact modeling patterns (dbt)
- Reproducible environment using Docker
- Industry-standard pipeline organization
- Modern data engineering stack similar to what real teams use
This is a complete, end-to-end demonstration of data validation → ingestion → warehouse modeling.
DataBridge is a containerized, schema-driven data pipeline that validates, ingests, and models data with a single command. It reflects real industry patterns and is built to be clear, reproducible, and easy to extend.