Skip to content

orchaid/SQL_Data_Warehouse_Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data WareHouse & Analytics Project

🔹 Overview

"This project is a data warehouse designed to integrate data from multiple sources (CRM, ERP) into a structured format for analysis. The architecture follows the Bronze-Silver-Gold layered approach, ensuring data quality and accessibility.

🔹 Architecture

1️⃣ Data Sources

  • CRM System (Customer-related data)
  • ERP System (Enterprise resource data)

2️⃣ ETL Process

  • Extraction: Data is ingested from multiple sources.
  • Transformation: Data is cleaned, deduplicated, and enriched.
  • Loading: Processed data is stored in analytical tables for reporting.

3️⃣ Data Layers

This project follows the Bronze-Silver-Gold architecture

Bronze-Silver-Gold aproach

Layer Purpose
Bronze Raw data ingestion from CSV Files into SQL Server Database.
Silver Cleaned, Normalized, structured data
Gold Optimized fact & dimension tables for reporting

4️⃣ Technologies Used

  • Database: PostgreSQL

  • ETL Pipeline: SQL-based transformations and stored procedures

  • Tools Used: SQL, draw.io.

🔹 Features

✅ Automated ETL pipeline for seamless data processing
✅ Data quality checks & validation
✅ Optimized for analytics & dashboard reporting
✅ Scalable architecture for handling large datasets

🔹 Setup Instructions

Clone the repository:

git clone https://github.com/SQL_Data_Warehouse_Project.git
cd SQL_Data_Warehouse_Project

Run the SQL scripts in your database:

-- Example command to execute SQL script
psql -U your_user -d your_database -f etl_pipeline.sql

🔹 Data Warehouse Schema

Data Flow Diagram

🔹 Future Improvements

  • Improve query performance with partitioning & indexing
  • Add real-time streaming data ingestion
  • Enhance data governance with role-based access control (RBAC)

Repository Structure

data-warehouse-project/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details
│   ├── architecture.drawio        # Draw.io file shows the project's architecture
│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.drawio                # Draw.io file for the data flow diagram
│   ├── data_mart.drawio              # Draw.io file for data models (star schema)
│   ├── naming-conventions.md           # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for extracting and loading raw data
│   ├── silver/                         # Scripts for cleaning and transforming data
│   ├── gold/                           # Scripts for creating analytical models
│
├── tests/                              # Test scripts and quality files
│
├── README.md                           # Project overview and instructions
├── .gitignore                          # Files and directories to be ignored by Git
└── requirements.txt                    # Dependencies and requirements for the project

Analytics on the Gold Layer

The Analtyics Repository serves as a continuation of this project.

About

Creating a data warehouse in the bronze-silver-gold approach from two sources CRM and ERP

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published