Data Engineering Warehouse Project

This portfolio project showcases a modern approach to data warehousing and analytics using SQL Server. It covers key components such as ETL processes, data modeling, and analytics, while reflecting industry-standard practices in modern data engineering.

Project Overview

This project involves:

Data Architecture: Designing a Modern Data Warehouse Using the Medallion Architecture: Bronze, Silver, and Gold Layers.
ETL Pipelines: Extracting, Transforming, and Loading data from source systems into the warehouse.
Data Modeling (Kimball): Developing fact and dimension tables optimized for analytical queries.

Data Architecture

The data architecture for this project is the Medallion (Multi-Hop) Architecture. This architecture consists of three distinct layers that perform ETL on the data.

Bronze Layer

In the Bronze Layer the data is Extracted from the source system (the CRM and ERP). The data is not Transformed in the Bronze Layer and is Loaded to the 'bronze' schema using a Full Load.

Silver Layer

The Silver Layer Transforms the data from the Bronze Layer to clean data, and then it is Loaded into the 'silver' schema using a Full Load. The following Transformations are applied:

Data Cleaning: string trimming and replacing NULL values
Data Standardization: making values consistent
Data Normalization: replacing special characters
Derived Columns: calculating the cost
Data Enrichment: adding additional data from other tables (e.g., customer's birth dates and countries)

Gold Layer

The Gold Layer integrates and aggregates data to create business-ready information. It is modeled using a star schema to optimize analytics and facilitate informed decision-making.

Links & Tools:

Datasets: The data used for this project (csv files).
Data with Baraa: Helpful information for learning about SQL, ETL and data warehouses.
SQL Server: Lightweight server for hosting the SQL database.
mssql VSCode extension: VSCode extension for managing and interacting with the database.
Git: For managing the code and project details.
DrawIO: Tool to design data architecture, models, flows, and diagrams.
Notion: All-in-one tool for project management and planning.

Repository Structure

data-engineering-warehouse-project/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details
│   ├── data_architecture.drawio.png    # Shows the project's architecture
│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.drawio.png            # Shows the data flow diagram
│   ├── data_models.drawio.png          # Shows the data model (star schema)
│   └── naming-conventions.md           # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for extracting and loading raw data
│   ├── data-analysis/                  # Scripts for data analysis and reporting 
│   ├── gold/                           # Scripts for creating analytical models
│   └── silver/                         # Scripts for cleaning and transforming data
│
├── tests/                              # Scripts for validating the quality of the scripts
│
├── README.md                           # Project overview and information
└── LICENSE                             # License information for the repository

License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

About Me

Hello, I am Mike Ebbe, an IT/AI graduate aspiring to specialize in data engineering! This project is my way of further delving into world of data by getting acquainted with data architectures and ETL. I hope to learn a lot more!

Tools and Methodologies I Want to Learn in Future Projects:

Cloud Data Platforms:
- Databricks
- Snowflake
Big Data Processing Frameworks
- Apache Spark
Event Streaming Platforms:
- Kafka
Data Warehousing Methodologies:
- Inmon
Databases
- PostgreSQL

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Data Engineering Warehouse Project

Project Overview

Data Architecture

Bronze Layer

Silver Layer

Gold Layer

Links & Tools:

Repository Structure

License

About Me

Tools and Methodologies I Want to Learn in Future Projects:

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
datasets		datasets
docs		docs
scripts		scripts
tests		tests
LICENSE		LICENSE
README.md		README.md

License

MikeEbbe/data-engineering-warehouse-project

Folders and files

Latest commit

History

Repository files navigation

Data Engineering Warehouse Project

Project Overview

Data Architecture

Bronze Layer

Silver Layer

Gold Layer

Links & Tools:

Repository Structure

License

About Me

Tools and Methodologies I Want to Learn in Future Projects:

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages