Skip to content

Aldosee/SQL-Warehouse-Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

56 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

SQL Warehouse Project

Overview

The project demonstrates the design and implementation of a modern SQL Data Warehouse using the Medallion Architecture approach. The solution integrates data from CRM and ERP source systems, applies data quality controls, performs data transformations, and delivers business-ready datasets for analytics and reporting.

The project showcases core data engineering concepts including data ingestion, ETL development, data validation, dimensional modeling, and analytical data preparation.


πŸ“Œ Project Goals

The objectives of this project are to:

  • Build a multi-layered data warehouse architecture.
  • Consolidate data from multiple source systems.
  • Apply data cleansing and validation processes.
  • Implement business rules and data standardization.
  • Design dimensional models for analytical reporting.
  • Demonstrate end-to-end SQL-based data engineering workflows.

πŸ—ΊοΈ Architecture Overview

The warehouse follows the Medallion Architecture framework consisting of three layers:

Data Architecture

🟠 Bronze Layer

The Bronze Layer stores raw data ingested directly from source systems.

Characteristics

  • Raw source records
  • Minimal transformations
  • Historical data preservation
  • Initial landing zone for ingestion

βšͺ Silver Layer

The Silver Layer contains cleansed, standardized, and validated data.

Processing Activities

  • Data cleansing
  • Duplicate removal
  • Data type conversions
  • Business rule enforcement
  • Data standardization
  • Data quality validation

🟑 Gold Layer

The Gold Layer contains business-ready analytical models.

Components

  • Customer Dimension
  • Product Dimension
  • Sales Fact Table

The layer is designed using a Star Schema to support efficient reporting and dashboard development.


♻️ Data Flow

Data Flow


πŸ’Ύ Data Sources

The project combines information from multiple operational systems.

Source System Dataset Description
CRM Customer Information Customer master data
CRM Product Information Product master data
CRM Sales Details Transactional sales records
ERP Customer Demographics Birthdate and gender information
ERP Customer Locations Geographic information
ERP Product Categories Product categorization and maintenance information

πŸ” ETL Process

The ETL pipeline follows a structured workflow:

1. Extract

Data is imported from CRM and ERP source files into the Bronze Layer.

2. Transform

The Silver Layer applies:

  • Data cleansing
  • Data validation
  • Business rules
  • Standardization
  • Data enrichment

3. Load

The Gold Layer builds dimensional models optimized for analytical workloads.


πŸ’  Data Model

Star Schema

Star Schema

Fact Table

Table Description
fact_sales Stores transactional sales metrics and measures

Dimension Tables

Table Description
dim_customers Customer-related attributes
dim_products Product-related attributes

πŸ—‚οΈ Data Quality Framework

Data quality checks are implemented throughout the transformation process.

Validation Categories

Validation Area Purpose
Primary Key Validation Detect duplicate identifiers
Null Validation Ensure required values are populated
Data Standardization Verify consistent attribute values
Date Validation Validate date ranges and chronology
Business Rule Validation Confirm expected calculations
Referential Integrity Verify dimension and fact relationships

🧱 Project Structure

sql-data-warehouse-project/
β”‚
β”œβ”€β”€ data/                                  # Source datasets used throughout the warehouse
β”‚   β”œβ”€β”€ crm/                               # CRM source files
β”‚   β”‚   β”œβ”€β”€ cust_info.csv
β”‚   β”‚   β”œβ”€β”€ prd_info.csv
β”‚   β”‚   └── sales_details.csv
β”‚   β”‚
β”‚   └── erp/                               # ERP source files
β”‚       β”œβ”€β”€ CUST_AZ12.csv
β”‚       β”œβ”€β”€ LOC_A101.csv
β”‚       └── PX_CAT_G1V2.csv
β”‚
β”œβ”€β”€ docs/                                  # Documentation, diagrams, and schema references
β”‚   β”œβ”€β”€ bronze/
β”‚   β”‚   β”œβ”€β”€ data_flow_bronze.drawio.png    # Bronze layer data flow diagram
β”‚   β”‚   β”œβ”€β”€ schema_bronze.png              # Bronze layer schema visualization
β”‚   β”‚   └── README.md                      # Bronze layer documentation
β”‚   β”‚
β”‚   β”œβ”€β”€ silver/
β”‚   β”‚   β”œβ”€β”€ data_flow_silver.drawio.png    # Silver layer transformation flow
β”‚   β”‚   β”œβ”€β”€ silver_schema.png              # Silver layer schema visualization
β”‚   β”‚   └── README.md                      # Silver layer documentation
β”‚   β”‚
β”‚   └── gold/
β”‚       β”œβ”€β”€ data_flow_gold.drawio.png      # Gold layer processing flow
β”‚       β”œβ”€β”€ data_model.png                 # Star schema / dimensional model
β”‚       └── README.md                      # Gold layer documentation
β”‚
β”œβ”€β”€ warehouse/                             # Main diagrams and info used in project 
β”‚   β”œβ”€β”€ Data_Architecture.drawio.png       # Overall warehouse architecture
β”‚   └── data_flow.drawio.png               # End-to-end ETL workflow
β”‚   └── data_layers.png                    # Information per layers in the data
β”‚   └── etl_steps.drawio.png               # Guide flow of data from bronze to gold
|
β”œβ”€β”€ scripts/                               # SQL scripts used for warehouse development
β”‚   β”œβ”€β”€ bronze/
β”‚   β”‚   β”œβ”€β”€ ddl_bronze.sql                 # Bronze layer table definitions
β”‚   β”‚   └── load_bronze_data.sql           # Raw data loading procedures
β”‚   β”‚
β”‚   β”œβ”€β”€ silver/
β”‚   β”‚   β”œβ”€β”€ ddl_silver.sql                 # Silver layer table definitions
β”‚   β”‚   └── load_silver_data.sql           # Data cleansing and transformation logic
β”‚   β”‚
β”‚   └── gold/
β”‚       └── ddl_gold.sql                   # Gold layer dimensional model creation
β”‚
β”œβ”€β”€ tests/                                 # Data quality and validation scripts
β”‚   β”œβ”€β”€ silver_quality_check.sql           # Silver layer data validation framework
β”‚   β”œβ”€β”€ gold_quality_check.sql             # Gold layer integrity validation
β”‚   └── README.md                          # Testing documentation
β”‚
β”œβ”€β”€ README.md                              # Project overview and documentation
β”œβ”€β”€ LICENSE                                # License information
└── .gitignore                             # Git ignored files and folders

πŸ‘ Skills Demonstrated

Data Engineering

  • Data Warehouse Design
  • ETL Development
  • Data Validation
  • Data Quality Management
  • Medallion Architecture

SQL Development

  • Complex Joins
  • Common Table Expressions (CTEs)
  • Window Functions
  • Data Transformation

πŸ“š Learnings

Through this project I gained hands-on experience with:

  • Designing layered warehouse architectures
  • Implementing ETL pipelines using SQL
  • Building dimensional models for analytics
  • Applying data quality frameworks
  • Creating business-ready reporting structures
  • Managing data transformations across multiple source systems

βš™οΈ Future Improvements

Potential enhancements include:

  • Incremental data loading
  • Power BI dashboards
  • Automated scheduling using Airflow
  • CI/CD deployment workflows
  • Data lineage monitoring
  • AI and Machine learning

πŸ™‡πŸ» Acknowledgements

I would like to express my sincere appreciation to πŸ‘‘Baraa Khatib Salkini (Data With Baraa)πŸ‘‘ for providing exceptional educational content and practical guidance on data warehousing and data engineering concepts.

His project served as a strong learning foundation and helped me better understand ETL development, dimensional modeling, data quality validation, and warehouse architecture design.

The knowledge gained from his tutorials and project walkthroughs was instrumental in helping me successfully build and complete this end-to-end SQL Data Warehouse project.

About

Building a data warehouse in SQL server. This project covers ETL workflow, data model, and analytics. Also, the project showcase my learning in making end-to-end data warehouse and highlight practices in data engineering.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages