The project demonstrates the design and implementation of a modern SQL Data Warehouse using the Medallion Architecture approach. The solution integrates data from CRM and ERP source systems, applies data quality controls, performs data transformations, and delivers business-ready datasets for analytics and reporting.
The project showcases core data engineering concepts including data ingestion, ETL development, data validation, dimensional modeling, and analytical data preparation.
The objectives of this project are to:
- Build a multi-layered data warehouse architecture.
- Consolidate data from multiple source systems.
- Apply data cleansing and validation processes.
- Implement business rules and data standardization.
- Design dimensional models for analytical reporting.
- Demonstrate end-to-end SQL-based data engineering workflows.
The warehouse follows the Medallion Architecture framework consisting of three layers:
The Bronze Layer stores raw data ingested directly from source systems.
Characteristics
- Raw source records
- Minimal transformations
- Historical data preservation
- Initial landing zone for ingestion
The Silver Layer contains cleansed, standardized, and validated data.
Processing Activities
- Data cleansing
- Duplicate removal
- Data type conversions
- Business rule enforcement
- Data standardization
- Data quality validation
The Gold Layer contains business-ready analytical models.
Components
- Customer Dimension
- Product Dimension
- Sales Fact Table
The layer is designed using a Star Schema to support efficient reporting and dashboard development.
The project combines information from multiple operational systems.
| Source System | Dataset | Description |
|---|---|---|
| CRM | Customer Information | Customer master data |
| CRM | Product Information | Product master data |
| CRM | Sales Details | Transactional sales records |
| ERP | Customer Demographics | Birthdate and gender information |
| ERP | Customer Locations | Geographic information |
| ERP | Product Categories | Product categorization and maintenance information |
The ETL pipeline follows a structured workflow:
Data is imported from CRM and ERP source files into the Bronze Layer.
The Silver Layer applies:
- Data cleansing
- Data validation
- Business rules
- Standardization
- Data enrichment
The Gold Layer builds dimensional models optimized for analytical workloads.
| Table | Description |
|---|---|
| fact_sales | Stores transactional sales metrics and measures |
| Table | Description |
|---|---|
| dim_customers | Customer-related attributes |
| dim_products | Product-related attributes |
Data quality checks are implemented throughout the transformation process.
| Validation Area | Purpose |
|---|---|
| Primary Key Validation | Detect duplicate identifiers |
| Null Validation | Ensure required values are populated |
| Data Standardization | Verify consistent attribute values |
| Date Validation | Validate date ranges and chronology |
| Business Rule Validation | Confirm expected calculations |
| Referential Integrity | Verify dimension and fact relationships |
sql-data-warehouse-project/
β
βββ data/ # Source datasets used throughout the warehouse
β βββ crm/ # CRM source files
β β βββ cust_info.csv
β β βββ prd_info.csv
β β βββ sales_details.csv
β β
β βββ erp/ # ERP source files
β βββ CUST_AZ12.csv
β βββ LOC_A101.csv
β βββ PX_CAT_G1V2.csv
β
βββ docs/ # Documentation, diagrams, and schema references
β βββ bronze/
β β βββ data_flow_bronze.drawio.png # Bronze layer data flow diagram
β β βββ schema_bronze.png # Bronze layer schema visualization
β β βββ README.md # Bronze layer documentation
β β
β βββ silver/
β β βββ data_flow_silver.drawio.png # Silver layer transformation flow
β β βββ silver_schema.png # Silver layer schema visualization
β β βββ README.md # Silver layer documentation
β β
β βββ gold/
β βββ data_flow_gold.drawio.png # Gold layer processing flow
β βββ data_model.png # Star schema / dimensional model
β βββ README.md # Gold layer documentation
β
βββ warehouse/ # Main diagrams and info used in project
β βββ Data_Architecture.drawio.png # Overall warehouse architecture
β βββ data_flow.drawio.png # End-to-end ETL workflow
β βββ data_layers.png # Information per layers in the data
β βββ etl_steps.drawio.png # Guide flow of data from bronze to gold
|
βββ scripts/ # SQL scripts used for warehouse development
β βββ bronze/
β β βββ ddl_bronze.sql # Bronze layer table definitions
β β βββ load_bronze_data.sql # Raw data loading procedures
β β
β βββ silver/
β β βββ ddl_silver.sql # Silver layer table definitions
β β βββ load_silver_data.sql # Data cleansing and transformation logic
β β
β βββ gold/
β βββ ddl_gold.sql # Gold layer dimensional model creation
β
βββ tests/ # Data quality and validation scripts
β βββ silver_quality_check.sql # Silver layer data validation framework
β βββ gold_quality_check.sql # Gold layer integrity validation
β βββ README.md # Testing documentation
β
βββ README.md # Project overview and documentation
βββ LICENSE # License information
βββ .gitignore # Git ignored files and folders
- Data Warehouse Design
- ETL Development
- Data Validation
- Data Quality Management
- Medallion Architecture
- Complex Joins
- Common Table Expressions (CTEs)
- Window Functions
- Data Transformation
Through this project I gained hands-on experience with:
- Designing layered warehouse architectures
- Implementing ETL pipelines using SQL
- Building dimensional models for analytics
- Applying data quality frameworks
- Creating business-ready reporting structures
- Managing data transformations across multiple source systems
Potential enhancements include:
- Incremental data loading
- Power BI dashboards
- Automated scheduling using Airflow
- CI/CD deployment workflows
- Data lineage monitoring
- AI and Machine learning
I would like to express my sincere appreciation to πBaraa Khatib Salkini (Data With Baraa)π for providing exceptional educational content and practical guidance on data warehousing and data engineering concepts.
His project served as a strong learning foundation and helped me better understand ETL development, dimensional modeling, data quality validation, and warehouse architecture design.
The knowledge gained from his tutorials and project walkthroughs was instrumental in helping me successfully build and complete this end-to-end SQL Data Warehouse project.


