-
Notifications
You must be signed in to change notification settings - Fork 0
Process Data
Yomi Ogunyinka edited this page Jul 6, 2021
·
4 revisions
Mapping Document - Master Data Tables to Dimensions
- Type 1 Dimension
- Apply transformation rules based upon this mapping.
- Approach 1 - Get join results by running a join query between 3 tables.
- Approach 2 - Get data from all the master data tables, apply processing (join) as part of the pipeline
| Target Table | Target Column | Source Table | Source Column | Description |
|---|---|---|---|---|
| dim_customers | customers | Get all columns from Source | ||
| dim_customers | batch_id | |||
| dim_customers | batch_date | Date on which data is loaded | ||
| dim_products | product_id | products | product_id | |
| dim_products | product_name | products | product_name | |
| dim_products | product_price | products | product_price | |
| dim_products | category_id | categories | category_id | |
| dim_products | category_name | categories | category_name | |
| dim_products | department_id | departments | department_id | |
| dim_products | department_name | |||
| dim_products | batch_id | |||
| dim_products | batch_name |
Mapping Document - Transaction Tables to Facts
- Get all the required fields from orders and order_items by passing a data as an argument
- Filtering and Aggregations should be done using Pandas as part of the pipeline
- Data should be filtered and pre-aggregated as per the transformation logic and then should be stored into the fact
| Target Table | Target Column | Source Table | Source Column | Description |
|---|---|---|---|---|
| fact_product_revenue_dly | date_id (int) | orders | order_date (date) | Just store the date part of ignoring time stamp. (e.g: 20140101) |
| fact_product_revenue_dly | product_id | order_items | order_item_product _id | Get the product ids as per this mapping. |
| fact_product_revenue_dly | product_revenue | order_items | order_item_subtotal |
orders (orders.orders_status) sum(order_item_subtotal) group by orders.order_date and order_items.order_item_product_id |
| fact_product_revenue_dly | outstanding_revenue | order_items | order_item_subtotal |
orders (orders.orders_status) sum(order_item_subtotal) group by orders.order_date and order_items.order_item_product_id |
| fact_product_revenue_dly | total_order_cnt | orders | order_items.order_item_product_id |
|
| fact_product_revenue_dly | ||||
| fact_product_revenue_dly | ||||
| fact_product_revenue_dly | ||||
| fact_product_revenue_dly | ||||
| fact_product_revenue_dly |