Skip to content

Process Data

Yomi Ogunyinka edited this page Jul 6, 2021 · 4 revisions

Mapping Document - Master Data Tables to Dimensions

  • Type 1 Dimension
  • Apply transformation rules based upon this mapping.
  • Approach 1 - Get join results by running a join query between 3 tables.
  • Approach 2 - Get data from all the master data tables, apply processing (join) as part of the pipeline
Target Table Target Column Source Table Source Column Description
dim_customers customers Get all columns from Source
dim_customers batch_id
dim_customers batch_date Date on which data is loaded
dim_products product_id products product_id
dim_products product_name products product_name
dim_products product_price products product_price
dim_products category_id categories category_id
dim_products category_name categories category_name
dim_products department_id departments department_id
dim_products department_name
dim_products batch_id
dim_products batch_name

Mapping Document - Transaction Tables to Facts

  • Get all the required fields from orders and order_items by passing a data as an argument
  • Filtering and Aggregations should be done using Pandas as part of the pipeline
  • Data should be filtered and pre-aggregated as per the transformation logic and then should be stored into the fact
Target Table Target Column Source Table Source Column Description
fact_product_revenue_dly date_id (int) orders order_date (date) Just store the date part of ignoring time stamp.
(e.g: 20140101)
fact_product_revenue_dly product_id order_items order_item_product _id Get the product ids as per this mapping.
fact_product_revenue_dly product_revenue order_items order_item_subtotal
  • Filtering - COMPLETE or CLOSED
    orders (orders.orders_status)
  • Aggregation -
    sum(order_item_subtotal) group by orders.order_date and
    order_items.order_item_product_id
  • fact_product_revenue_dly outstanding_revenue order_items order_item_subtotal
  • Filtering - PENDING_PAYMENT or PROCESSING or PENDING
    orders (orders.orders_status)
  • Aggregation -
    sum(order_item_subtotal) group by orders.order_date and
    order_items.order_item_product_id
  • fact_product_revenue_dly total_order_cnt orders
  • Aggregation - count(1) group by orders.order_date and
    order_items.order_item_product_id
  • fact_product_revenue_dly
    fact_product_revenue_dly
    fact_product_revenue_dly
    fact_product_revenue_dly
    fact_product_revenue_dly

    Clone this wiki locally