This project designs and implements an analytic data warehouse for an online shopping marketplace. The system focuses on tracking vendor performance, product engagement, and financial metrics to support data-driven decision-making. The database is normalized to 3NF, with SQL queries and stored procedures enabling efficient data analysis.
The analytic data warehouse is designed to maintain efficient business metric tracking and analysis capabilities for an online shopping marketplace. Key objectives include:
- Evaluating vendor performance.
- Tracking product customer engagement through reviews.
- Analyzing expenses related to delivery issues and returns.
- Supporting accurate decision-making through a normalized database structure.
The system links entities such as Vendors, Products, Shoppers, Orders, and Revenue, utilizing SQL queries and stored procedures for data insertion and report generation.
- Vendor Performance Analysis: Track total sales and monthly revenue trends to identify high-performing vendors and areas for improvement.
- Product Engagement: Analyze customer reviews and time spent on product pages to improve product offerings.
- Financial Tracking: Assess costs from failed deliveries and returns to optimize logistics and minimize losses.
The conceptual ER diagram defines relationships between core entities:
- Vendors supply Products (many-to-one relationship).
- Shoppers place Orders, which contain OrderDetails (one-to-many relationship).
- Products have ShopperExperience reviews (one-to-many relationship).
- Costs and Revenue tables track financial metrics linked to orders and vendors.
The physical ER diagram maps entities to tables with constraints (primary keys, foreign keys, NOT NULL, CHECK). Key tables include:
Vendors: Stores vendor details.Products: Links to vendors and includes product information.Shoppers: Tracks shopper data and membership status.OrdersandOrderDetails: Record order transactions.ShopperExperience: Captures product reviews and time spent.CostsandRevenue: Track financial metrics.
- Normalized to 3NF to minimize redundancy.
- Foreign keys ensure referential integrity.
- Constraints enforce data validity (e.g.,
Quantity > 0). - Triggers maintain data integrity (e.g., valid reviews).
Randomized data was inserted into tables for testing:
- Vendors: 50 entries with random names and commission rates.
- Products: 1,000 entries linked to random vendors.
- Shoppers: 1,000 entries with random membership status.
- Orders: 10,000 entries with random statuses and dates.
- OrderDetails: 20,000 entries with random quantities.
- ShopperExperience: 1,000 entries with random reviews and time spent.
- Costs: 500 entries with random delivery and return costs.
- Total Sales by Vendor: Aggregates sales data by vendor and month.
- Top Products by Shopper Feedback: Ranks products by review count and average time spent.
- Costs from Failed Deliveries and Returns: Calculates total costs for incomplete orders.
monthly_report: Generates revenue and cost reports for a specified month.
- Atomicity: Ensures all operations in a transaction complete successfully or none at all (e.g., order placement).
- Consistency: Guarantees valid database state transitions (e.g., stock constraints).
- Isolation: Prevents interference between concurrent transactions (e.g., simultaneous orders).
- Durability: Ensures committed transactions persist despite system failures.
- Consistency: All nodes reflect the latest data (e.g., stock updates).
- Availability: System responds to requests even during failures.
- Partition Tolerance: System operates despite network partitions.
Amazon's data warehouse integrates multiple data sources for personalized recommendations and secure access management.
The analytic data warehouse successfully tracks business metrics, vendor performance, and product engagement.