This project presents a comprehensive SQL-based business analysis of HNG Ride, a ride-hailing platform, covering the period June 2021 to December 2024. The objective was to analyze operational performance across drivers, riders, revenue, and payment behavior to support data-driven decision-making.
The analysis focuses on identifying trends, inefficiencies, and high-performing segments that management can leverage to improve operational efficiency, customer retention, and revenue growth.
The project was designed to answer key business questions, including:
- Identifying top-performing and most consistent drivers
- Analyzing rider retention from signup through active usage
- Evaluating quarterly and year-over-year (YoY) revenue growth
- Understanding payment behavior and non-cash adoption
- Ranking top-earning drivers by city
- Identifying drivers eligible for performance incentives
The analysis was conducted using four relational datasets representing core ride-hailing operations:
-
drivers_clean Driver information including city, ratings, and performance attributes
-
riders_clean Rider profiles and signup dates
-
rides_clean Ride-level data including distance, fare, timestamps, status, and cities
-
payments_clean Payment transactions including method, amount, and payment date
These datasets were loaded into PostgreSQL and transformed into clean, analysis-ready tables using SQL.
The datasets used in this project originate from a publicly available internship-style case study designed to simulate real-world ride-hailing operations. The data represents synthetic / simulated business data and was used strictly for educational and portfolio purposes.
The raw CSV files are included in the /data directory for transparency and reproducibility. All data cleaning logic, transformations, and analytical queries are fully documented in the /sql directory.
β οΈ Note: While the dataset structure is shared across similar case studies, all SQL logic, analysis steps, insights, and interpretations in this repository reflect my own implementation and understanding.
Data preparation was a critical phase of the analysis and included:
- Removing duplicate records
- Handling missing and incomplete values
- Standardizing city names and date formats
- Filtering invalid records (e.g. negative fares or payment amounts)
- Restricting analysis strictly to June 2021 β December 2024
All data cleaning steps are documented in:
/sql/01_data_cleaning.sql
The analysis was performed entirely in PostgreSQL, making extensive use of:
- Multi-table joins
- Aggregations and window functions
- Date-based filtering and grouping
- Ranking and performance metrics
Business-focused queries answering the core analysis questions are documented in:
/sql/02_business_analysis.sql
(Each query corresponds directly to a defined business question.)
Some of the insights derived include:
- Identification of the top 10 longest rides and their characteristics
- Quarterly and YoY revenue growth trends across the business
- Driver consistency measured through average monthly activity
- Rider retention patterns from 2021 through 2024
- Identification of high-usage, non-cash-paying riders
- Ranking of top-earning drivers by city
- Selection of drivers meeting criteria for performance bonuses
A detailed business analysis report containing:
- SQL outputs
- Result tables
- Business interpretations
- Actionable recommendations
is available here:
/reports/HNG_Ride_SQL_Business_Analysis_Report.pdf
- SQL (PostgreSQL)
- Relational data modeling
- Business analytics & reporting
- The analysis is based on simulated operational data
- Findings reflect historical trends and do not predict future performance
- Certain business rules were inferred based on available attributes
Detailed assumptions and limitations are documented in:
/notes/assumptions_and_limitations.md
Phemelo Sebopelo Aspiring Data Analyst | SQL | Power BI | Business Analytics