Skip to content

SQL-based business analysis of a simulated ride-hailing platform using PostgreSQL. The project analyzes revenue trends, driver performance, rider behavior, and operational efficiency through data cleaning, joins, aggregations, and window functions in an internship-style case study.

Notifications You must be signed in to change notification settings

Phemelocodemode/HNG-Ride-SQL-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

23 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

HNG Ride – SQL Business & Revenue Analysis

πŸ“Œ Project Overview

This project presents a comprehensive SQL-based business analysis of HNG Ride, a ride-hailing platform, covering the period June 2021 to December 2024. The objective was to analyze operational performance across drivers, riders, revenue, and payment behavior to support data-driven decision-making.

The analysis focuses on identifying trends, inefficiencies, and high-performing segments that management can leverage to improve operational efficiency, customer retention, and revenue growth.


🎯 Business Objectives

The project was designed to answer key business questions, including:

  • Identifying top-performing and most consistent drivers
  • Analyzing rider retention from signup through active usage
  • Evaluating quarterly and year-over-year (YoY) revenue growth
  • Understanding payment behavior and non-cash adoption
  • Ranking top-earning drivers by city
  • Identifying drivers eligible for performance incentives

πŸ—‚οΈ Data Description

The analysis was conducted using four relational datasets representing core ride-hailing operations:

  • drivers_clean Driver information including city, ratings, and performance attributes

  • riders_clean Rider profiles and signup dates

  • rides_clean Ride-level data including distance, fare, timestamps, status, and cities

  • payments_clean Payment transactions including method, amount, and payment date

These datasets were loaded into PostgreSQL and transformed into clean, analysis-ready tables using SQL.


πŸ—‚οΈ Data Source & Usage

The datasets used in this project originate from a publicly available internship-style case study designed to simulate real-world ride-hailing operations. The data represents synthetic / simulated business data and was used strictly for educational and portfolio purposes.

The raw CSV files are included in the /data directory for transparency and reproducibility. All data cleaning logic, transformations, and analytical queries are fully documented in the /sql directory.

⚠️ Note: While the dataset structure is shared across similar case studies, all SQL logic, analysis steps, insights, and interpretations in this repository reflect my own implementation and understanding.


🧹 Data Cleaning & Preparation

Data preparation was a critical phase of the analysis and included:

  • Removing duplicate records
  • Handling missing and incomplete values
  • Standardizing city names and date formats
  • Filtering invalid records (e.g. negative fares or payment amounts)
  • Restricting analysis strictly to June 2021 – December 2024

All data cleaning steps are documented in:

/sql/01_data_cleaning.sql

πŸ“Š Analysis Approach

The analysis was performed entirely in PostgreSQL, making extensive use of:

  • Multi-table joins
  • Aggregations and window functions
  • Date-based filtering and grouping
  • Ranking and performance metrics

Business-focused queries answering the core analysis questions are documented in:

/sql/02_business_analysis.sql

(Each query corresponds directly to a defined business question.)


πŸ“ˆ Key Insights

Some of the insights derived include:

  • Identification of the top 10 longest rides and their characteristics
  • Quarterly and YoY revenue growth trends across the business
  • Driver consistency measured through average monthly activity
  • Rider retention patterns from 2021 through 2024
  • Identification of high-usage, non-cash-paying riders
  • Ranking of top-earning drivers by city
  • Selection of drivers meeting criteria for performance bonuses

πŸ“„ Business Report

A detailed business analysis report containing:

  • SQL outputs
  • Result tables
  • Business interpretations
  • Actionable recommendations

is available here:

/reports/HNG_Ride_SQL_Business_Analysis_Report.pdf

πŸ› οΈ Tools & Technologies

  • SQL (PostgreSQL)
  • Relational data modeling
  • Business analytics & reporting

⚠️ Assumptions & Limitations

  • The analysis is based on simulated operational data
  • Findings reflect historical trends and do not predict future performance
  • Certain business rules were inferred based on available attributes

Detailed assumptions and limitations are documented in:

/notes/assumptions_and_limitations.md

πŸ‘€ Author

Phemelo Sebopelo Aspiring Data Analyst | SQL | Power BI | Business Analytics

About

SQL-based business analysis of a simulated ride-hailing platform using PostgreSQL. The project analyzes revenue trends, driver performance, rider behavior, and operational efficiency through data cleaning, joins, aggregations, and window functions in an internship-style case study.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published