Automatically identify and localize potholes in road images, providing critical infrastructure monitoring capabilities. The project encompasses the complete machine learning lifecycle from data preprocessing and model training to optimized deployment.

NBAmine/Pothole-detection


Pothole Detection System

Project Overview

The Pothole Detection System is designed to automatically identify and localize potholes in road images, providing critical infrastructure monitoring capabilities. The project encompasses the complete machine learning lifecycle from data preprocessing and model training to optimized deployment.

Key Features

  • YOLOv8-based Detection: Uses the Ultralytics YOLOv8 object detector to localize potholes with bounding boxes
  • Stratified K-Fold Cross-Validation: Folds are stratified by per-image pothole count for a more reliable estimate of generalization
  • Multi-Engine Inference: Supports PyTorch, ONNX Runtime, and OpenVINO for flexible deployment
  • Model Optimization: Implements pruning to reduce model size and inference latency
  • Production-Ready API: FastAPI server returning bounding boxes, confidence scores, and per-engine inference latency
  • Containerized Deployment: Docker image for consistent deployment across environments

Project Structure

pothole_detection/
├── pipeline/                      # Training and data preparation pipeline
│   ├── notebooks/
│   │   ├── data_cleaning.ipynb   # Data preprocessing and exploration
│   │   └── training.ipynb        # Model training experiments
│   └── scripts/
│       ├── create_folds_colab.py # K-fold split generation with stratification
│       ├── create_folds_kaggle.py
│       ├── train_fold.py          # Single fold training script
│       ├── train_full.py          # Final model training on best fold
│       └── summarize_cv.py        # Cross-validation results aggregation
│
└── server/                        # Inference server and deployment
    ├── app/
    │   ├── main.py               # FastAPI application entry point
    │   ├── engines/              # Inference engine implementations
    │   │   ├── pytorch_engine.py # Original and pruned PyTorch models
    │   │   ├── onnx_engine.py    # ONNX Runtime inference
    │   │   └── openvino_engine.py # Intel OpenVINO inference
    │   ├── models/               # Model artifacts
    │   │   ├── original.pt       # Original YOLOv8 model
    │   │   ├── pruned.pt         # Pruned model
    │   │   ├── model.onnx        # ONNX format
    │   │   └── model_openvino/   # OpenVINO IR format
    │   └── schemas/              # Pydantic response models
    ├── Dockerfile                # Container configuration
    └── requirements.txt          # Python dependencies

Methodology

1. Data Collection and Preprocessing

The project consolidates multiple Kaggle datasets into a unified, standardized format suitable for YOLOv8 training. This preprocessing phase is implemented in data_cleaning.ipynb.

Dataset Sources

Four distinct Kaggle datasets were collected, each with different annotation formats and characteristics:

  1. andrewmvd/pothole-detection

    • XML annotations (Pascal VOC format)
    • High-quality labeled images
    • Consistent image quality
  2. chitholian/annotated-potholes-dataset

    • XML annotations with varying image sizes
    • Diverse road conditions and camera angles
    • Rich annotation detail
  3. rajdalsaniya/pothole-detection-dataset

    • Already in YOLOv8 format (text annotations)
    • Pre-split train/validation sets
    • Good quality but some ambiguous images
  4. atulyakumar98/pothole-detection-dataset

    • Negative samples (roads without potholes)
    • Critical for reducing false positives
    • No annotations required (empty label files)

Data Cleaning Pipeline

The preprocessing pipeline performs the following transformations:

Step 1: Dataset Download and Caching

  • Automated Kaggle dataset download using Kaggle API
  • Caching of downloaded datasets on Google Drive to avoid redundant downloads
  • Organized storage structure for multiple dataset versions

Step 2: Annotation Format Standardization

XML to YOLO Conversion:

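# For Pascal VOC (XML) annotations (illustrative helper): convert absolute pixel
# corners (xmin, ymin, xmax, ymax) into YOLO's normalized center/width/height format
def voc_box_to_yolo(xmin, ymin, xmax, ymax, image_width, image_height, class_id=0):
    cx = (xmin + xmax) / 2 / image_width
    cy = (ymin + ymax) / 2 / image_height
    w = (xmax - xmin) / image_width
    h = (ymax - ymin) / image_height
    # Output line: "class_id cx cy w h"
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"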

The conversion handles:

  • Parsing XML files to extract bounding box coordinates
  • Converting absolute pixel coordinates to normalized relative coordinates
  • Handling multiple objects per image
  • Error handling for malformed annotations
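The XML parsing and error handling described above could look roughly like the following minimal sketch, which uses Python's standard-library `xml.etree.ElementTree`. It assumes standard Pascal VOC tags (`size/width`, `size/height`, `object/bndbox`) and that every object is a pothole with class id 0; the function name `voc_xml_to_yolo_lines` is hypothetical and not taken from data_cleaning.ipynb.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo_lines(xml_text, class_id=0):
    """Convert one Pascal VOC annotation (an XML string) into YOLO label lines.

    Raises ET.ParseError for XML that cannot be parsed at all; individual
    objects with missing, non-numeric or degenerate boxes are skipped.
    """
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):  # one <object> per annotated pothole
        box = obj.find("bndbox")
        if box is None:
            continue
        try:
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
        except (TypeError, ValueError):  # missing tag (None) or non-numeric text
            continue
        if xmax <= xmin or ymax <= ymin:  # degenerate box
            continue
        cx = (xmin + xmax) / 2 / width
        cy = (ymin + ymax) / 2 / height
        w = (xmax - xmin) / width
        h = (ymax - ymin) / height
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Skipping a bad object rather than the whole file keeps the remaining valid boxes in an image; the caller can catch `ET.ParseError` to log and skip files that are not valid XML.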

Pre-formatted Dataset Integration:

  • Direct copying of datasets already in YOLO format
  • Validation of image-label pairing
  • Consistency checks for annotation format

Step 3: Image and Label Renaming

  • Unified naming convention: {dataset_source}_{global_index}.{ext}
  • Examples: andrewmvd_001.jpg, chitholian_342.txt, rajdalsaniya_789.jpg
  • Prevents filename collisions across datasets
  • Maintains traceability to source dataset

Step 4: Negative Sample Integration

  • Inclusion of images without potholes (negative samples)
  • Empty annotation files created for negative samples
  • Improves model's ability to distinguish roads without defects
  • Reduces false positive rate in deployment

Step 5: Dataset Consolidation

  • All images copied to unified images/train/ directory
  • All labels saved to corresponding labels/train/ directory
  • Automatic directory structure creation
  • Progress tracking with tqdm for large datasets
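Steps 3 to 5 can be sketched together as one hypothetical helper that copies a source image into the unified layout under a new `{dataset_source}_{global_index}` name and writes its label next to it, or an empty label file for a negative sample. The three-digit zero padding mirrors the examples above; the function name and arguments are illustrative assumptions, not the notebook's code.

```python
import shutil
from pathlib import Path


def add_to_unified_dataset(image_path, label_path, source, index, out_dir):
    """Copy one image (and its label) into out_dir/{images,labels}/train.

    Hypothetical helper: files are renamed to {source}_{index:03d}.{ext}.
    Pass label_path=None for a negative sample; an empty label file is then
    written, which YOLO treats as an image containing no objects.
    """
    image_path, out_dir = Path(image_path), Path(out_dir)
    stem = f"{source}_{index:03d}"
    image_dir = out_dir / "images" / "train"
    label_dir = out_dir / "labels" / "train"
    image_dir.mkdir(parents=True, exist_ok=True)  # create the layout on first use
    label_dir.mkdir(parents=True, exist_ok=True)

    new_image = image_dir / f"{stem}{image_path.suffix.lower()}"
    new_label = label_dir / f"{stem}.txt"
    shutil.copy2(image_path, new_image)
    if label_path is None:
        new_label.write_text("")  # negative sample: empty annotation file
    else:
        shutil.copy2(label_path, new_label)
    return new_image, new_label
```

Sharing one counter across all four sources (for example, enumerating a combined file list wrapped in tqdm) gives every file a unique global index, while the source prefix keeps each file traceable to its dataset.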

Data Quality Assurance

The pipeline implements several quality control measures:

  • File Existence Validation: Verifies both image and annotation files exist before processing
  • Format Validation: Ensures bounding box coordinates are within valid ranges (0-1)
  • Consistency Checks: Confirms image dimensions match annotation references
  • Error Logging: Captures and reports parsing errors without halting the entire pipeline
  • Progress Monitoring: Real-time feedback on processing status for each dataset
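A minimal sketch of the label checks, assuming the single-class YOLO layout described in this README (five fields per line, class id 0, coordinates in [0, 1]) and the images/train and labels/train directories. Problems are collected and returned rather than raised, matching the "report without halting" behavior; both function names are hypothetical.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def validate_label_line(line):
    """Return an error message for one YOLO label line, or None if it looks valid."""
    parts = line.split()
    if len(parts) != 5:
        return f"expected 5 fields, got {len(parts)}"
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return "non-numeric field"
    if class_id != 0:  # single-class dataset: only "pothole" (id 0)
        return f"unexpected class id {class_id}"
    if not all(0.0 <= c <= 1.0 for c in coords):
        return "coordinate outside [0, 1]"
    return None


def check_dataset(root):
    """Collect (file, problem) pairs instead of raising, so one bad file never stops the run."""
    root = Path(root)
    problems = []
    for image in sorted((root / "images" / "train").iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        label = root / "labels" / "train" / f"{image.stem}.txt"
        if not label.exists():
            problems.append((image.name, "missing label file"))
            continue
        # An empty label file (negative sample) yields no lines and passes.
        for lineno, line in enumerate(label.read_text().splitlines(), start=1):
            if line.strip() and (error := validate_label_line(line)):
                problems.append((f"{label.name}:{lineno}", error))
    return problems
```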

Final Dataset Statistics

The consolidated dataset includes:

  • Total Images: ~3,000+ road images from diverse sources
  • Positive Samples: Images with one or more potholes (varying density)
  • Negative Samples: Clean road images for false positive reduction
  • Annotation Format: YOLOv8 normalized coordinates (class_id, cx, cy, w, h)
  • Class Distribution: Single class ("pothole") with varying instance counts per image

Data Cleaning Workflow

┌─────────────────────────────────────────────────────────────────┐
│                    KAGGLE DATASETS                              │
├─────────────────────────────────────────────────────────────────┤
│ andrewmvd (XML)  │ chitholian (XML)  │ rajdalsaniya (YOLO)     │
│ atulyakumar98 (Negative samples)                                │
└────────────┬────────────────────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────────────┐
│              PREPROCESSING PIPELINE                             │
├─────────────────────────────────────────────────────────────────┤
│ 1. Download & Cache to Google Drive                             │
│ 2. Parse XML → Extract bounding boxes                           │
│ 3. Convert to YOLO format (normalized coordinates)              │
│ 4. Rename: {dataset}_{id}.{ext}                                 │
│ 5. Copy pre-formatted YOLO data                                 │
│ 6. Generate empty labels for negatives                          │
│ 7. Validate & consolidate                                       │
└────────────┬────────────────────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────────────┐
│              UNIFIED DATASET                                    │
├─────────────────────────────────────────────────────────────────┤
│ pothole_dataset/                                                │
│ ├── images/train/   ← All images (standardized names)          │
│ └── labels/train/   ← YOLO annotations (.txt files)            │
└─────────────────────────────────────────────────────────────────┘

2. Data Preparation and Stratification

The training pipeline implements stratified K-fold cross-validation to ensure balanced representation across folds:

  • Stratification Strategy: Images are grouped based on pothole count (0, 1-3, 3-6, 6+ potholes)
  • Fold Generation: 5-fold cross-validation with stratified splitting, so each fold has a similar distribution of pothole-count groups
  • Benefit: Gives more reliable performance estimates than a single split and exposes overfitting to particular data characteristics
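The stratification can be sketched as follows, under two assumptions. First, the listed ranges overlap at 3 and 6, so this sketch uses the bins 0, 1-3, 4-6 and 7+. Second, to stay dependency-free it deals images round-robin within each bin instead of calling scikit-learn's `StratifiedKFold`, which the project lists for this step. The input is a mapping from image stem to pothole count (the number of lines in its label file); the function names are hypothetical.

```python
import random
from collections import defaultdict


def count_bin(n_potholes):
    """Stratification group; boundaries 0 / 1-3 / 4-6 / 7+ are an assumption."""
    if n_potholes == 0:
        return 0
    if n_potholes <= 3:
        return 1
    if n_potholes <= 6:
        return 2
    return 3


def stratified_folds(pothole_counts, k=5, seed=0):
    """Map each image stem to a fold in 0..k-1, spreading every bin evenly across folds."""
    by_bin = defaultdict(list)
    for stem in sorted(pothole_counts):  # sort first so results are reproducible
        by_bin[count_bin(pothole_counts[stem])].append(stem)
    rng = random.Random(seed)
    fold_of = {}
    offset = 0
    for group in sorted(by_bin):
        stems = by_bin[group]
        rng.shuffle(stems)
        for i, stem in enumerate(stems):
            fold_of[stem] = (offset + i) % k  # round-robin within the bin
        offset += len(stems)  # continue the rotation so small bins don't all start at fold 0
    return fold_of
```

Fold f then validates on the images whose `fold_of[stem] == f` and trains on the rest; within every bin, fold sizes differ by at most one image.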

3. Model Training

Cross-Validation Training

  • Each fold is trained independently using the YOLOv8n (nano) architecture
  • Training configuration:
    • Epochs: 100 per fold
    • Image Size: 640×640 pixels
    • Batch Size: 16
    • Early Stopping: Patience of 20 epochs
    • Data Augmentation: Enabled for improved generalization
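With the Ultralytics Python API, the per-fold configuration above maps onto `train()` arguments roughly as in this sketch. The per-fold dataset YAML path, the run name, and starting from the pretrained `yolov8n.pt` checkpoint are assumptions; train_fold.py may organize this differently. The import is deferred into the function so the settings can be inspected without Ultralytics installed.

```python
# Per-fold settings from the list above, expressed as Ultralytics train() arguments.
# Augmentation is left at Ultralytics' defaults, which are enabled out of the box.
FOLD_TRAIN_ARGS = {"epochs": 100, "imgsz": 640, "batch": 16, "patience": 20}


def train_fold(fold, data_yaml, weights="yolov8n.pt"):
    """Train one fold; data_yaml (a per-fold dataset YAML) is a hypothetical path."""
    from ultralytics import YOLO  # deferred so the config above can be inspected without it

    model = YOLO(weights)  # YOLOv8n weights (pretrained checkpoint assumed)
    return model.train(data=data_yaml, name=f"fold_{fold}", **FOLD_TRAIN_ARGS)
```

For example, `train_fold(1, "folds/fold_1.yaml")`, where the YAML path is illustrative.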

Final Model Training

  • The best-performing fold (highest mAP@0.5:0.95) is selected, and its weights initialize the final run
  • The final model is trained for 200 epochs on the full train/val split
  • The nano variant keeps the model small and inference fast
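Selecting the best fold could be done by reading each fold's results.csv, which Ultralytics writes for every training run, as in this sketch. It assumes the box mAP@0.5:0.95 column is named `metrics/mAP50-95(B)` (header whitespace is stripped because some releases pad the names) and scores each fold by its best epoch; summarize_cv.py may aggregate differently, and the function names are hypothetical.

```python
import csv

# Box mAP@0.5:0.95 column as written by Ultralytics (assumed name; verify against your results.csv)
MAP_COLUMN = "metrics/mAP50-95(B)"


def best_map(results_csv):
    """Highest mAP@0.5:0.95 over all epochs in one run's results.csv."""
    with open(results_csv, newline="") as f:
        # strip header padding that some Ultralytics versions add
        rows = [{k.strip(): v for k, v in row.items()} for row in csv.DictReader(f)]
    return max(float(row[MAP_COLUMN]) for row in rows)


def pick_best_fold(results_by_fold):
    """results_by_fold maps fold number -> path of that fold's results.csv."""
    scores = {fold: best_map(path) for fold, path in results_by_fold.items()}
    return max(scores, key=scores.get), scores
```

The best epoch found here may not coincide with the fold's best.pt, which Ultralytics selects using its own fitness score.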

4. Model Optimization

The project implements multiple inference engines for different deployment scenarios:

PyTorch Models

  • Original Model: Full YOLOv8n model with complete architecture
  • Pruned Model: Optimized version with fewer parameters at a small accuracy cost

ONNX Runtime

  • Cross-platform inference with hardware acceleration support
  • Optimized computation graph for faster inference
  • Custom non-maximum suppression (NMS) implementation

OpenVINO

  • Intel-optimized inference engine
  • Particularly efficient on CPU architectures
  • Ideal for edge deployment scenarios

5. API Design

The FastAPI server provides a unified interface for all inference engines:

Endpoint: POST /predict/image

Request:

  • Multipart form-data with image file
  • Supports common image formats (JPEG, PNG, BMP)

Response:

{
  "results": [
    {
      "engine": "original",
      "boxes": [
        {
          "x1": 120.5,
          "y1": 85.3,
          "x2": 250.8,
          "y2": 180.4,
          "confidence": 0.89,
          "class_id": 0,
          "class_name": "pothole"
        }
      ],
      "inference_time_ms": 45.2,
      "num_boxes": 1
    }
  ]
}

Features:

  • Simultaneous inference across all engines for comparison
  • Bounding box coordinates in original image dimensions
  • Confidence scores and class predictions
  • Per-engine latency measurements
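Reporting boxes in original image dimensions requires undoing the resize applied before inference. Assuming the engines letterbox each image to 640×640 (aspect-preserving resize plus equal padding on both sides, as Ultralytics does), the inverse mapping can be sketched as below. The function name is hypothetical, and the project's ONNX and OpenVINO engines may preprocess differently.

```python
def unletterbox_box(box, orig_w, orig_h, input_size=640):
    """Map (x1, y1, x2, y2) from a letterboxed input_size x input_size image back to
    original-image pixels, assuming an aspect-preserving resize with equal padding."""
    gain = min(input_size / orig_w, input_size / orig_h)  # resize factor applied before inference
    pad_x = (input_size - orig_w * gain) / 2              # horizontal padding on each side
    pad_y = (input_size - orig_h * gain) / 2              # vertical padding on each side

    def clip(value, upper):
        return max(0.0, min(value, upper))

    x1, y1, x2, y2 = box
    return (
        clip((x1 - pad_x) / gain, orig_w),
        clip((y1 - pad_y) / gain, orig_h),
        clip((x2 - pad_x) / gain, orig_w),
        clip((y2 - pad_y) / gain, orig_h),
    )
```

For a 1280×720 photo the gain is 0.5 and 140 px of padding sit above and below the content, so a box at (100, 190, 300, 290) in model coordinates maps back to (200, 100, 600, 300).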

Installation and Usage

Prerequisites

  • Python 3.11+
  • CUDA toolkit (optional, for GPU acceleration)
  • Docker (for containerized deployment)

Local Setup

  1. Clone the repository
git clone https://github.com/NBAmine/Pothole-detection.git
cd Pothole-detection
  2. Install dependencies
cd server
pip install -r requirements.txt
pip install ultralytics==8.3.228 --no-deps
  3. Run the server (from the server/ directory)
uvicorn app.main:app --host 0.0.0.0 --port 8000

Docker Deployment

  1. Build the image
cd server
docker build -t pothole-detector .
  2. Run the container
docker run -p 8000:8000 pothole-detector

Making Predictions

Using cURL:

curl -X POST "http://localhost:8000/predict/image" \
  -F "file=@path/to/road_image.jpg"

Using Python:

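import requests

# Send the image as multipart form-data under the "file" field
with open("road_image.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/predict/image",
        files={"file": f}
    )
response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below

predictions = response.json()
# results[0] holds the first engine's output (e.g. "original")
print(f"Detected {predictions['results'][0]['num_boxes']} potholes")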
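Because each entry in `results` carries its engine name, box count and latency, the same response can be used to compare engines side by side. A small sketch follows; the response below is made up for illustration (not measured), and the engine name "onnx" is an assumption, since only "original" appears in the documented example.

```python
def summarize_engines(predictions):
    """Return (engine, num_boxes, inference_time_ms) tuples, fastest engine first."""
    rows = [
        (r["engine"], r["num_boxes"], r["inference_time_ms"])
        for r in predictions["results"]
    ]
    return sorted(rows, key=lambda row: row[2])


# Made-up response for illustration (not measured); "onnx" is an assumed engine name.
example = {
    "results": [
        {"engine": "original", "boxes": [], "num_boxes": 1, "inference_time_ms": 45.2},
        {"engine": "onnx", "boxes": [], "num_boxes": 1, "inference_time_ms": 30.1},
    ]
}
for engine, num_boxes, ms in summarize_engines(example):
    print(f"{engine:>10}: {num_boxes} boxes in {ms:.1f} ms")
```

With a running server, pass the `predictions` dict from the request instead of `example`.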

Training Pipeline

Data Preprocessing

Before training, run the data cleaning notebook to consolidate and standardize all datasets:

  1. Configure Kaggle API:

    • Place your kaggle.json in ~/.kaggle/
    • Set appropriate permissions: chmod 600 ~/.kaggle/kaggle.json
  2. Run Data Cleaning: Open and execute pipeline/notebooks/data_cleaning.ipynb to:

    • Download datasets from Kaggle
    • Convert XML annotations to YOLO format
    • Standardize naming conventions
    • Integrate negative samples
    • Create unified dataset structure

    This produces:

    pothole_dataset/
    ├── images/
    │   └── train/          # ~3,000+ standardized images
    └── labels/
        └── train/          # Corresponding YOLO annotations
    

Preparing Your Dataset

  1. Organize your dataset:
pothole_dataset/
├── images/
│   └── train/
└── labels/
    └── train/

Note: If using the provided data cleaning pipeline, this structure is created automatically.

  2. Generate stratified folds:
python pipeline/scripts/create_folds_colab.py

Training Individual Folds

Train each fold to evaluate model performance:

python pipeline/scripts/train_fold.py --fold 1
python pipeline/scripts/train_fold.py --fold 2
# ... repeat for all folds

Training Final Model

After cross-validation, train the final model:

python pipeline/scripts/train_full.py

This automatically:

  • Identifies the best performing fold
  • Uses its weights as initialization
  • Trains on the complete dataset for extended epochs

Performance Considerations

Inference Speed Comparison

Different engines offer varying speed-accuracy tradeoffs:

  • Original PyTorch: Highest accuracy, moderate speed
  • Pruned PyTorch: Reduced model size, faster inference, minimal accuracy loss
  • ONNX: Cross-platform optimization, good CPU performance
  • OpenVINO: Best CPU performance, ideal for Intel hardware
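To measure these tradeoffs on your own hardware, a simple timing helper like the sketch below can wrap any engine's inference call. It is not part of the repository: `run_inference` stands for whatever zero-argument callable invokes one engine on a fixed, preloaded image (a hypothetical wrapper), and warm-up runs are excluded so one-time initialization does not skew the result.

```python
import statistics
import time


def median_latency_ms(run_inference, warmup=3, runs=20):
    """Median wall-clock time of run_inference() in milliseconds.

    run_inference is any zero-argument callable, e.g. a lambda that calls one
    engine on a preloaded image. Warm-up calls are executed but not timed.
    """
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)  # median is less sensitive to outliers than the mean
```

Timing every engine on the same image and the same machine keeps the comparison fair.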

Model Optimization Benefits

The pruned model typically achieves:

  • 30-50% reduction in model size
  • 20-40% faster inference time
  • <2% reduction in mAP

Technical Details

Non-Maximum Suppression (NMS)

Custom NMS implementation for ONNX and OpenVINO engines:

  • IoU Threshold: 0.45 (configurable)
  • Confidence Threshold: 0.25 (configurable)
  • Efficiently removes duplicate detections while preserving high-confidence predictions
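The standard greedy NMS algorithm with the default thresholds above can be sketched in pure Python as follows. This illustrates the technique and is not the repository's implementation, which presumably operates on NumPy arrays of raw model outputs; boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45, conf_threshold=0.25):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    # Drop low-confidence candidates, then visit the rest from best to worst score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining boxes that overlap the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, two heavily overlapping boxes (IoU 0.81) collapse to the higher-scoring one, while a separate box elsewhere in the image is kept.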

Data Augmentation

Training employs YOLOv8's built-in augmentation:

  • Random horizontal flips
  • Random scaling and translation
  • HSV color space adjustments
  • Mosaic augmentation for multi-scale learning
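These augmentations correspond to Ultralytics `train()` hyperparameters. The values below are the library defaults as far as we know (check ultralytics/cfg/default.yaml for your installed version); the README does not state whether the project overrides any of them, and the dataset path in the usage comment is hypothetical.

```python
# Ultralytics train() hyperparameters behind each augmentation listed above.
# Values are the library defaults as we understand them; the README does not
# say whether the project overrides any of them.
AUGMENTATION_ARGS = {
    "fliplr": 0.5,      # probability of a random horizontal flip
    "scale": 0.5,       # random scaling gain (+/- 50%)
    "translate": 0.1,   # random translation as a fraction of image size
    "hsv_h": 0.015,     # hue jitter (fraction)
    "hsv_s": 0.7,       # saturation jitter (fraction)
    "hsv_v": 0.4,       # value/brightness jitter (fraction)
    "mosaic": 1.0,      # probability of combining 4 images into one mosaic
}

# Usage: model.train(data="data.yaml", **AUGMENTATION_ARGS)  # data path is hypothetical
```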

Future Enhancements

  • Video processing
  • Reporting detections to Waze to warn other drivers
  • Mobile deployment (NCNN)
  • Severity classification (mild, moderate, severe)
  • Integration with GPS for location mapping
  • Dataset updates
  • Building a custom PyTorch model from the pruned one by shrinking layer input/output shapes, yielding a smaller network with lower memory use and faster inference

Dependencies

Core Libraries:

  • ultralytics==8.3.228 - YOLOv8 implementation
  • fastapi==0.121.2 - API framework
  • onnxruntime - ONNX inference
  • openvino - Intel inference engine
  • opencv-python-headless - Image processing
  • scikit-learn - Stratified splitting

See server/requirements.txt for complete dependencies.

Acknowledgments

  • Built on the YOLOv8 architecture by Ultralytics
