The Pothole Detection System is designed to automatically identify and localize potholes in road images, providing critical infrastructure monitoring capabilities. The project encompasses the complete machine learning lifecycle from data preprocessing and model training to optimized deployment.
- YOLOv8-based Detection: Leverages state-of-the-art object detection architecture for accurate pothole localization
- Stratified K-Fold Cross-Validation: Ensures robust model evaluation and generalization
- Multi-Engine Inference: Supports PyTorch, ONNX, and OpenVINO for flexible deployment
- Model Optimization: Implements pruning techniques to reduce model size and inference latency
- Production-Ready API: FastAPI server with comprehensive bounding box predictions and performance metrics
- Containerized Deployment: Docker support for seamless deployment across environments
pothole_detection/
├── pipeline/ # Training and data preparation pipeline
│ ├── notebooks/
│ │ ├── data_cleaning.ipynb # Data preprocessing and exploration
│ │ └── training.ipynb # Model training experiments
│ └── scripts/
│ ├── create_folds_colab.py # K-fold split generation with stratification
│ ├── create_folds_kaggle.py
│ ├── train_fold.py # Single fold training script
│ ├── train_full.py # Final model training on best fold
│ └── summarize_cv.py # Cross-validation results aggregation
│
└── server/ # Inference server and deployment
├── app/
│ ├── main.py # FastAPI application entry point
│ ├── engines/ # Inference engine implementations
│ │ ├── pytorch_engine.py # Original and pruned PyTorch models
│ │ ├── onnx_engine.py # ONNX Runtime inference
│ │ └── openvino_engine.py # Intel OpenVINO inference
│ ├── models/ # Model artifacts
│ │ ├── original.pt # Original YOLOv8 model
│ │ ├── pruned.pt # Pruned model
│ │ ├── model.onnx # ONNX format
│ │ └── model_openvino/ # OpenVINO IR format
│ └── schemas/ # Pydantic response models
├── Dockerfile # Container configuration
└── requirements.txt # Python dependencies
The project consolidates multiple Kaggle datasets into a unified, standardized format suitable for YOLOv8 training. This preprocessing phase is implemented in data_cleaning.ipynb.
Four distinct Kaggle datasets were collected, each with different annotation formats and characteristics:
- andrewmvd/pothole-detection
  - XML annotations (Pascal VOC format)
  - High-quality labeled images
  - Consistent image quality
- chitholian/annotated-potholes-dataset
  - XML annotations with varying image sizes
  - Diverse road conditions and camera angles
  - Rich annotation detail
- rajdalsaniya/pothole-detection-dataset
  - Already in YOLOv8 format (text annotations)
  - Pre-split train/validation sets
  - Good quality but some ambiguous images
- atulyakumar98/pothole-detection-dataset
  - Negative samples (roads without potholes)
  - Critical for reducing false positives
  - No annotations required (empty label files)
The preprocessing pipeline performs the following transformations:
Step 1: Dataset Download and Caching
- Automated Kaggle dataset download using Kaggle API
- Intelligent caching to Google Drive to avoid redundant downloads
- Organized storage structure for multiple dataset versions
Step 2: Annotation Format Standardization
XML to YOLO Conversion:

```python
# For Pascal VOC (XML) annotations:
# Convert bounding boxes from absolute coordinates to YOLO format
cx = (xmin + xmax) / 2 / image_width
cy = (ymin + ymax) / 2 / image_height
w = (xmax - xmin) / image_width
h = (ymax - ymin) / image_height
# Output: "class_id cx cy w h"
```

The conversion handles:
- Parsing XML files to extract bounding box coordinates
- Converting absolute pixel coordinates to normalized relative coordinates
- Handling multiple objects per image
- Error handling for malformed annotations
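As a sketch of the steps above (the function name and structure are illustrative, not the notebook's exact code), the conversion can be implemented with the standard library's ElementTree:

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_path: str, class_id: int = 0) -> list[str]:
    """Convert one Pascal VOC annotation file to YOLO-format label lines."""
    root = ET.parse(xml_path).getroot()
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # Normalize center/size by image dimensions
        cx = (xmin + xmax) / 2 / width
        cy = (ymin + ymax) / 2 / height
        w = (xmax - xmin) / width
        h = (ymax - ymin) / height
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

A malformed file raises inside `ET.parse`, which the real pipeline catches and logs per file rather than aborting the run.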
Pre-formatted Dataset Integration:
- Direct copying of datasets already in YOLO format
- Validation of image-label pairing
- Consistency checks for annotation format
Step 3: Image and Label Renaming
- Unified naming convention: {dataset_source}_{global_index}.{ext}
- Examples: andrewmvd_001.jpg, chitholian_342.txt, rajdalsaniya_789.jpg
- Prevents filename collisions across datasets
- Maintains traceability to source dataset
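A minimal sketch of this renaming step (the helper name is illustrative; the actual notebook code may differ):

```python
import shutil
from pathlib import Path

def standardize_name(src: Path, dataset: str, index: int, dst_dir: Path) -> Path:
    """Copy a file into dst_dir under the unified {dataset}_{index:03d}{ext} name."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    # Lowercase the extension so .JPG and .jpg collapse to one convention
    dst = dst_dir / f"{dataset}_{index:03d}{src.suffix.lower()}"
    shutil.copy2(src, dst)
    return dst
```

Because the source dataset name is embedded in every filename, any problem image found later can be traced back to its origin.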
Step 4: Negative Sample Integration
- Inclusion of images without potholes (negative samples)
- Empty annotation files created for negative samples
- Improves model's ability to distinguish roads without defects
- Reduces false positive rate in deployment
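In YOLO training, a negative sample is simply an image whose label file is empty. A hedged sketch of this step (function name assumed, not taken from the notebook):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def add_negative_samples(image_dir: str, label_dir: str) -> int:
    """Create an empty YOLO label file for every image that lacks one."""
    label_root = Path(label_dir)
    label_root.mkdir(parents=True, exist_ok=True)
    created = 0
    for img in Path(image_dir).iterdir():
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        label = label_root / f"{img.stem}.txt"
        if not label.exists():
            label.touch()  # empty file = image with no potholes
            created += 1
    return created
```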
Step 5: Dataset Consolidation
- All images copied to a unified images/train/ directory
- All labels saved to the corresponding labels/train/ directory
- Automatic directory structure creation
- Progress tracking with tqdm for large datasets
The pipeline implements several quality control measures:
- File Existence Validation: Verifies both image and annotation files exist before processing
- Format Validation: Ensures bounding box coordinates are within valid ranges (0-1)
- Consistency Checks: Confirms image dimensions match annotation references
- Error Logging: Captures and reports parsing errors without halting the entire pipeline
- Progress Monitoring: Real-time feedback on processing status for each dataset
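The format-validation check can be sketched as follows (an illustrative standalone function, not the pipeline's exact code), returning problems instead of raising so one bad file never halts the run:

```python
from pathlib import Path

def validate_label_file(label_path: Path) -> list[str]:
    """Return a list of problems found in one YOLO label file (empty list = valid)."""
    errors = []
    for i, line in enumerate(label_path.read_text().splitlines(), start=1):
        parts = line.split()
        if len(parts) != 5:
            errors.append(f"{label_path.name}:{i}: expected 5 fields, got {len(parts)}")
            continue
        class_id, *coords = parts
        if not class_id.isdigit():
            errors.append(f"{label_path.name}:{i}: non-integer class id {class_id!r}")
        # Normalized coordinates must all lie within [0, 1]
        for name, value in zip(("cx", "cy", "w", "h"), coords):
            v = float(value)
            if not 0.0 <= v <= 1.0:
                errors.append(f"{label_path.name}:{i}: {name}={v} out of [0, 1]")
    return errors
```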
The consolidated dataset includes:
- Total Images: ~3,000+ road images from diverse sources
- Positive Samples: Images with one or more potholes (varying density)
- Negative Samples: Clean road images for false positive reduction
- Annotation Format: YOLOv8 normalized coordinates (class_id, cx, cy, w, h)
- Class Distribution: Single class ("pothole") with varying instance counts per image
┌─────────────────────────────────────────────────────────────────┐
│ KAGGLE DATASETS │
├─────────────────────────────────────────────────────────────────┤
│ andrewmvd (XML) │ chitholian (XML) │ rajdalsaniya (YOLO) │
│ atulyakumar98 (Negative samples) │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PREPROCESSING PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ 1. Download & Cache to Google Drive │
│ 2. Parse XML → Extract bounding boxes │
│ 3. Convert to YOLO format (normalized coordinates) │
│ 4. Rename: {dataset}_{id}.{ext} │
│ 5. Copy pre-formatted YOLO data │
│ 6. Generate empty labels for negatives │
│ 7. Validate & consolidate │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ UNIFIED DATASET │
├─────────────────────────────────────────────────────────────────┤
│ pothole_dataset/ │
│ ├── images/train/ ← All images (standardized names) │
│ └── labels/train/ ← YOLO annotations (.txt files) │
└─────────────────────────────────────────────────────────────────┘
The training pipeline implements stratified K-fold cross-validation to ensure balanced representation across folds:
- Stratification Strategy: Images are grouped into bins by pothole count (0, 1-3, 4-6, 7+ potholes)
- Fold Generation: 5-fold cross-validation with stratified splitting ensures each fold maintains similar class distributions
- Benefit: Prevents overfitting to specific data characteristics and provides robust performance estimates
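The fold generation lives in create_folds_colab.py / create_folds_kaggle.py; a hedged sketch of the idea, assuming potholes are counted from label-file lines and binned before splitting with scikit-learn (names and bin boundaries are illustrative):

```python
from pathlib import Path

import numpy as np
from sklearn.model_selection import StratifiedKFold

def pothole_bin(n: int) -> int:
    """Map a per-image pothole count to a stratification bin."""
    if n == 0:
        return 0
    if n <= 3:
        return 1
    if n <= 6:
        return 2
    return 3

def make_folds(label_dir: str, n_splits: int = 5, seed: int = 42):
    """Yield (fold_number, train_stems, val_stems), stratified by pothole count."""
    labels = sorted(Path(label_dir).glob("*.txt"))
    # One pothole per non-empty label line; empty files are negatives
    counts = [sum(1 for ln in p.read_text().splitlines() if ln.strip()) for p in labels]
    bins = np.array([pothole_bin(c) for c in counts])
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (tr, va) in enumerate(skf.split(np.zeros(len(bins)), bins), start=1):
        yield fold, [labels[i].stem for i in tr], [labels[i].stem for i in va]
```

Stratifying on binned counts rather than raw counts keeps every fold populated with negatives, sparse scenes, and dense scenes alike.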
- Each fold is trained independently using YOLOv8n (nano) architecture
- Training configuration:
- Epochs: 100 per fold
- Image Size: 640×640 pixels
- Batch Size: 16
- Early Stopping: Patience of 20 epochs
- Data Augmentation: Enabled for improved generalization
- Best performing fold (based on mAP@0.5:0.95) is selected
- Final model trained for 200 epochs on full train/val split
- Achieves optimal balance between accuracy and inference speed
The project implements multiple inference engines for different deployment scenarios:
- Original Model: Full YOLOv8n model with complete architecture
- Pruned Model: Optimized version with reduced parameters while maintaining accuracy
- Cross-platform inference with hardware acceleration support
- Optimized computation graph for faster inference
- Custom non-maximum suppression (NMS) implementation
- Intel-optimized inference engine
- Particularly efficient on CPU architectures
- Ideal for edge deployment scenarios
The FastAPI server provides a unified interface for all inference engines:
Endpoint: POST /predict/image
Request:
- Multipart form-data with image file
- Supports common image formats (JPEG, PNG, BMP)
Response:

```json
{
  "results": [
    {
      "engine": "original",
      "boxes": [
        {
          "x1": 120.5,
          "y1": 85.3,
          "x2": 250.8,
          "y2": 180.4,
          "confidence": 0.89,
          "class_id": 0,
          "class_name": "pothole"
        }
      ],
      "inference_time_ms": 45.2,
      "num_boxes": 1
    }
  ]
}
```

Features:
- Simultaneous inference across all engines for comparison
- Bounding box coordinates in original image dimensions
- Confidence scores and class predictions
- Per-engine latency measurements
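The Pydantic response models under app/schemas/ plausibly mirror the JSON above; a sketch of what they might look like (class names are assumptions, not the repository's actual identifiers):

```python
from pydantic import BaseModel

class Box(BaseModel):
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int
    class_name: str

class EngineResult(BaseModel):
    engine: str
    boxes: list[Box]
    inference_time_ms: float
    num_boxes: int

class PredictionResponse(BaseModel):
    results: list[EngineResult]
```

Declaring the schema this way lets FastAPI validate every response and publish it in the generated OpenAPI docs.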
- Python 3.11+
- CUDA toolkit (optional, for GPU acceleration)
- Docker (for containerized deployment)
- Clone the repository:

```bash
git clone https://github.com/yourusername/pothole-detection.git
cd pothole-detection
```

- Install dependencies:

```bash
cd server
pip install -r requirements.txt
pip install ultralytics==8.3.228 --no-deps
```

- Run the server:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

- Build the image:

```bash
cd server
docker build -t pothole-detector .
```

- Run the container:

```bash
docker run -p 8000:8000 pothole-detector
```

Using cURL:

```bash
curl -X POST "http://localhost:8000/predict/image" \
  -F "file=@path/to/road_image.jpg"
```

Using Python:

```python
import requests

with open("road_image.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/predict/image",
        files={"file": f},
    )

predictions = response.json()
print(f"Detected {predictions['results'][0]['num_boxes']} potholes")
```

Before training, run the data cleaning notebook to consolidate and standardize all datasets:
- Configure Kaggle API:
  - Place your kaggle.json in ~/.kaggle/
  - Set appropriate permissions: chmod 600 ~/.kaggle/kaggle.json
- Run Data Cleaning: Open and execute pipeline/notebooks/data_cleaning.ipynb to:
  - Download datasets from Kaggle
  - Convert XML annotations to YOLO format
  - Standardize naming conventions
  - Integrate negative samples
  - Create unified dataset structure

This produces:

```
pothole_dataset/
├── images/
│   └── train/    # ~3,000+ standardized images
└── labels/
    └── train/    # Corresponding YOLO annotations
```
- Organize your dataset:
pothole_dataset/
├── images/
│ └── train/
└── labels/
└── train/
Note: If using the provided data cleaning pipeline, this structure is created automatically.
- Generate stratified folds:

```bash
python pipeline/scripts/create_folds_colab.py
```

Train each fold to evaluate model performance:

```bash
python pipeline/scripts/train_fold.py --fold 1
python pipeline/scripts/train_fold.py --fold 2
# ... repeat for all folds
```

After cross-validation, train the final model:

```bash
python pipeline/scripts/train_full.py
```

This automatically:
- Identifies the best performing fold
- Uses its weights as initialization
- Trains on the complete dataset for extended epochs
Different engines offer varying speed-accuracy tradeoffs:
- Original PyTorch: Highest accuracy, moderate speed
- Pruned PyTorch: Reduced model size, faster inference, minimal accuracy loss
- ONNX: Cross-platform optimization, good CPU performance
- OpenVINO: Best CPU performance, ideal for Intel hardware
The pruned model typically achieves:
- 30-50% reduction in model size
- 20-40% faster inference time
- <2% reduction in mAP
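The exact pruning procedure behind pruned.pt isn't shown here; as one common approach, L1-norm unstructured pruning with PyTorch's built-in utilities looks like this (a hedged sketch, not the project's actual script):

```python
import torch
import torch.nn.utils.prune as prune

def prune_conv_layers(model: torch.nn.Module, amount: float = 0.3) -> torch.nn.Module:
    """Zero out the smallest-magnitude weights in every Conv2d layer."""
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Make the pruning permanent (bake the mask into the weights)
            prune.remove(module, "weight")
    return model
```

Note that unstructured pruning zeroes weights without shrinking tensor shapes, so the latency gains quoted above additionally depend on sparse-aware runtimes or a follow-up structured step.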
Custom NMS implementation for ONNX and OpenVINO engines:
- IoU Threshold: 0.45 (configurable)
- Confidence Threshold: 0.25 (configurable)
- Efficiently removes duplicate detections while preserving high-confidence predictions
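A greedy NMS along these lines (an illustrative sketch with the thresholds above; the engines' actual implementation may differ) keeps the highest-confidence box and discards any remaining box that overlaps it beyond the IoU threshold:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray,
        iou_thresh: float = 0.45, conf_thresh: float = 0.25) -> list[int]:
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns indices of kept boxes."""
    idxs = np.where(scores >= conf_thresh)[0]
    order = idxs[np.argsort(scores[idxs])[::-1]]  # highest confidence first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of box i with every remaining candidate
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return kept
```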
Training employs YOLOv8's built-in augmentation:
- Random horizontal flips
- Random scaling and translation
- HSV color space adjustments
- Mosaic augmentation for multi-scale learning
- Video processing
- Reporting detections to Waze to warn other drivers
- Mobile deployment (NCNN)
- Severity classification (mild, moderate, severe)
- Integration with GPS for location mapping
- Dataset update
- Derive a custom PyTorch model from the pruned network by shrinking layer input/output shapes, yielding a smaller architecture with noticeable memory and inference savings.
Core Libraries:
- ultralytics==8.3.228 - YOLOv8 implementation
- fastapi==0.121.2 - API framework
- onnxruntime - ONNX inference
- openvino - Intel inference engine
- opencv-python-headless - Image processing
- scikit-learn - Stratified splitting
See server/requirements.txt for complete dependencies.
- Built on the YOLOv8 architecture by Ultralytics