The Pothole Detection System is designed to automatically identify and localize potholes in road images, providing critical infrastructure monitoring capabilities. The project encompasses the complete machine learning lifecycle from data preprocessing and model training to optimized deployment.
- YOLOv8-based Detection: Leverages state-of-the-art object detection architecture for accurate pothole localization
- Stratified K-Fold Cross-Validation: Ensures robust model evaluation and generalization
- Multi-Engine Inference: Supports PyTorch, ONNX, and OpenVINO for flexible deployment
- Model Optimization: Implements pruning techniques to reduce model size and inference latency
- Production-Ready API: FastAPI server with comprehensive bounding box predictions and performance metrics
- Containerized Deployment: Docker support for seamless deployment across environments
pothole_detection/
├── pipeline/ # Training and data preparation pipeline
│ ├── notebooks/
│ │ ├── data_cleaning.ipynb # Data preprocessing and exploration
│ │ └── training.ipynb # Model training experiments
│ └── scripts/
│ ├── create_folds_colab.py # K-fold split generation with stratification
│ ├── create_folds_kaggle.py
│ ├── train_fold.py # Single fold training script
│ ├── train_full.py # Final model training on best fold
│ └── summarize_cv.py # Cross-validation results aggregation
│
└── server/ # Inference server and deployment
├── app/
│ ├── main.py # FastAPI application entry point
│ ├── engines/ # Inference engine implementations
│ │ ├── pytorch_engine.py # Original and pruned PyTorch models
│ │ ├── onnx_engine.py # ONNX Runtime inference
│ │ └── openvino_engine.py # Intel OpenVINO inference
│ ├── models/ # Model artifacts
│ │ ├── original.pt # Original YOLOv8 model
│ │ ├── pruned.pt # Pruned model
│ │ ├── model.onnx # ONNX format
│ │ └── model_openvino/ # OpenVINO IR format
│ └── schemas/ # Pydantic response models
├── Dockerfile # Container configuration
└── requirements.txt # Python dependencies
The project consolidates multiple Kaggle datasets into a unified, standardized format suitable for YOLOv8 training. This preprocessing phase is implemented in data_cleaning.ipynb.
Four distinct Kaggle datasets were collected, each with different annotation formats and characteristics:
- andrewmvd/pothole-detection
  - XML annotations (Pascal VOC format)
  - High-quality labeled images
  - Consistent image quality
- chitholian/annotated-potholes-dataset
  - XML annotations with varying image sizes
  - Diverse road conditions and camera angles
  - Rich annotation detail
- rajdalsaniya/pothole-detection-dataset
  - Already in YOLOv8 format (text annotations)
  - Pre-split train/validation sets
  - Good quality but some ambiguous images
- atulyakumar98/pothole-detection-dataset
  - Negative samples (roads without potholes)
  - Critical for reducing false positives
  - No annotations required (empty label files)
The preprocessing pipeline performs the following transformations:
Step 1: Dataset Download and Caching
- Automated Kaggle dataset download using Kaggle API
- Intelligent caching to Google Drive to avoid redundant downloads
- Organized storage structure for multiple dataset versions
Step 2: Annotation Format Standardization
XML to YOLO Conversion:

```python
# For Pascal VOC (XML) annotations:
# Convert bounding boxes from absolute coordinates to YOLO format
cx = (xmin + xmax) / 2 / image_width
cy = (ymin + ymax) / 2 / image_height
w = (xmax - xmin) / image_width
h = (ymax - ymin) / image_height
# Output: "class_id cx cy w h"
```

The conversion handles:
- Parsing XML files to extract bounding box coordinates
- Converting absolute pixel coordinates to normalized relative coordinates
- Handling multiple objects per image
- Error handling for malformed annotations
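As a sketch of the steps above (the function name and structure are illustrative, not the notebook's exact code), the conversion can be implemented with the standard library's ElementTree:

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_path: str, class_id: int = 0) -> list[str]:
    """Convert one Pascal VOC annotation file to YOLO-format label lines."""
    root = ET.parse(xml_path).getroot()
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # Normalize center/size by image dimensions
        cx = (xmin + xmax) / 2 / width
        cy = (ymin + ymax) / 2 / height
        w = (xmax - xmin) / width
        h = (ymax - ymin) / height
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

A malformed file raises inside `ET.parse`, which the real pipeline catches and logs per file rather than aborting the run.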
Pre-formatted Dataset Integration:
- Direct copying of datasets already in YOLO format
- Validation of image-label pairing
- Consistency checks for annotation format
Step 3: Image and Label Renaming
- Unified naming convention: {dataset_source}_{global_index}.{ext}
- Examples: andrewmvd_001.jpg, chitholian_342.txt, rajdalsaniya_789.jpg
- Prevents filename collisions across datasets
- Maintains traceability to source dataset
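A minimal sketch of this renaming step (the helper name is illustrative; the actual notebook code may differ):

```python
import shutil
from pathlib import Path

def standardize_name(src: Path, dataset: str, index: int, dst_dir: Path) -> Path:
    """Copy a file into dst_dir under the unified {dataset}_{index:03d}{ext} name."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    # Lowercase the extension so .JPG and .jpg collapse to one convention
    dst = dst_dir / f"{dataset}_{index:03d}{src.suffix.lower()}"
    shutil.copy2(src, dst)
    return dst
```

Because the source dataset name is embedded in every filename, any problem image found later can be traced back to its origin.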
Step 4: Negative Sample Integration
- Inclusion of images without potholes (negative samples)
- Empty annotation files created for negative samples
- Improves model's ability to distinguish roads without defects
- Reduces false positive rate in deployment
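In YOLO training, a negative sample is simply an image whose label file is empty. A hedged sketch of this step (function name assumed, not taken from the notebook):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def add_negative_samples(image_dir: str, label_dir: str) -> int:
    """Create an empty YOLO label file for every image that lacks one."""
    label_root = Path(label_dir)
    label_root.mkdir(parents=True, exist_ok=True)
    created = 0
    for img in Path(image_dir).iterdir():
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        label = label_root / f"{img.stem}.txt"
        if not label.exists():
            label.touch()  # empty file = image with no potholes
            created += 1
    return created
```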
Step 5: Dataset Consolidation
- All images copied to a unified images/train/ directory
- All labels saved to the corresponding labels/train/ directory
- Automatic directory structure creation
- Progress tracking with tqdm for large datasets
The pipeline implements several quality control measures:
- File Existence Validation: Verifies both image and annotation files exist before processing
- Format Validation: Ensures bounding box coordinates are within valid ranges (0-1)
- Consistency Checks: Confirms image dimensions match annotation references
- Error Logging: Captures and reports parsing errors without halting the entire pipeline
- Progress Monitoring: Real-time feedback on processing status for each dataset
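The format-validation check can be sketched as follows (an illustrative standalone function, not the pipeline's exact code), returning problems instead of raising so one bad file never halts the run:

```python
from pathlib import Path

def validate_label_file(label_path: Path) -> list[str]:
    """Return a list of problems found in one YOLO label file (empty list = valid)."""
    errors = []
    for i, line in enumerate(label_path.read_text().splitlines(), start=1):
        parts = line.split()
        if len(parts) != 5:
            errors.append(f"{label_path.name}:{i}: expected 5 fields, got {len(parts)}")
            continue
        class_id, *coords = parts
        if not class_id.isdigit():
            errors.append(f"{label_path.name}:{i}: non-integer class id {class_id!r}")
        # Normalized coordinates must all lie within [0, 1]
        for name, value in zip(("cx", "cy", "w", "h"), coords):
            v = float(value)
            if not 0.0 <= v <= 1.0:
                errors.append(f"{label_path.name}:{i}: {name}={v} out of [0, 1]")
    return errors
```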
The consolidated dataset includes:
- Total Images: ~3,000+ road images from diverse sources
- Positive Samples: Images with one or more potholes (varying density)
- Negative Samples: Clean road images for false positive reduction
- Annotation Format: YOLOv8 normalized coordinates (class_id, cx, cy, w, h)
- Class Distribution: Single class ("pothole") with varying instance counts per image
┌─────────────────────────────────────────────────────────────────┐
│ KAGGLE DATASETS │
├─────────────────────────────────────────────────────────────────┤
│ andrewmvd (XML) │ chitholian (XML) │ rajdalsaniya (YOLO) │
│ atulyakumar98 (Negative samples) │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PREPROCESSING PIPELINE │
├─────────────────────────────────────────────────────────────────┤
│ 1. Download & Cache to Google Drive │
│ 2. Parse XML → Extract bounding boxes │
│ 3. Convert to YOLO format (normalized coordinates) │
│ 4. Rename: {dataset}_{id}.{ext} │
│ 5. Copy pre-formatted YOLO data │
│ 6. Generate empty labels for negatives │
│ 7. Validate & consolidate │
└────────────┬────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ UNIFIED DATASET │
├─────────────────────────────────────────────────────────────────┤
│ pothole_dataset/ │
│ ├── images/train/ ← All images (standardized names) │
│ └── labels/train/ ← YOLO annotations (.txt files) │
└─────────────────────────────────────────────────────────────────┘
The training pipeline implements stratified K-fold cross-validation to ensure balanced representation across folds:
- Stratification Strategy: Images are grouped into bins by pothole count (0, 1-3, 4-6, 7+ potholes)
- Fold Generation: 5-fold cross-validation with stratified splitting ensures each fold maintains similar class distributions
- Benefit: Prevents overfitting to specific data characteristics and provides robust performance estimates
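The fold generation lives in create_folds_colab.py / create_folds_kaggle.py; a hedged sketch of the idea, assuming potholes are counted from label-file lines and binned before splitting with scikit-learn (names and bin boundaries are illustrative):

```python
from pathlib import Path

import numpy as np
from sklearn.model_selection import StratifiedKFold

def pothole_bin(n: int) -> int:
    """Map a per-image pothole count to a stratification bin."""
    if n == 0:
        return 0
    if n <= 3:
        return 1
    if n <= 6:
        return 2
    return 3

def make_folds(label_dir: str, n_splits: int = 5, seed: int = 42):
    """Yield (fold_number, train_stems, val_stems), stratified by pothole count."""
    labels = sorted(Path(label_dir).glob("*.txt"))
    # One pothole per non-empty label line; empty files are negatives
    counts = [sum(1 for ln in p.read_text().splitlines() if ln.strip()) for p in labels]
    bins = np.array([pothole_bin(c) for c in counts])
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (tr, va) in enumerate(skf.split(np.zeros(len(bins)), bins), start=1):
        yield fold, [labels[i].stem for i in tr], [labels[i].stem for i in va]
```

Stratifying on binned counts rather than raw counts keeps every fold populated with negatives, sparse scenes, and dense scenes alike.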
- Each fold is trained independently using YOLOv8n (nano) architecture
- Training configuration:
- Epochs: 100 per fold
- Image Size: 640×640 pixels
- Batch Size: 16
- Early Stopping: Patience of 20 epochs
- Data Augmentation: Enabled for improved generalization
- Best performing fold (based on mAP@0.5:0.95) is selected
- Final model trained for 200 epochs on full train/val split
- Achieves optimal balance between accuracy and inference speed
The project implements multiple inference engines for different deployment scenarios:
- Original Model: Full YOLOv8n model with complete architecture
- Pruned Model: Optimized version with reduced parameters while maintaining accuracy
- Cross-platform inference with hardware acceleration support
- Optimized computation graph for faster inference
- Custom non-maximum suppression (NMS) implementation
- Intel-optimized inference engine
- Particularly efficient on CPU architectures
- Ideal for edge deployment scenarios
The FastAPI server provides a unified interface for all inference engines:
Endpoint: POST /predict/image
Request:
- Multipart form-data with image file
- Supports common image formats (JPEG, PNG, BMP)
Response:

```json
{
  "results": [
    {
      "engine": "original",
      "boxes": [
        {
          "x1": 120.5,
          "y1": 85.3,
          "x2": 250.8,
          "y2": 180.4,
          "confidence": 0.89,
          "class_id": 0,
          "class_name": "pothole"
        }
      ],
      "inference_time_ms": 45.2,
      "num_boxes": 1
    }
  ]
}
```

Features:
- Simultaneous inference across all engines for comparison
- Bounding box coordinates in original image dimensions
- Confidence scores and class predictions
- Per-engine latency measurements
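The Pydantic response models under app/schemas/ plausibly mirror the JSON above; a sketch of what they might look like (class names are assumptions, not the repository's actual identifiers):

```python
from pydantic import BaseModel

class Box(BaseModel):
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int
    class_name: str

class EngineResult(BaseModel):
    engine: str
    boxes: list[Box]
    inference_time_ms: float
    num_boxes: int

class PredictionResponse(BaseModel):
    results: list[EngineResult]
```

Declaring the schema this way lets FastAPI validate every response and publish it in the generated OpenAPI docs.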
- Python 3.11+
- CUDA toolkit (optional, for GPU acceleration)
- Docker (for containerized deployment)
- Clone the repository:

```bash
git clone https://github.com/yourusername/pothole-detection.git
cd pothole-detection
```

- Install dependencies:

```bash
cd server
pip install -r requirements.txt
pip install ultralytics==8.3.228 --no-deps
```

- Run the server:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

- Build the image:

```bash
cd server
docker build -t pothole-detector .
```

- Run the container:

```bash
docker run -p 8000:8000 pothole-detector
```

Using cURL:

```bash
curl -X POST "http://localhost:8000/predict/image" \
  -F "file=@path/to/road_image.jpg"
```

Using Python:

```python
import requests

with open("road_image.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/predict/image",
        files={"file": f},
    )

predictions = response.json()
print(f"Detected {predictions['results'][0]['num_boxes']} potholes")
```

Before training, run the data cleaning notebook to consolidate and standardize all datasets:
- Configure Kaggle API:
  - Place your kaggle.json in ~/.kaggle/
  - Set appropriate permissions: chmod 600 ~/.kaggle/kaggle.json
- Run Data Cleaning: Open and execute pipeline/notebooks/data_cleaning.ipynb to:
  - Download datasets from Kaggle
  - Convert XML annotations to YOLO format
  - Standardize naming conventions
  - Integrate negative samples
  - Create unified dataset structure

This produces:

```
pothole_dataset/
├── images/
│   └── train/    # ~3,000+ standardized images
└── labels/
    └── train/    # Corresponding YOLO annotations
```
- Organize your dataset:
pothole_dataset/
├── images/
│ └── train/
└── labels/
└── train/
Note: If using the provided data cleaning pipeline, this structure is created automatically.
- Generate stratified folds:

```bash
python pipeline/scripts/create_folds_colab.py
```

Train each fold to evaluate model performance:

```bash
python pipeline/scripts/train_fold.py --fold 1
python pipeline/scripts/train_fold.py --fold 2
# ... repeat for all folds
```

After cross-validation, train the final model:

```bash
python pipeline/scripts/train_full.py
```

This automatically:
- Identifies the best performing fold
- Uses its weights as initialization
- Trains on the complete dataset for extended epochs
Different engines offer varying speed-accuracy tradeoffs:
- Original PyTorch: Highest accuracy, moderate speed
- Pruned PyTorch: Reduced model size, faster inference, minimal accuracy loss
- ONNX: Cross-platform optimization, good CPU performance
- OpenVINO: Best CPU performance, ideal for Intel hardware
The pruned model typically achieves:
- 30-50% reduction in model size
- 20-40% faster inference time
- <2% reduction in mAP
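The exact pruning procedure behind pruned.pt isn't shown here; as one common approach, L1-norm unstructured pruning with PyTorch's built-in utilities looks like this (a hedged sketch, not the project's actual script):

```python
import torch
import torch.nn.utils.prune as prune

def prune_conv_layers(model: torch.nn.Module, amount: float = 0.3) -> torch.nn.Module:
    """Zero out the smallest-magnitude weights in every Conv2d layer."""
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)
            # Make the pruning permanent (bake the mask into the weights)
            prune.remove(module, "weight")
    return model
```

Note that unstructured pruning zeroes weights without shrinking tensor shapes, so the latency gains quoted above additionally depend on sparse-aware runtimes or a follow-up structured step.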
Custom NMS implementation for ONNX and OpenVINO engines:
- IoU Threshold: 0.45 (configurable)
- Confidence Threshold: 0.25 (configurable)
- Efficiently removes duplicate detections while preserving high-confidence predictions
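A greedy NMS along these lines (an illustrative sketch with the thresholds above; the engines' actual implementation may differ) keeps the highest-confidence box and discards any remaining box that overlaps it beyond the IoU threshold:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray,
        iou_thresh: float = 0.45, conf_thresh: float = 0.25) -> list[int]:
    """Greedy NMS over [x1, y1, x2, y2] boxes; returns indices of kept boxes."""
    idxs = np.where(scores >= conf_thresh)[0]
    order = idxs[np.argsort(scores[idxs])[::-1]]  # highest confidence first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of box i with every remaining candidate
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return kept
```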
Training employs YOLOv8's built-in augmentation:
- Random horizontal flips
- Random scaling and translation
- HSV color space adjustments
- Mosaic augmentation for multi-scale learning
- Video processing
- Reporting detections to Waze to warn other drivers
- Mobile deployment (NCNN)
- Severity classification (mild, moderate, severe)
- Integration with GPS for location mapping
- Dataset update
- Derive a custom PyTorch model from the pruned network by shrinking layer input/output shapes, yielding a smaller architecture with noticeable memory and inference savings.
Core Libraries:
- ultralytics==8.3.228 - YOLOv8 implementation
- fastapi==0.121.2 - API framework
- onnxruntime - ONNX inference
- openvino - Intel inference engine
- opencv-python-headless - Image processing
- scikit-learn - Stratified splitting
See server/requirements.txt for complete dependencies.
- Built on the YOLOv8 architecture by Ultralytics