This demo presents a complete, practical computer vision pipeline, including:
- Multi-object fashion detection
- Improved recall for small objects (rings, watches, accessories)
- Multi-label attribute prediction per detected object
- Clean, standardized output format for downstream usage
This demo is intended as a proof-of-concept system.
- YOLOv11-based detection
- Supports apparel and accessory categories
- GPU acceleration via CUDA when available
- Optional SAHI inference for small objects
- SAHI can be toggled on or off directly from the UI
- Sliced inference helps recover tiny fashion items that standard full-image inference often misses
- Each detected object is cropped and passed to an attribute head
- Multi-label prediction (one object → multiple attributes)
- Attribute IDs are mapped to human-readable names
- Visual output with bounding boxes
- Tabular summary (class, confidence, attributes)
- JSON output matching submission format
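The small-object option listed above relies on sliced inference: the image is cut into overlapping tiles, the detector runs on each tile, and the resulting boxes are shifted back into full-image coordinates. The following is a minimal sketch of the tiling arithmetic only. It is not SAHI's API, and the function names, the 512-pixel tile size, and the 20% overlap are illustrative assumptions rather than the demo's actual settings.

```python
def slice_windows(width, height, tile=512, overlap=0.2):
    """Return (x1, y1, x2, y2) tiles that cover a width x height image.

    `tile` and `overlap` are assumed values, not the demo's configuration.
    """
    step = max(1, int(tile * (1 - overlap)))

    def starts(size):
        # A single tile is enough when the image fits inside one tile.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_full_image(box, window):
    """Shift a box detected inside a tile back to full-image coordinates."""
    x1, y1, x2, y2 = box
    offset_x, offset_y = window[0], window[1]
    return (x1 + offset_x, y1 + offset_y, x2 + offset_x, y2 + offset_y)


windows = slice_windows(1024, 768)
print(len(windows), windows[0], windows[-1])
```

Because neighbouring tiles overlap, the same object can be detected more than once, so a real pipeline also needs a merging step (for example non-maximum suppression) after the boxes are mapped back.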
Example JSON output:
[
  {
    "label": "Cardigan",
    "confidence": 0.97,
    "box": [100, 175, 715, 971],
    "attributes": ["Plain pattern", "Short length", "Single breasted"]
  }
]
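As a rough illustration of how one detection could be turned into a record of this shape, the sketch below decodes multi-label attribute scores and assembles the JSON entry. Several things here are assumptions and not taken from the demo: the `ATTRIBUTE_NAMES` table, the independent sigmoid with a 0.5 threshold, the helper names, and the reading of `box` as `[x1, y1, x2, y2]` pixel coordinates.

```python
import json
import math

# Hypothetical ID-to-name table; the demo's real attribute vocabulary is not shown here.
ATTRIBUTE_NAMES = {
    0: "Plain pattern",
    1: "Short length",
    2: "Single breasted",
    3: "Floral pattern",
}


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_attributes(logits, threshold=0.5):
    """Multi-label decoding: keep every attribute whose score passes the threshold."""
    return [
        ATTRIBUTE_NAMES[attr_id]
        for attr_id, logit in enumerate(logits)
        if attr_id in ATTRIBUTE_NAMES and sigmoid(logit) >= threshold
    ]


def make_record(label, confidence, box, logits):
    """Build one output entry; box is assumed to be (x1, y1, x2, y2) in pixels."""
    return {
        "label": label,
        "confidence": round(float(confidence), 2),
        "box": [int(round(v)) for v in box],
        "attributes": decode_attributes(logits),
    }


# Made-up detector and attribute-head outputs, chosen to reproduce the example above.
record = make_record("Cardigan", 0.9712, (100.2, 174.8, 715.0, 971.4), [2.1, 0.7, 1.3, -3.0])
print(json.dumps([record], indent=2))
```

Scoring each attribute independently is what lets a single object carry several attributes at once, in contrast to a softmax, which would force exactly one.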