(ICCV 2025) YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

mlpc-ucsd/YOLO-Count

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng · Xiang Zhang · Zirui Wang · Haiyang Xu · Zeyuan Chen · Bingnan Li · Zhuowen Tu

ICCV 2025

[Figure: Pipeline]

This repository contains the official implementation of YOLO-Count, a fully differentiable and open-vocabulary object counting model. YOLO-Count is designed to provide accurate object count estimation and enable fine-grained quantity control for text-to-image (T2I) generation models.


Environment Preparation

We recommend using Conda to set up the environment.

conda create -n yolocnt python=3.12
conda activate yolocnt
pip install -r requirements.txt

Dataset Preparation

YOLO-Count is trained and evaluated on multiple object counting benchmarks. Please download and organize each dataset as follows.

FSC147

Open Images v7 (OImgv7)

Download Open Images v7 using:

python -m scripts.download_oimgv7

Objects365 (Obj365)

Download the validation images with:

python -m scripts.download_o365

Then organize the data as:

data/Obj365/objects365/val

LVIS

  • Download LVIS
  • Place all files under:
    data/LVIS/
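
Before running evaluation, it can help to confirm that the directories named above are in place. The following is a minimal sketch and not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and it checks only the two paths stated in this README. The FSC147 and OImgv7 locations are not specified here, so they are left out rather than guessed.

```python
from pathlib import Path

# Directories named in this README. Locations for the other datasets are
# not stated here, so they are deliberately omitted.
EXPECTED_DIRS = [
    "data/Obj365/objects365/val",
    "data/LVIS",
]


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:")
        for d in missing:
            print(f"  {d}")
    else:
        print("All expected dataset directories are present.")
```

Run from the repository root, this prints any directory that still needs to be created or populated.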
    

Pre-trained Weights

Pre-trained model weights are available at
https://huggingface.co/zx1239856/yolo-count/tree/main

Please download the weights and place them in the checkpoints/ directory.
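
As an alternative to downloading the files by hand, the weights could be fetched programmatically. The sketch below assumes the `huggingface_hub` package is installed (this README does not say whether `requirements.txt` includes it) and uses its `snapshot_download` function. The helper name `download_weights` is hypothetical, and the sketch assumes the evaluation scripts accept the files exactly as they are laid out in the Hugging Face repository.

```python
from pathlib import Path

REPO_ID = "zx1239856/yolo-count"  # taken from the URL above
LOCAL_DIR = Path("checkpoints")   # directory named in this README


def download_weights(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download every file in the Hugging Face repo into local_dir.

    Assumption: huggingface_hub is installed. It is imported lazily so
    that this module can still be imported without it.
    """
    from huggingface_hub import snapshot_download

    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    # snapshot_download returns the path of the folder holding the files.
    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_weights()` from the repository root would place the files under `checkpoints/`. It requires network access.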


Evaluation

Evaluation can be performed using the eval_*.py scripts in the scripts folder.
For example, to evaluate on FSC147:

python -m scripts.eval_fsc
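
To run every evaluation script in turn, the `eval_*.py` pattern can be expanded into one command per script. This is a sketch under two assumptions: each script runs without required arguments, as `scripts.eval_fsc` does in the example above, and the commands are run from the repository root. The helper name `eval_commands` is hypothetical.

```python
import sys
from pathlib import Path


def eval_commands(scripts_dir="scripts"):
    """Build one `python -m scripts.<name>` command per eval_*.py file."""
    scripts_dir = Path(scripts_dir)
    commands = []
    for path in sorted(scripts_dir.glob("eval_*.py")):
        # scripts/eval_fsc.py becomes the module name scripts.eval_fsc
        module = f"{scripts_dir.name}.{path.stem}"
        commands.append([sys.executable, "-m", module])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to run the evaluations one at a time, stopping at the first failure.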

Results

The table below reports counting performance using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

Dataset   Split        MAE       RMSE
FSC       Test         15.6745   96.3807
FSC       Validation   14.8297   59.6979
LVIS      Validation    1.5379    5.6076
OImgv7    Validation    3.7087   12.0285
Obj365    Validation    3.2749    9.2181
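
For reference, the two metrics can be computed from per-image predicted and ground-truth counts as sketched below. These are the standard definitions; the function names are illustrative, and the repository's own evaluation scripts may differ in detail.

```python
import math


def mae(pred, true):
    """Mean Absolute Error between predicted and ground-truth counts."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def rmse(pred, true):
    """Root Mean Squared Error between predicted and ground-truth counts."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


# Three exact predictions and one that is off by 4.
pred = [5, 2, 7, 14]
true = [5, 2, 7, 10]
print(mae(pred, true))   # 1.0
print(rmse(pred, true))  # 2.0
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, as the toy example shows.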

Citation

If you find this work useful in your research, please consider citing:

@InProceedings{zeng2025yolocount,
    author    = {Zeng, Guanning and Zhang, Xiang and Wang, Zirui and Xu, Haiyang and Chen, Zeyuan and Li, Bingnan and Tu, Zhuowen},
    title     = {YOLO-Count: Differentiable Object Counting for Text-to-Image Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {16765--16775}
}

License

This repository is released under the CC-BY-SA 4.0 license.
