YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Guanning Zeng · Xiang Zhang · Zirui Wang · Haiyang Xu · Zeyuan Chen · Bingnan Li · Zhuowen Tu

ICCV 2025

Paper | arXiv

This repository contains the official implementation of YOLO-Count, a fully differentiable and open-vocabulary object counting model. YOLO-Count is designed to provide accurate object count estimation and enable fine-grained quantity control for text-to-image (T2I) generation models.

Environment Preparation

We recommend using Conda to set up the environment.

conda create -n yolocnt python=3.12
conda activate yolocnt
pip install -r requirements.txt
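After installing, you can quickly check that every package in requirements.txt actually resolved. The sketch below uses only the standard library (`importlib.metadata`). It assumes requirements.txt contains ordinary `name<specifier>` lines and skips pip options and URL-based entries, so it is a rough check rather than a full requirements parser. `missing_requirements` is a hypothetical helper, not part of this repository.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Leading project name of a requirement line, e.g. "torch" in "torch>=2.1".
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def missing_requirements(req_file: str = "requirements.txt") -> list[str]:
    """Return the project names listed in req_file that are not installed."""
    missing = []
    for raw in Path(req_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        # Skip blank lines, pip options (-r, --index-url, ...) and URL/VCS requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match is None:
            continue
        try:
            version(match.group(0))  # raises if no installed distribution has this name
        except PackageNotFoundError:
            missing.append(match.group(0))
    return missing
```

Run from the repository root inside the `yolocnt` environment, `missing_requirements()` should return an empty list.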

Dataset Preparation

YOLO-Count is trained and evaluated on multiple object counting benchmarks. Please download and organize each dataset as follows.

FSC147

Download FSC147 from
https://github.com/cvlab-stonybrook/LearningToCountEverything

Place the following folders under:

data/FSC/
├── gt_density_map_adaptive_384_VarV2
└── images_384_VarV2
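Before running evaluation, it can help to confirm that every FSC147 image has a matching density map. This minimal sketch uses the two folder names above. The file naming it expects, `<id>.jpg` images paired with `<id>.npy` density maps that share the same stem, is an assumption about the FSC147 release. `pair_fsc_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Folder names as given in this README; the *.jpg / *.npy naming is an assumption.
FSC_ROOT = Path("data/FSC")
IMAGE_DIR = "images_384_VarV2"
DENSITY_DIR = "gt_density_map_adaptive_384_VarV2"


def pair_fsc_files(root: Path = FSC_ROOT) -> tuple[list[tuple[Path, Path]], list[Path]]:
    """Match each image to the density map with the same stem.

    Returns (pairs, images_without_density_map).
    """
    image_dir = root / IMAGE_DIR
    density_dir = root / DENSITY_DIR
    if not image_dir.is_dir() or not density_dir.is_dir():
        raise FileNotFoundError(f"expected {image_dir} and {density_dir}")
    densities = {p.stem: p for p in density_dir.glob("*.npy")}
    pairs, unmatched = [], []
    for image in sorted(image_dir.glob("*.jpg")):
        if image.stem in densities:
            pairs.append((image, densities[image.stem]))
        else:
            unmatched.append(image)
    return pairs, unmatched
```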

Open Images v7 (OImgv7)

Download Open Images v7 using:

python -m scripts.download_oimgv7

Objects365 (Obj365)

Download the validation images with:

python -m scripts.download_o365

Then organize the data as:

data/Obj365/objects365/val

LVIS

Download LVIS and place all files under:
```
data/LVIS/
```
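You can check all the dataset directories in this section in one pass. The sketch below only tests that the expected folders exist. `check_layout` and `EXPECTED_DIRS` are hypothetical names. Open Images v7 is left out because its target directory is not spelled out in this README.

```python
from pathlib import Path

# Directory layout from the Dataset Preparation steps; OImgv7 has no stated layout.
EXPECTED_DIRS = {
    "FSC147": ["data/FSC/images_384_VarV2", "data/FSC/gt_density_map_adaptive_384_VarV2"],
    "Obj365": ["data/Obj365/objects365/val"],
    "LVIS": ["data/LVIS"],
}


def check_layout(base: str = ".") -> dict[str, list[str]]:
    """Return, per dataset, the expected directories that are missing under base."""
    base_path = Path(base)
    report = {}
    for dataset, dirs in EXPECTED_DIRS.items():
        missing = [d for d in dirs if not (base_path / d).is_dir()]
        if missing:
            report[dataset] = missing
    return report
```

Called from the repository root, an empty dict means every listed folder is in place.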

Pre-trained Weights

Pre-trained model weights are available at
https://huggingface.co/zx1239856/yolo-count/tree/main

Please download the weights and place them in the checkpoints/ directory.
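One way to fetch the weights is the `huggingface_hub` client's `snapshot_download`, which copies a model repository into a local folder. This sketch rests on two assumptions. First, `huggingface_hub` must be installed; requirements.txt may or may not pull it in. Second, the evaluation scripts must expect the files in `checkpoints/` under their original names. `download_weights` is a hypothetical helper.

```python
from pathlib import Path


def download_weights(local_dir: str = "checkpoints") -> Path:
    """Copy every file from the YOLO-Count Hugging Face repo into local_dir."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id="zx1239856/yolo-count", local_dir=local_dir)
    return Path(path)
```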

Evaluation

Evaluation can be performed using the eval_*.py scripts in the scripts folder.
For example, to evaluate on FSC147:

python -m scripts.eval_fsc
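To run every evaluation script in sequence, you can discover the `eval_*.py` files and launch each one with `python -m`, as in the FSC147 command above. This sketch assumes each script runs without extra arguments, as in that example, and that `scripts/` sits at the repository root. `eval_modules` and `run_all` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path


def eval_modules(scripts_dir: str = "scripts") -> list[str]:
    """List module paths such as 'scripts.eval_fsc' for every eval_*.py file."""
    pkg = Path(scripts_dir).name
    return [f"{pkg}.{p.stem}" for p in sorted(Path(scripts_dir).glob("eval_*.py"))]


def run_all(scripts_dir: str = "scripts") -> dict[str, int]:
    """Run each evaluation module via `python -m` from the repo root; return exit codes."""
    repo_root = Path(scripts_dir).resolve().parent
    codes = {}
    for module in eval_modules(scripts_dir):
        result = subprocess.run([sys.executable, "-m", module], cwd=repo_root, check=False)
        codes[module] = result.returncode
    return codes
```

`run_all()` returns a mapping from module name to exit code, so a nonzero value flags a script that failed.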

Results

The table below reports counting performance as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) between predicted and ground-truth counts. Lower is better for both.

| Dataset | Split | MAE | RMSE |
|---|---|---|---|
| FSC | Test | 15.6745 | 96.3807 |
| FSC | Validation | 14.8297 | 59.6979 |
| LVIS | Validation | 1.5379 | 5.6076 |
| OImgv7 | Validation | 3.7087 | 12.0285 |
| Obj365 | Validation | 3.2749 | 9.2181 |
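MAE is the mean absolute difference between predicted and true counts over all images. RMSE is the square root of the mean squared difference. The snippet below is a minimal sketch of these standard definitions, not the repository's own evaluation code.

```python
import math
from typing import Sequence


def mae_rmse(pred: Sequence[float], gt: Sequence[float]) -> tuple[float, float]:
    """Mean Absolute Error and Root Mean Squared Error between predicted and true counts."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    errors = [p - g for p, g in zip(pred, gt)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

Take predictions [10, 12, 7] against ground truth [10, 10, 10]. The absolute errors are 0, 2 and 3, so MAE = 5/3 ≈ 1.67 and RMSE = √(13/3) ≈ 2.08. Squaring weights large errors more heavily, so RMSE is never smaller than MAE. A large gap, such as 96.38 versus 15.67 on the FSC test split, suggests that a small number of images have very large count errors.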

Citation

If you find this work useful in your research, please consider citing:

@InProceedings{zeng2025yolocount,
    author    = {Zeng, Guanning and Zhang, Xiang and Wang, Zirui and Xu, Haiyang and Chen, Zeyuan and Li, Bingnan and Tu, Zhuowen},
    title     = {YOLO-Count: Differentiable Object Counting for Text-to-Image Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {16765--16775}
}

License

This repository is released under the CC-BY-SA 4.0 license.
