ViStoryBench introduces a comprehensive and diverse benchmark for story visualization, enabling thorough evaluation of models across narrative complexity, character consistency, and visual style.
Demo video: vistorybench-demo.mp4
- [2026] 🏆 Ongoing leaderboard maintenance and evaluation of new story visualization methods.
- [2025.12.19] 📄 arXiv v4 is now available, with more recent models evaluated, including NanoBanana-Pro.
- [2025.08.19] 🛠️ Major code v1 update: Full benchmark implementation released.
- [2025.08.12] 📄 arXiv v3 is now available.
- [2025.06.25] 📄 arXiv v2 has been published.
- [2025.05.30] 📝 Technical report v1 released on arXiv.
- [2025.05.21] 🚀 Initial project launch and code release.
ViStoryBench is designed with a modular and extensible architecture. The core of our evaluation pipeline is the BaseEvaluator abstract class, which allows for easy integration of new evaluation metrics.
Adding a New Evaluator:
- Create a new class that inherits from `vistorybench.bench.base_evaluator.BaseEvaluator`.
- Implement the required methods (`__init__`, `evaluate`).
- Register your new evaluator in `vistorybench/bench_run.py`.
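A minimal sketch of such an evaluator is shown below. It is illustrative only: the constructor arguments, the exact `BaseEvaluator` interface, and the `evaluate` return format are assumptions, so check `base_evaluator.py` for the real signatures before copying this.

```python
# Hypothetical example: a "resolution" evaluator that scores image size.
# The BaseEvaluator import path comes from this README; the method
# signatures below are illustrative assumptions, not the real interface.
from PIL import Image

from vistorybench.bench.base_evaluator import BaseEvaluator


class ResolutionEvaluator(BaseEvaluator):
    def __init__(self, min_side: int = 512, **kwargs):
        # Check base_evaluator.py for the actual constructor arguments.
        super().__init__(**kwargs)
        self.min_side = min_side

    def evaluate(self, image_paths):
        """Return the fraction of shots whose shorter side >= min_side."""
        scores = []
        for path in image_paths:
            with Image.open(path) as img:
                scores.append(float(min(img.size) >= self.min_side))
        return {"resolution_pass_rate": sum(scores) / max(len(scores), 1)}
```

Once registered in `vistorybench/bench_run.py`, the new metric can be selected with `--metrics` just like the built-in evaluators.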
We welcome contributions from the community! If you have a new metric or an improvement, please feel free to submit a pull request.
git clone --recursive https://github.com/ViStoryBench/vistorybench.git
cd vistorybench
conda create -n vistorybench python=3.11
conda activate vistorybench
# for cuda 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# for cuda 12.1
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
# for cuda 11.8
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
Choose the torch version that suits your environment from https://pytorch.org/get-started/previous-versions/.
- 80 stories and 344 characters, covering both Chinese and English.
- Each story includes Plot Correspondence, Setting Description, Shot Perspective Design, On-Stage Characters, and Static Shot Description.
- Each character includes at least one reference image and a corresponding prompt description.
We provide an automated dataset download script that allows you to download the full ViStory Dataset with a single command:
cd ViStoryBench
sh download_dataset.sh
Alternatively, you can download it by following these steps:
📥 Download our ViStory Dataset (🤗 Hugging Face) and save it in data/dataset.
- If you use a custom path, please fill in `dataset_path` in `vistorybench/config.yaml`.
After the download is complete, please rename the ViStoryBench folder to ViStory.
Folder structure of ViStory Datasets:
data/dataset/
├── ViStory/ # rename ‘ViStoryBench’ to ‘ViStory’
│ ├── 01/
│ │ ├── image/
│ │ │ └── Big Brown Rabbit/
│ │ │ ├── 00.jpg
│ │ │ └── ...
│ │ └── story.json
│ └── 02/
│ └── ...
└── ...
Use our standardized loading script `dataset_load.py` or your own data loader.
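If you write your own loader, the folder layout shown above is all you need. The sketch below walks `data/dataset/ViStory/` and collects each story's `story.json` plus its character reference images; since the JSON schema is not reproduced here, the file is loaded as a raw dict.

```python
# Minimal custom loader sketch (assumes the ViStory folder layout shown above).
# The story.json schema is not reproduced in this README; load it as a plain dict.
import json
from pathlib import Path


def load_vistory(dataset_root: str = "data/dataset/ViStory"):
    stories = {}
    for story_dir in sorted(Path(dataset_root).iterdir()):
        if not story_dir.is_dir():
            continue
        characters = {
            char_dir.name: sorted(str(p) for p in char_dir.glob("*.jpg"))
            for char_dir in (story_dir / "image").iterdir()
            if char_dir.is_dir()
        }
        stories[story_dir.name] = {
            "story": json.loads((story_dir / "story.json").read_text(encoding="utf-8")),
            "characters": characters,  # character name -> reference image paths
        }
    return stories


if __name__ == "__main__":
    data = load_vistory()
    print(f"Loaded {len(data)} stories")
```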
Run this command to verify successful dataset loading:
python vistorybench/dataset_loader/dataset_load.py

Aligning the dataset format with the specified method's input requirements.
Pre-built dataset conversion scripts are available for several pre-defined methods, all located in vistorybench/dataset_loader. adapt_base.py is a template script for dataset conversion. All pre-built dataset conversion scripts are created based on this template.
`adapt_base.py`, `adapt2animdirector.py`, `adapt2seedstory.py`, `adapt2storyadapter.py`, `adapt2storydiffusion.py`, `adapt2storygen.py`, `adapt2uno.py`, `adapt2vlogger.py`
Example of UNO:
python vistorybench/dataset_loader/adapt2uno.py \
    --language 'en' # choice=['en','ch']

You can create a script to convert the ViStory/ViStory-lite dataset into your method's required input format (based on the template script adapt_base.py); a rough sketch follows the notes below.
- The converted dataset will be saved to `data/dataset_processed`.
- If you use a custom path, please fill in `processed_dataset_path` in `vistorybench/config.yaml`.
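The sketch below shows the general shape such a converter can take. It is a hypothetical illustration rather than the actual adapt_base.py template: the `"shots"` key and the `build_prompt` helper are placeholder assumptions, so take the real field names from story.json and the template script.

```python
# Hypothetical converter sketch (not the real adapt_base.py): illustrates a
# ViStory -> method-input conversion. build_prompt() is a placeholder you
# would replace with your method's prompt format.
import json
from pathlib import Path


def build_prompt(shot: dict) -> str:
    """Placeholder: combine the shot fields your method needs into one prompt."""
    return json.dumps(shot, ensure_ascii=False)


def convert_story(story_dir: Path, out_dir: Path) -> None:
    story = json.loads((story_dir / "story.json").read_text(encoding="utf-8"))
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption for illustration: story.json exposes a list of shots; take the
    # actual key names from the dataset / adapt_base.py.
    shots = story.get("shots", [])
    prompts = [build_prompt(shot) for shot in shots]
    (out_dir / "prompts.json").write_text(
        json.dumps(prompts, ensure_ascii=False, indent=2), encoding="utf-8"
    )


if __name__ == "__main__":
    src = Path("data/dataset/ViStory")
    dst = Path("data/dataset_processed/my_method")  # hypothetical method name
    for story_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        convert_story(story_dir, dst / story_dir.name)
```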
Pre-modified inference scripts for several pre-defined methods are available for reference, all located in vistorybench/data_process/inference_custom.
`movieagent/run_custom.py`, `seedstory/vis_custom_sink.py`, `storyadapter/run_custom.py`, `storydiffusion/gradio_app_sdxl_specific_id_low_vram_custom.py`, `storygen/inference_custom_mix.py`, `storygen/inference_custom.py`, `uno/inference_custom.py`, `vlogger/vlog_read_script_sample_custom.py`
You can modify your method's story visualization inference scripts according to the specified requirements.
- We suggest saving generated results to `data/outputs`.
- If you use a custom path, please fill in `outputs_path` in `vistorybench/config.yaml`.
SD Embed. Our ViStory Dataset contains extensive complex text descriptions. However, not all models support long-text inputs. To overcome the 77-token prompt limitation in Stable Diffusion, we utilize sd_embed to generate long-weighted prompt embeddings for lengthy text.
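As an illustration, the snippet below shows the typical sd_embed usage pattern for an SDXL pipeline. The function name `get_weighted_text_embeddings_sdxl` and its return values follow the sd_embed documentation, so verify them against the version pinned in requirements.txt.

```python
# Sketch of bypassing the 77-token CLIP limit with sd_embed (SDXL variant).
# The function name/signature follows sd_embed's documented usage; treat it as
# an assumption and check the installed version.
import torch
from diffusers import StableDiffusionXLPipeline
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sdxl

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

long_prompt = "..."  # a ViStory shot description, often far beyond 77 tokens

(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = get_weighted_text_embeddings_sdxl(pipe, prompt=long_prompt, neg_prompt="")

image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
).images[0]
```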
Make sure your generated results are organized according to the following folder structure:
data/outputs/
├── method_name/
│ └── mode_name/
│ └── language_name/
│ └── timestamp/
│ ├── shots/
│ ├── story_id/
│ │ ├── shot_XX.png
│ │ └── ...
│ └── 02/
│ └── ...
└── method_2/
└── ...
- `method_name`: The model used (e.g., StoryDiffusion, UNO, GPT4o, etc.)
- `mode_name`: The mode of the method (e.g., base, SD3, etc.)
- `language_name`: The language used (e.g., en, ch)
- `timestamp`: Generation run timestamp (YYYYMMDD_HHMMSS) (e.g., 20250000_111111)
- `story_id`: The story identifier (e.g., 01, 02, etc.)
- `shot_XX.png`: Generated image for the shot
Example of UNO:
data/outputs/
├── uno/
│ └── base/
│ └── en/
│ └── 20250000-111111/
│ ├── shots/
│ ├── 01/
│ │ ├── 00.png
│ │ └── ...
│ └── 02/
│ └── ...
└── method_2/
└── ...
Example of your method:
data/outputs/
├── method_1/
│ └── mode_1/
│ └── language_1/
│ └── 20250000-111111/
│ ├── 01/
│ │ ├── shots
│ │ ├── 00.png
│ │ └── ...
│ └── 02/
│ └── ...
└── method_2/
└── ...
When you run the evaluation code, it will automatically perform data reading (ensure both the ViStoryBench dataset and the generated results conform to the standard directory structure specified above). The generated-results reading code has been uniformly integrated into the following file:
vistorybench/dataset_loader/read_outputs.py
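The snippet below is not the actual read_outputs.py logic, just an illustration of the lookup that the output directory schema implies: given a method, mode, and language, it resolves the most recent timestamped run (which is also what the evaluator falls back to when `--timestamp` is omitted).

```python
# Illustration only (not the real read_outputs.py): resolve the latest run
# directory for a method/mode/language under the output layout described above.
from pathlib import Path


def latest_run(outputs_root: str, method: str, mode: str, language: str) -> Path:
    run_root = Path(outputs_root) / method / mode / language
    runs = sorted(p for p in run_root.iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"No generation runs found under {run_root}")
    return runs[-1]  # YYYYMMDD_HHMMSS names sort lexicographically, so last = newest


print(latest_run("data/outputs", "uno", "base", "en"))
```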
We provide an automated pretrain-weight download script that allows you to download all the following weights with a single command.
sh download_weights.sh
- All of them will be saved in `data/pretrain`.
- If you use a custom path, please fill in `pretrain_path` in `vistorybench/config.yaml`.
Alternatively, you can download them separately by following these steps:
- a. GroundingDINO weights. Download `groundingdino_swint_ogc.pth` from here. Save it in the `data/pretrain/groundingdino/weights` folder (please create it in advance).
- b. InsightFace antelopev2. Download `antelopev2.zip` from here. Unzip it and save the contents in the `data/pretrain/insightface/models/antelopev2` folder (please create it in advance).
- c. SigLIP weights. Download siglip-so400m-patch14-384 🤗 weights. Save them in the `data/pretrain/google/siglip-so400m-patch14-384` folder (please create it in advance).
- d. BERT weights. Download bert-base-uncased 🤗 weights. Save them in the `data/pretrain/google-bert/bert-base-uncased` folder (please create it in advance).
- e. AdaFace weights. Download `adaface_ir101_webface12m.ckpt` from here. Save it in the `data/pretrain/adaface` folder (please create it in advance).
- f. Facenet vggface2. The `vggface2` weights are downloaded automatically during initial execution.
- g. Facexlib weights. `detection_Resnet50_Final.pth` and `parsing_parsenet.pth` are downloaded automatically to `.../facexlib/weights/` during initial execution.
- h. CSD weights. Download `csd_vit-large.pth` from here. Save it in the `data/pretrain/csd` folder (please create it in advance).
- i. Aesthetic predictor weights. Download `aesthetic_predictor_v2_5.pth` from here. Save it in the `data/pretrain/aesthetic_predictor` folder (please create it in advance).
- j. Inception weights. `inception_v3_google-0cc3c7bd.pth` is downloaded automatically during initial execution.
Navigate to the source code directory:
cd vistorybench

Example Commands:
# Run all available metrics for the 'uno' method in English
python bench_run.py --method uno --language en
# Run only the CIDS and CSD metrics for 'uno'
python bench_run.py --method uno --metrics cids csd --language en
# Run all metrics for a specific timestamp
python bench_run.py --method uno --language en --timestamp 20250824_141800

- `--method` (Required): Specify the method (model) to evaluate (e.g., `uno`, `storydiffusion`).
- `--metrics` (Optional): A space-separated list of metrics to run (e.g., `cids`, `csd`, `aesthetic`). If omitted, all registered evaluators will be executed.
- `--language` (Required): The language of the dataset to evaluate (`en` or `ch`).
- `--timestamp` (Optional): Specify a particular generation run to evaluate. If omitted, the latest run will be used.
- `--mode` (Optional): Mode name for the method (e.g., `base`). Used to locate outputs under `outputs/<method>/<mode>/<language>/<timestamp>/`.
- `--resume` (Optional, default: True):
  - True: Use the specified `--timestamp` or the latest available timestamp.
  - False: Create a new timestamp under results and write evaluation outputs there.
- `--dataset_path`, `--outputs_path`, `--pretrain_path`, `--result_path` (Optional): Override core paths for dataset, generated outputs, pretrain weights, and evaluation results. Defaults come from YAML `core.paths` or built-in defaults.
- `--api_key` (Optional): API key for PromptAlign GPT calls. If omitted, the evaluator reads from the environment variable `VISTORYBENCH_API_KEY`.
- `--base_url`, `--model_id` (Optional): Override the PromptAlign GPT endpoint and model ID for this run. These values override YAML `evaluators.prompt_align.gpt.base_url`/`model` at runtime.
Note:
- Minimal YAML must include `core.runtime.device` (e.g., `cuda` or `cpu`).
- PromptAlign optional config lives under `evaluators.prompt_align.gpt` (`model`, `base_url`). CLI `--model_id`/`--base_url` override these per run without changing the YAML.
Your config.yaml should be minimal and explicit. At minimum, specify device under core.runtime. Paths can remain defaults or be customized here.
core:
paths:
dataset: data/dataset
outputs: data/outputs
pretrain: data/pretrain
results: data/bench_results
runtime:
device: cuda # or cpu
# Optional: CIDS knobs (uncomment to override defaults)
# cids:
# ref_mode: origin
# use_multi_face_encoder: false
# ensemble_method: average
# detection:
# dino:
# box_threshold: 0.25
# text_threshold: 0.25
# encoders:
# clip:
# model_id: openai/clip-vit-large-patch14
# matching:
# superfluous_threshold: 0.8
# topk_per_nochar: 5
# ensemble_weights:
# arcface: 0.4
# adaface: 0.4
# facenet: 0.2

Notes:
- core.runtime.device is required at runtime; the tool will exit if missing.
- PromptAlign API key is read from env var VISTORYBENCH_API_KEY or via --api_key.
- CLI --base_url and --model_id override evaluators.prompt_align.gpt.base_url/model per run without changing YAML.
Create and Configure the .env File
For security best practices, your API key should be stored in an environment file rather than being hardcoded. The ViStoryBench project is designed to read this sensitive information from a .env file.
- Copy the `.env.example` file. In the root directory of the project, run the following command to copy the example file. If `.env.example` does not exist, simply create a new file named `.env`.

  cp .env.example .env
- Fill in the `.env` file. Open the newly created `.env` file and fill in the following fields. As mentioned in the README, the API key is read from the `VISTORYBENCH_API_KEY` environment variable.

  # .env
  # The base URL for your API service provider.
  # This can also be set in config.yaml, but is convenient here.
  BASE_URL="https://api.openai.com/v1"

  # The model ID you wish to use for evaluations.
  # This can also be set in config.yaml.
  MODEL_ID="gpt-4.1"

  # [REQUIRED] Your API Key.
  # Replace "sk-..." with your actual secret key.
  API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Important Notes:
- `API_KEY` is mandatory. The `PromptAlign` evaluator will fail if it cannot find this environment variable.
- Do not commit your `.env` file containing the real API key to any public repository like GitHub. The project's `.gitignore` file should already be configured to prevent this.
STORY_IMG = ['uno', 'seedstory', 'storygen', 'storydiffusion', 'storyadapter', 'theatergen']
STORY_VIDEO = ['movieagent', 'animdirector', 'vlogger', 'mmstoryagent']
CLOSED_SOURCE = ['gemini', 'gpt4o']
BUSINESS = ['moki', 'morphic_studio', 'bairimeng_ai', 'shenbimaliang', 'xunfeihuiying', 'doubao']

After running the evaluation, the results will be stored in the data/bench_results directory with the following structure:
data/bench_results/
└── method_name/
└── mode_name/
└── language_name/
└── YYYYMMDD_HHMMSS/
├── summary.json
├── metadata.json
├── cids/
│ ├── cids_results.json
│ └── ...
├── csd/
│ ├── csd_self_results.json
│ └── csd_cross_results.json
└── ... (other metrics)
- `summary.json`: Contains the averaged scores for all metrics.
- `metadata.json`: Stores metadata about the evaluation run (method, timestamp, etc.).
- Metric-specific directories (`cids/`, `csd/`, etc.): Contain detailed results for each metric.
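Since summary.json holds the averaged scores, a quick way to compare runs is simply to load and print it. The snippet below is a small convenience sketch that assumes only the directory layout shown above; the exact keys inside summary.json depend on which metrics were run, and the timestamp path is an example.

```python
# Convenience sketch: print the averaged metric scores of one evaluation run.
# Only the directory layout above is assumed; key names depend on the metrics run.
import json
from pathlib import Path

run_dir = Path("data/bench_results/uno/base/en/20250824_141800")  # example path
summary = json.loads((run_dir / "summary.json").read_text(encoding="utf-8"))
for metric, score in summary.items():
    print(f"{metric}: {score}")
```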
[2025.12] v1.3.0
- Optimized matching algorithm: switched from Greedy to Hungarian.
- Removed dino topk clip logic.
- Added skip handling for "no story" images.
[2025.11] v1.2.0
- Updated pretrain weight source to `ViStoryBench/VistoryBench_pretrain`.
- Fixed markdown output paths and copy-paste calculation bugs.
- Added utility functions.
[2025.10] v1.1.0
- Major Refactor: `bench_run.py` now supports auto-discovery and batch evaluation.
- Structure Optimization: Moved data loading logic to the `dataset_loader` directory.
- New Features: Added support for `full`/`lite` dataset splits; added OOCM metrics and Single Action Alignment.
[2025.08] v1.0.0
- Initial release of ViStoryBench.
The evaluation code for ViStoryBench is released under the Apache 2.0 License, while the ViStoryBench dataset is distributed under the MIT License.
@article{zhuang2025vistorybench,
  title={ViStoryBench: Comprehensive Benchmark Suite for Story Visualization},
  author={Cailin Zhuang and Ailin Huang and Yaoqi Hu and Jingwei Wu and Wei Cheng and Jiaqi Liao and Hongyuan Wang and Xinyao Liao and Weiwei Cai and Hengyuan Xu and Xuanyang Zhang and Xianfang Zeng and Zhewei Huang and Gang Yu and Chi Zhang},
  journal={arXiv preprint arXiv:2505.24862},
  year={2025}
}

