Hongfei Zhang 1*, Kanghao Chen 1,5*, Zixin Zhang 1,5, Harold H. Chen 1,5, Yuanhuiyi Lyu 1, Yuqi Zhang 3, Shuai Yang 1, Kun Zhou 4, Ying-Cong Chen 1,2,✉
1 HKUST(GZ) 2 HKUST 3 Fudan University 4 Shenzhen University 5 Knowin
* Equal Contribution. ✉Corresponding author.
Demo video: open-world.mp4
✅ 2025.12 — Released training code (Paper setting) 🚀
✅ 2025.11 — Added Gradio demo ✨
✅ 2025.11 — Released inference pipeline & demo dataset ✔️
✅ 2025.11 — Uploaded official DualCamCtrl checkpoints to HuggingFace 🔑
⬜ Release the training code (for LoRA fine-tuning) 🚀
This paper presents DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based conditions, yet they often lack sufficient scene understanding and geometric awareness. DualCamCtrl targets this limitation by introducing a dual-branch framework that mutually generates camera-consistent RGB and depth sequences. To harmonize these two modalities, we further propose the SemantIc Guided Mutual Alignment (SIGMA) mechanism, which performs RGB–depth fusion in a semantics-guided and mutually reinforced manner. These designs collectively enable DualCamCtrl to better disentangle appearance and geometry modeling, generating videos that more faithfully adhere to the specified camera trajectories. Extensive experiments demonstrate that DualCamCtrl achieves more consistent camera-controlled video generation, reducing camera motion error by over 40% compared with prior methods.
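As background for the ray-based conditioning mentioned above: such methods typically encode each pixel as a ray derived from the camera intrinsics and extrinsics, often as a Plücker embedding (direction plus moment). Below is a minimal NumPy sketch of a Plücker-style ray map for a pinhole camera; it illustrates the general idea only, not the exact conditioning used in DualCamCtrl.

```python
import numpy as np

def pluecker_ray_map(K, c2w, H, W):
    """Per-pixel Pluecker ray embedding (6 channels: direction, moment).

    K   : (3, 3) camera intrinsics
    c2w : (4, 4) camera-to-world extrinsics
    """
    # Pixel grid sampled at pixel centers.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Unproject to camera-space directions, then rotate to world space.
    dirs = (pix @ np.linalg.inv(K).T) @ c2w[:3, :3].T
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)      # unit directions
    origin = np.broadcast_to(c2w[:3, 3], dirs.shape)          # camera center
    moment = np.cross(origin, dirs)                           # Pluecker moment o x d
    return np.concatenate([dirs, moment], axis=-1)            # (H, W, 6)

# Example: identity camera pose, focal length 100, 64x64 resolution.
K = np.array([[100., 0., 32.], [0., 100., 32.], [0., 0., 1.]])
rays = pluecker_ray_map(K, np.eye(4), 64, 64)
print(rays.shape)  # (64, 64, 6)
```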

Comparison between our method and other state-of-the-art approaches. Given the same camera pose and input image as generation conditions, our method achieves the best alignment between camera motion and scene dynamics, producing the most visually accurate video. The '+' signs marked in the figure serve as anchors for visual comparison.

Quantitative comparisons in the I2V setting. ↑ / ↓ denotes higher/lower is better. Best and second-best results are highlighted.

Quantitative comparisons in the T2V setting across RealEstate10K and DL3DV.
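For context on the camera-motion-error numbers above: such errors are commonly reported as the rotation error (geodesic angle between the estimated and ground-truth camera rotations) and the translation error between camera positions. Below is an illustrative NumPy sketch of these two quantities; the exact evaluation protocol in the paper may differ.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    R_rel = R_est @ R_gt.T
    # trace(R) = 1 + 2*cos(theta); clip guards against round-off.
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def translation_error(t_est, t_gt):
    """Euclidean distance between two camera translations."""
    return float(np.linalg.norm(t_est - t_gt))

# Example: a 10-degree rotation about the z-axis vs. the identity.
theta = np.radians(10.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
print(rotation_error_deg(Rz, np.eye(3)))  # ~10.0
```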
git clone https://github.com/soyouthinkyoucantell/DualCamCtrl.git
conda create -n dualcamctrl python=3.11 -y
conda activate dualcamctrl
cd DualCamCtrl
pip install -e .
mkdir dependency
cd dependency
git clone https://github.com/rmbrualla/pycolmap.git
cd pycolmap
pip install -e .
pip install numpy==1.26.4 peft accelerate==1.9.0 decord==0.6.0 deepspeed diffusers omegaconf
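After installing, a quick sanity check that the dependencies pinned above are importable (pure stdlib, so it runs even in a partially set-up environment):

```python
import importlib.util

# Packages installed in the steps above (module names match the pip names here).
required = ["numpy", "peft", "accelerate", "decord", "deepspeed", "diffusers", "omegaconf"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```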
We utilize Video Depth Anything (VDA) to predict both image depth and video depth. For inference, only image depth is needed, so you can run VDA with a single image as input.
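Whatever depth predictor is used, the raw single-image output is typically a float array that is normalized per image before being consumed downstream. The helper below is a hypothetical sketch of that normalization step; it is not part of the VDA or DualCamCtrl APIs.

```python
import numpy as np

def normalize_depth(depth, eps=1e-6):
    """Scale a raw depth/disparity map to [0, 1] per image."""
    d_min, d_max = depth.min(), depth.max()
    return (depth - d_min) / max(d_max - d_min, eps)

# Fake depth prediction, for illustration only.
depth = np.random.rand(480, 640).astype(np.float32) * 10.0
norm = normalize_depth(depth)
print(norm.min(), norm.max())  # 0.0 1.0
```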
Get the checkpoints from the HuggingFace repo (DualCamCtrl Checkpoints) and put them in the checkpoints directory. (You can simply download the whole HF repo, which already contains a directory named 'checkpoints'.)
Your project structure should be like:
DualCamCtrl/
├── checkpoints/ # ← Put downloaded weight file here
│ └── dualcamctrl_diffusion_transformer.pt
├── demo_dataset/ # Small demo dataset structure
├── demo_pic/ # Demo images for quick inference
├── diffsynth/
├── examples/
├── ...
├── requirements.txt
├── README.md
└── setup.py
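Before running inference, it can help to verify that the checkpoint landed where the layout above expects it. A small stdlib check (the path mirrors the tree shown; the helper name is ours):

```python
from pathlib import Path

def check_checkpoint(root="."):
    """Return the checkpoint path if present, else raise a helpful error."""
    ckpt = Path(root) / "checkpoints" / "dualcamctrl_diffusion_transformer.pt"
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"Expected checkpoint at {ckpt}; download it from the HF repo first."
        )
    return ckpt

# check_checkpoint()  # run from the DualCamCtrl root directory
```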
cd .. # make sure you are at the root dir
export PYTHONPATH=.
python -m test_script.test_demo
Install the Gradio dependency (the demo requires a GPU with large memory):
pip install gradio
Run the app:
export PYTHONPATH=.
python gradio/app.py # For Large Memory GPU
Please refer to this document for training.
If you find our work useful, please consider citing our work:
@article{zhang2025dualcamctrl,
title={DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation},
author={Zhang, Hongfei and Chen, Kanghao and Zhang, Zixin and Chen, Harold Haodong and Lyu, Yuanhuiyi and Zhang, Yuqi and Yang, Shuai and Zhou, Kun and Chen, Yingcong},
journal={arXiv preprint arXiv:2511.23127},
year={2025}
}