Implementation of FICGen (ICCV 2025)

FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation

Wenzhuang Wang, Yifan Zhao, Mingcan Ma, Ming Liu, Zhonglin Jiang, Yong Chen, Jia Li
State Key Laboratory of Virtual Reality Technology and Systems, SCSE&QRI, Beihang University; Geely Automobile Research Institute (Ningbo) Co., Ltd.

Features

  • Motivation: Layout-to-image (L2I) generation has shown promising results in natural domains, but suffers from limited generative fidelity and weak alignment with user-provided layouts when applied to degraded scenes (e.g., low-light, underwater). We primarily attribute these limitations to the "contextual illusion dilemma" in degraded conditions, where foreground instances are overwhelmed by context-dominant frequency distributions.
  • Contribution: We propose a new Frequency-Inspired Contextual Disentanglement Generative (FICGen) paradigm, which transfers frequency knowledge of degraded images into the latent diffusion space, thereby facilitating the rendering of degraded instances and their surroundings via contextual frequency-aware guidance (a toy illustration of the frequency-split idea follows this list).
  • Results: Extensive experiments on 5 benchmarks covering a variety of degraded scenarios, from severe low-light to mild blur, demonstrate that FICGen consistently surpasses existing L2I methods in generative fidelity, layout alignment, and downstream auxiliary trainability.
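
The frequency-disentanglement idea can be pictured as splitting a feature map into context-dominant low frequencies and instance-level high frequencies. The sketch below is only a generic FFT-based illustration of that idea, not the actual FICGen modules; the function name and radius parameter are our own placeholders.

# Illustrative only: a generic low-/high-frequency split via FFT.
# This is NOT the FICGen implementation; it merely demonstrates the idea of
# disentangling context-dominant low frequencies from instance-level
# high-frequency detail in a (B, C, H, W) tensor.
import torch

def frequency_split(x: torch.Tensor, radius: float = 0.25):
    B, C, H, W = x.shape
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    # Circular low-pass mask around the spectrum center.
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H, device=x.device),
        torch.linspace(-1, 1, W, device=x.device),
        indexing="ij")
    mask = ((xx ** 2 + yy ** 2).sqrt() <= radius).to(x.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    high = x - low  # the residual carries the high-frequency detail
    return low, high

low, high = frequency_split(torch.randn(1, 4, 64, 64))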

News

  • 2025-11-25: 🔥🔥🔥 Training and inference code have been released. Recommended hardware: at least 1x NVIDIA GeForce RTX 3090 (~24 GB GPU memory).

Gallery

Qualitative Comparison · Underwater Controllable Generation · Adverse Weather Controllable Generation (see the gallery images in the repository).

1. Installation

We recommend using Python 3.8.0 and PyTorch with CUDA support. To set up the environment:

# Create a new conda environment
conda create -n ficgen python=3.8.0
conda activate ficgen

# Install other dependencies
pip install -r requirements.txt
pip install mmcv-full==1.7.0
cd mmdetection
pip install -e .
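
Optionally, a quick sanity check that the key packages import and CUDA is visible:

python -c "import torch, mmcv; print(torch.__version__, torch.cuda.is_available(), mmcv.__version__)"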

2. Training Preparation

2.1 Training Data Preparation

Take the remote sensing dataset DIOR as an example:

The final dataset should be organized as follows:

DIOR
├── train
│   ├── 00001.jpg
│   ├── ...
│   └── metadata.jsonl
├── test
│   ├── 23462.jpg
│   ├── ...
│   └── metadata.jsonl
└── dior_emb.pt
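
Before training, it can help to verify that the layout above is in place. A minimal sanity-check sketch (the "DIOR" root path is a placeholder for your dataset location):

# Minimal sanity check for the dataset layout described above.
# "DIOR" is a placeholder for your actual dataset root.
from pathlib import Path

root = Path("DIOR")
for split in ("train", "test"):
    meta = root / split / "metadata.jsonl"
    assert meta.is_file(), f"missing {meta}"
    n_images = len(list((root / split).glob("*.jpg")))
    print(f"{split}: {n_images} images, metadata at {meta}")
assert (root / "dior_emb.pt").is_file(), "missing dior_emb.pt"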

3. FICGen Training and Inference

Train the FICGen model (make sure the dataset image paths are configured correctly):

bash dist_train_ficgen.sh

Layout-driven degraded image generation (remember to modify the checkpoint and dataset paths):

python inference.py --img_patch_path img_patch_path_dir --bg_path bg_path_dir
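
For example, with hypothetical paths (substitute your own patch and background directories; the checkpoint path also needs to be set as noted above):

python inference.py --img_patch_path ./datasets/DIOR/img_patches --bg_path ./datasets/DIOR/bg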

4. Generative Evaluations

4.1 Fidelity (FID):

pip install torch-fidelity
fidelity --gpu 0 --fid --input1 real_images_dir --input2 synthetic_images_dir

Both real and synthetic images are evaluated at 512x512 resolution. Note that when evaluating fidelity on remote sensing images, the Inception network is pre-trained on RSICD; please refer to AeroGen.
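
If you prefer to compute FID from Python instead of the CLI, torch-fidelity exposes an equivalent API; a minimal sketch (the directory names are placeholders):

# Equivalent FID computation via the torch-fidelity Python API.
# real_images_dir / synthetic_images_dir are placeholder paths.
import torch_fidelity

metrics = torch_fidelity.calculate_metrics(
    input1="real_images_dir",
    input2="synthetic_images_dir",
    cuda=True,
    fid=True,
)
print(metrics["frechet_inception_distance"])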

4.2 Alignment (YOLO score, AP):

We use a pre-trained YOLOv8 model to calculate the AP between synthetic images (512x512) and ground-truth bboxes (remember to modify the image path in calculate_ys.py):

cd eval
pip install ultralytics
python calculate_ys.py
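
The AP computation itself lives in calculate_ys.py; purely to illustrate the ultralytics API it builds on, a minimal detection sketch (the weights file and image directory are placeholders):

# Illustrative ultralytics usage only; the actual YOLO-score/AP logic
# is implemented in the repo's calculate_ys.py.
from ultralytics import YOLO

model = YOLO("yolov8x.pt")  # placeholder pre-trained weights
results = model.predict("synthetic_images_dir", imgsz=512, conf=0.25)
for r in results:
    print(r.path, len(r.boxes), "detections")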

For a comprehensive alignment evaluation, we use the pre-trained Faster R-CNN (R50) in MMDetection 2.25.3 (remember to modify the test image path in coco_detection.py):

cd mmdetection
python tools/test.py configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py work_dirs/faster_rcnn_r50_fpn_1x_dior/latest.pth --eval bbox

When using MMDetection for alignment evaluation, ensure that synthetic images are resized to the same resolution as their corresponding real test images. For other datasets such as ExDARK and RUOD, the alignment evaluation methods are similar. Pay attention to the correctness of the synthetic image paths.
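
A minimal resize sketch for that requirement (directory names are placeholders; synthetic and real images are assumed to share filenames):

# Resize each synthetic image to its real counterpart's resolution.
# "real_test_dir" / "synthetic_dir" are placeholder paths.
from pathlib import Path
from PIL import Image

real_dir, syn_dir = Path("real_test_dir"), Path("synthetic_dir")
for syn_path in syn_dir.glob("*.jpg"):
    real_path = real_dir / syn_path.name
    if not real_path.is_file():
        continue
    target_size = Image.open(real_path).size  # (width, height)
    img = Image.open(syn_path)
    if img.size != target_size:
        img.resize(target_size, Image.BICUBIC).save(syn_path)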

4.3 Trainability (downstream AP performance):

(1) We first augment the ground-truth bboxes in the training set (e.g., train/metadata.jsonl) with random flipping, translation, and scaling, then merge them with the original training annotations (instances_train.json) to produce new annotation files (instances_train_mix.json) for mmdetection (a toy sketch of the bbox transform follows the commands below):

cd datasets
python augmentation_box.py
python gen2coco.py
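
As a toy illustration of the transforms named in step (1) (the authoritative logic is in datasets/augmentation_box.py), one possible per-box augmentation, with boxes as (x, y, w, h) in pixels and all parameter names hypothetical:

# Hypothetical sketch of bbox-level augmentation (flip / translate / scale);
# the real implementation is datasets/augmentation_box.py.
import random

def augment_box(box, img_w, img_h, max_shift=0.1, scale_range=(0.9, 1.1)):
    x, y, w, h = box
    if random.random() < 0.5:            # random horizontal flip
        x = img_w - x - w
    s = random.uniform(*scale_range)     # random scaling about the box center
    cx, cy = x + w / 2, y + h / 2
    w, h = w * s, h * s
    cx += random.uniform(-max_shift, max_shift) * img_w  # random translation
    cy += random.uniform(-max_shift, max_shift) * img_h
    x = min(max(cx - w / 2, 0), img_w - w)  # clamp inside the image
    y = min(max(cy - h / 2, 0), img_h - h)
    return [x, y, w, h]

print(augment_box([100, 150, 80, 60], img_w=512, img_h=512))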

(2) We generate auxiliary training images based on the newly augmented annotations (metadata_gen.jsonl from augmentation_box.py) and move these synthetic images into the real training image folder:

cd ..
python inference.py

(3) For a comprehensive trainability evaluation, we use the pre-trained Faster R-CNN (R50) in MMDetection 2.25.3 (remember to modify the training image and annotation paths in coco_detection.py):

cd mmdetection
bash tools/dist_train.sh configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py 8 --work-dir work_dirs/faster_rcnn_r50_fpn_1x_dior_mix

Note that each synthesized training image should either match the resolution of its corresponding real image or, if different, correspond correctly to the augmented annotations. For other datasets such as ExDARK and RUOD, the trainability evaluation methods are similar. Pay attention to the correctness of the synthetic image and annotation paths.

Problems/Cooperation

  • If you are interested in FICGen and wish to cooperate with us, feel free to reach out: wz_wang@buaa.edu.cn

Acknowledgements

Our work builds on Stable Diffusion, diffusers, CLIP, CC-Diff, and MIGC. We appreciate their excellent contributions to layout-to-image generation.

Citation

@inproceedings{wang2025ficgen,
  title={FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation},
  author={Wang, Wenzhuang and Zhao, Yifan and Ma, Mingcan and Liu, Ming and Jiang, Zhonglin and Chen, Yong and Li, Jia},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={19097--19107},
  year={2025}
}
