Set up the repository with the following commands:
git clone https://github.com/xjiaf/GIIFT.git
cd GIIFT
conda env create --file environment.yml
conda activate giift

All data should be organised in the data/ directory.
Run data/multi30k/setup_multi30k.sh to download the text data from Kaggle and Google Drive and organize the folders.
It also downloads the Flickr30k images and unzips them into the data/multi30k folder.
This repository contains a Bash launcher for GIIFT. Simply run:
bash run.sh

The default configuration uses:
- Dataset: multi30k
- Backbone: mbart
- Task: en->de,fr
This executes Stage 1 → Stage 2 → Testing for each target language.
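As a rough picture of that flow, the following dry-run sketch prints the order in which the stages would run for the default languages. It is not the actual run.sh: the function name plan_runs and the printed labels are illustrative assumptions, and nothing is launched.

```bash
#!/usr/bin/env bash
# Dry-run sketch only: prints the planned order, does not call src/main.py.
# plan_runs and the step labels are hypothetical, not taken from run.sh.
dataset="multi30k"
backbone="mbart"
languages=("de" "fr")

plan_runs() {
  local lang step
  for lang in "${languages[@]}"; do
    # Each target language gets the full Stage 1 -> Stage 2 -> Testing sequence.
    for step in "stage1:caption" "stage2:translate" "testing:translate"; do
      echo "${dataset}_${backbone} tgt_lang=${lang} ${step}"
    done
  done
}

plan_runs
```

With the defaults this lists three steps for de followed by three steps for fr, so a failure in one language's Stage 1 is easy to locate in the overall sequence.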
Below are the actual commands generated internally by the script (with default values). This helps users understand what the launcher builds.
Example for German (de) and backbone mbart:
python src/main.py --num_gpus 1 \
--mn multi30k_mbart \
--prefix_length 1 \
--bs 64 \
--update_count 4 \
--lr 2e-5 \
--epochs 50 \
--test_ds "2016 val" \
--stage caption \
--tgt_lang de \
--num_heads 8 \
--num_layers 9 \
--mapping_network gatl \
--mask_prob 0 \
--backbone mbart \
--use_gate --use_mbart_encoder --use_fusion \
--gpu_id 0

python src/main.py --num_gpus 1 \
--mn multi30k_mbart \
--prefix_length 1 \
--bs 64 \
--update_count 4 \
--lr 1e-5 \
--epochs 50 \
--test_ds "2016 val" \
--stage translate \
--tgt_lang de \
--lm model_pretrained.pth \
--num_heads 8 \
--num_layers 9 \
--mapping_network gatl \
--backbone mbart \
--use_gate --use_mbart_encoder --use_fusion \
--gpu_id 0

Runs on:
- 2016:flickr
- 2017:flickr
- 2017:mscoco
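
The test sets above are written as year:subset, while the test command passes the year and subset as separate values to --test_ds. Here is a minimal sketch of that conversion, assuming the labels are split on the colon. The helper name test_ds_args is hypothetical, and the launcher may do this differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "2016:flickr" into the two values "2016 flickr".
test_sets=("2016:flickr" "2017:flickr" "2017:mscoco")

test_ds_args() {
  local label="$1"
  local year="${label%%:*}"    # text before the first colon
  local subset="${label#*:}"   # text after the first colon
  echo "${year} ${subset}"
}

# Print the --test_ds argument for each test set (nothing is executed).
for ts in "${test_sets[@]}"; do
  echo "--test_ds $(test_ds_args "$ts")"
done
```
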
python src/main.py --num_gpus 1 \
--mn multi30k_mbart \
--src_lang en \
--tgt_lang de \
--prefix_length 1 \
--bs 64 \
--test_ds 2016 flickr \
--stage translate \
--test \
--lm model_best_test.pth \
--num_heads 8 \
--num_layers 9 \
--mapping_network gatl \
--backbone mbart \
--use_gate --use_mbart_encoder --use_fusion \
--gpu_id 0

In the script:
dataset="multi30k"

Switch to WMT:
dataset="wmt"

Modify:
languages=("de" "fr")

Examples:
Only German
languages=("de")

German + Czech + Ukrainian
languages=("de" "cs" "uk")

Important:
These values directly become --tgt_lang in both stages.
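
To illustrate, here is a dry-run sketch of how the languages array and the run_stage1 / run_stage2 switches listed next could combine. It is an assumption about the launcher's logic, not a copy of it. The name planned_stages is hypothetical, and the script only prints what it would run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: each language becomes --tgt_lang for every enabled stage.
languages=("de" "cs" "uk")
run_stage1=true
run_stage2=false

planned_stages() {
  local lang
  for lang in "${languages[@]}"; do
    if [ "$run_stage1" = true ]; then
      echo "--stage caption --tgt_lang ${lang}"
    fi
    if [ "$run_stage2" = true ]; then
      echo "--stage translate --tgt_lang ${lang}"
    fi
  done
}

planned_stages
```

With these values only the caption stage is listed, once per language.
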
run_stage1=true
run_stage2=false

run_stage1=false
run_stage2=true

dataset="wmt"
languages=("de")
backbones=("mbart")

If the code and/or method was useful to your work, please consider citing us!
@inproceedings{xiong-zhao-2025-giift,
title = "{GIIFT}: Graph-guided Inductive Image-free Multimodal Machine Translation",
author = "Xiong, Jiafeng and
Zhao, Yuting",
booktitle = "Proceedings of the Tenth Conference on Machine Translation",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
doi = "10.18653/v1/2025.wmt-1.6",
pages = "98--112",
ISBN = "979-8-89176-341-8"
}