Harold Haodong Chen1,2*, Xinxiang Yin1,3*,
Wen-Jie Shu2, Hongfei Zhang1, Zixin Zhang1, Chenfei Liao1, Litao Guo1,
Qifeng Chen2†, Ying-Cong Chen1,2†
*Equal Contribution; †Corresponding Author
1HKUST(GZ), 2HKUST, 3NWPU
```bash
cd LatentMorph
```

This repo ships `environment.yml`:
```bash
conda env create -f environment.yml
conda activate ./envs/latentmorph
```

If you don't use conda, make sure you can run:
```bash
python -c "import torch; import transformers; print(torch.__version__)"
```

This repo does not ship training datasets under `data/`. Please download them locally via Hugging Face.
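Before downloading any data, the one-line import check can be expanded into a short script that reports every missing package at once (a minimal sketch; the package list mirrors the command above):

```python
# Report which required packages are missing from the current environment.
import importlib

REQUIRED = ["torch", "transformers"]  # from the one-line check above

def missing_modules(names):
    """Return the subset of module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

print("missing:", missing_modules(REQUIRED) or "none")
```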
```bash
mkdir -p data/.cache/huggingface data/.cache/torch data/hps_ckpt outputs_sft/checkpoints_control outputs/rl_result
```

We store the Hugging Face cache inside the repo:
```bash
export HF_HOME="$(pwd)/data/.cache/huggingface"
export TORCH_HOME="$(pwd)/data/.cache/torch"
python -m pip install huggingface_hub
```

Download Janus and CLIP:
```bash
python -m huggingface_hub.cli download deepseek-ai/Janus-Pro-7B --local-dir "$HF_HOME"
python -m huggingface_hub.cli download openai/clip-vit-large-patch14 --local-dir "$HF_HOME"
```

Download the HPS v2.1 reward weights:
```bash
bash scripts/download_required_assets.sh
python -m pip install "git+https://github.com/tgxs002/HPSv2.git"
```

We expect the following local layout:
- SFT dataset: `data/midjourney-prompts/data/*.zstd.parquet`
- RL prompts: `data/T2I-CompBench/examples/dataset/*.txt`
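Once the downloads finish, this layout can be verified with a quick glob count (a sketch; the patterns mirror the layout listed above):

```python
# Count files matching each expected pattern under the data/ layout.
from pathlib import Path

def count_matches(root, pattern):
    """Count files under `root` that match a glob pattern."""
    return len(list(Path(root).glob(pattern)))

for root, pattern in [
    ("data/midjourney-prompts/data", "*.zstd.parquet"),  # SFT shards
    ("data/T2I-CompBench/examples/dataset", "*.txt"),    # RL prompts
]:
    print(f"{root}/{pattern}: {count_matches(root, pattern)} file(s)")
```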
Download with Hugging Face (replace the repo ids):
```bash
# Midjourney prompts (parquet shards) -> data/midjourney-prompts/data/*.zstd.parquet
huggingface-cli download --repo-type dataset vivym/midjourney-prompts \
  --local-dir data/midjourney-prompts --resume-download

# T2I-CompBench prompts (.txt) -> data/T2I-CompBench/examples/dataset/*.txt
huggingface-cli download --repo-type dataset NinaKarine/t2i-compbench \
  --include "examples/dataset/*.txt" \
  --local-dir data/T2I-CompBench --resume-download
```

Quick sanity checks:
```bash
ls -lh data/midjourney-prompts/data | head
ls -lh data/T2I-CompBench/examples/dataset | head
```

LatentMorph provides two inference entry points:

- SFT inference (`inference_sft`)
- RL inference (`inference_rl`)
Before running inference, ensure you have activated the environment:
```bash
conda activate latentmorph
```

You can download our pre-trained checkpoints from Hugging Face:
| Weight Type | Filename | Download Command |
|---|---|---|
| SFT Controller | `ckpt_sft.pt` | `huggingface-cli download CheeseStar/LatenttMorph ckpt_sft.pt --repo-type dataset --local-dir .` |
| RL Policy | `ckpt_rl.pt` | (Coming soon) |
| SFT Controller w/ LoRA | `ckpt_sft_LoRA.pt` | (User Trained) |
| RL Policy w/ LoRA | `ckpt_rl_LoRA.pt` | (User Trained) |
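After downloading, a quick existence-and-size check helps catch truncated files (a sketch; the filenames follow the table above):

```python
# Report whether a checkpoint file is present and how large it is.
from pathlib import Path

def ckpt_status(path):
    """Return a short status string for a checkpoint file."""
    p = Path(path)
    if not p.is_file():
        return f"{p.name}: missing"
    return f"{p.name}: {p.stat().st_size / 1e6:.1f} MB"

print(ckpt_status("ckpt_sft.pt"))
```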
We provide two modes for both SFT and RL stages. Choose the corresponding script folder (`inference_sft` or `inference_rl`).
Generate an image from a specific text prompt.
```bash
# Example for SFT
bash inference_sft/run_infer_one.bash
```

Customization: open `run_infer_one.bash` to modify the `prompt` string and `output` path. Result: view your image at `inference_[sft/rl]_out/single.png`.
Generate multiple images using a .txt file (one prompt per line).
```bash
# Example for RL
bash inference_rl/run_infer.bash
```

Setup: ensure the `prompts_file` path in the bash script points to your text file. Result: all generated images will be saved in `inference_[sft/rl]_out/batch/`.
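The batch script expects a plain `.txt` file with one prompt per line; a minimal sketch for producing one (the output path and prompt strings are placeholders):

```python
# Write a prompts file in the one-prompt-per-line format the batch
# script consumes, skipping blank entries.
from pathlib import Path

def write_prompts(path, prompts):
    """Write one prompt per line; return the number written."""
    lines = [p.strip() for p in prompts if p.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)

n = write_prompts("my_prompts.txt", [
    "a red cube on top of a blue sphere",
    "a photo of a cat wearing sunglasses",
])
print(f"wrote {n} prompts")
```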
LatentMorph has two training stages:
- SFT (`latent_sft`): trains lightweight control modules (the controller) with teacher forcing while the large Janus model stays frozen.
- RL (`latent_rl`): trains a trigger policy + condenser with CLIP/HPS rewards; the rest of the Janus/control stack stays frozen.
```bash
bash sft_train.sh
```

You can control the training depth with the `--lora_control` flag in the training script:

- `--lora_control 0`: trains only the control modules (the backbone remains frozen).
- `--lora_control 1`: fine-tunes the backbone and control modules together via LoRA.
Outputs:

- `outputs_sft/checkpoints_control/ckpt_latest.pt`
- `outputs_sft/checkpoints_control/ckpt_step_*.pt`
Ensure your SFT checkpoint exists at `outputs_sft/checkpoints_control/ckpt_latest.pt`.
```bash
bash rl_train.sh
```

Outputs:

- `outputs/rl_result/ckpt_latest.pt`
- `outputs/rl_result/ckpt_step_*.pt`
- `outputs/rl_result/logs/`
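To resume from or evaluate the most recent step checkpoint, the highest step number can be parsed out of the `ckpt_step_*.pt` filenames above (a sketch):

```python
# Pick the ckpt_step_*.pt file with the highest step number.
import re
from pathlib import Path

def latest_step_ckpt(ckpt_dir):
    """Return the Path of the highest-step checkpoint, or None."""
    best, best_step = None, -1
    for p in Path(ckpt_dir).glob("ckpt_step_*.pt"):
        m = re.fullmatch(r"ckpt_step_(\d+)\.pt", p.name)
        if m and int(m.group(1)) > best_step:
            best, best_step = p, int(m.group(1))
    return best

print(latest_step_ckpt("outputs/rl_result"))
```

Parsing the integer (rather than sorting filenames) matters: lexicographically, `ckpt_step_20.pt` would sort after `ckpt_step_100.pt`.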
Please consider citing our paper if you find LatentMorph useful:
```bibtex
@article{chen2026show,
  title={Show, Don't Tell: Morphing Latent Reasoning into Image Generation},
  author={Chen, Harold Haodong and Yin, Xinxiang and Shu, Wen-Jie and Zhang, Hongfei and Zhang, Zixin and Liao, Chenfei and Guo, Litao and Chen, Qifeng and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2602.02227},
  year={2026}
}
```

LatentMorph is built on the codebases of Janus-Pro, Janus-Pro-R1, and DanceGRPO; we would like to thank their developers.
For any questions, feel free to open an issue or email haroldchen328@gmail.com.
