
EnVision-Research/LatentMorph


Show, Don’t Tell: Morphing Latent Reasoning into Image Generation

Harold Haodong Chen1,2*, Xinxiang Yin1,3*,
Wen-Jie Shu2, Hongfei Zhang1, Zixin Zhang1, Chenfei Liao1, Litao Guo1,
Qifeng Chen2†, Ying-Cong Chen1,2†


*Equal Contribution; Corresponding Author
1HKUST(GZ), 2HKUST, 3NWPU

If you like our project, please give us a star ⭐ on GitHub for the latest updates.


framework

🚀 Installation

1. Clone this repository and navigate to the source folder

git clone https://github.com/EnVision-Research/LatentMorph.git
cd LatentMorph

2. Build Environment

This repo ships environment.yml.

conda env create -f environment.yml
conda activate latentmorph

If you don't use conda, make sure you can run:

python -c "import torch; import transformers; print(torch.__version__)"

🌏 Data & Model

This repo does not ship training datasets under data/. Please download them locally via Hugging Face.

1. Create the local data layout

mkdir -p data/.cache/huggingface data/.cache/torch data/hps_ckpt outputs_sft/checkpoints_control outputs/rl_result
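The layout can be sanity-checked with a short sketch (run in a scratch directory; the folder names are copied from the command above):

```shell
# Recreate the expected layout under a temporary root and count the directories
root="$(mktemp -d)"
for d in data/.cache/huggingface data/.cache/torch data/hps_ckpt \
         outputs_sft/checkpoints_control outputs/rl_result; do
  mkdir -p "$root/$d"
done
find "$root" -type d | wc -l   # 10 directories, including the root itself
```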

2. Download model weights into the local cache

We store the Hugging Face cache inside the repo:

export HF_HOME="$(pwd)/data/.cache/huggingface"
export TORCH_HOME="$(pwd)/data/.cache/torch"
python -m pip install huggingface_hub

Download Janus and CLIP:

huggingface-cli download deepseek-ai/Janus-Pro-7B --local-dir "$HF_HOME"
huggingface-cli download openai/clip-vit-large-patch14 --local-dir "$HF_HOME"

Download HPS v2.1 reward weights:

bash scripts/download_required_assets.sh
python -m pip install "git+https://github.com/tgxs002/HPSv2.git"

3. Datasets / prompts (download from Hugging Face)

We expect the following local layout:

  • SFT dataset: data/midjourney-prompts/data/*.zstd.parquet
  • RL prompts: data/T2I-CompBench/examples/dataset/*.txt

Download with Hugging Face (replace the repo ids):

# Midjourney prompts (parquet shards) -> data/midjourney-prompts/data/*.zstd.parquet
huggingface-cli download --repo-type dataset vivym/midjourney-prompts \
  --local-dir data/midjourney-prompts --resume-download

# T2I-CompBench prompts (.txt) -> data/T2I-CompBench/examples/dataset/*.txt
huggingface-cli download --repo-type dataset NinaKarine/t2i-compbench \
  --include "examples/dataset/*.txt" \
  --local-dir data/T2I-CompBench --resume-download

Quick sanity checks:

ls -lh data/midjourney-prompts/data | head
ls -lh data/T2I-CompBench/examples/dataset | head

📍 Inference Suite

LatentMorph provides two inference parts:

  • SFT Inference Part (inference_sft)

  • RL Inference Part (inference_rl)

Before running inference, ensure you have activated the environment:

conda activate latentmorph

1. Prepare Model Weights

You can download our pre-trained checkpoints from Hugging Face:

| Weight Type | Filename | Download Command |
| --- | --- | --- |
| SFT Controller | ckpt_sft.pt | huggingface-cli download CheeseStar/LatenttMorph ckpt_sft.pt --repo-type dataset --local-dir . |
| RL Policy | ckpt_rl.pt | (Coming soon) |
| SFT Controller w/ LoRA | ckpt_sft_LoRA.pt | (User Trained) |
| RL Policy w/ LoRA | ckpt_rl_LoRA.pt | (User Trained) |

2. Run Inference

We provide two modes for both SFT and RL stages. Choose the corresponding script folder (inference_sft or inference_rl).

Option A: Single Prompt (Quick Test)

Generate an image from a specific text prompt.

# Example for SFT
bash inference_sft/run_infer_one.bash

Customization: Open run_infer_one.bash to modify the prompt string and output path. Result: View your image at inference_[sft/rl]_out/single.png.

Option B: Batch Processing (Group of Prompts)

Generate multiple images using a .txt file (one prompt per line).

# Example for RL
bash inference_rl/run_infer.bash

Setup: Ensure your prompts_file path in the bash script points to your text file. Result: All generated images will be saved in inference_[sft/rl]_out/batch/.
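As a sketch, a prompts file can be prepared like this (prompts_demo.txt and the prompt strings are hypothetical; point prompts_file in the bash script at your own file):

```shell
# One prompt per line, as the batch scripts expect
cat > prompts_demo.txt <<'EOF'
a red cube stacked on a blue sphere
a photo of a cat wearing sunglasses
EOF
wc -l < prompts_demo.txt   # 2 prompts
```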


▶️ Training Suite

LatentMorph has two training stages:

  • SFT (latent_sft): train lightweight control modules (controller) with teacher-forcing while freezing the large Janus model.
  • RL (latent_rl): train a trigger policy + condenser with CLIP/HPS rewards (the rest of Janus/control stack stays frozen).

SFT: train controller (teacher-forcing)

bash sft_train.sh

You can control the training depth using the --lora_control flag in the training script:

  • --lora_control 0: Trains only the control modules (Backbone remains frozen).
  • --lora_control 1: Fine-tunes the Backbone and control modules together via LoRA.
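A minimal sketch of how such a flag can be parsed in a bash launcher (illustrative only, not the repo's actual sft_train.sh):

```shell
# Default: train only the control modules, backbone frozen
LORA_CONTROL=0
while [ $# -gt 0 ]; do
  case "$1" in
    --lora_control) LORA_CONTROL="$2"; shift 2 ;;   # 0 or 1, as described above
    *) shift ;;
  esac
done
echo "lora_control=$LORA_CONTROL"
```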

Outputs:

  • outputs_sft/checkpoints_control/ckpt_latest.pt
  • outputs_sft/checkpoints_control/ckpt_step_*.pt
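Since step checkpoints accumulate, the newest one can be selected by step number (a sketch with dummy files; assumes GNU sort with -V):

```shell
# Simulate a checkpoint directory and pick the highest-step file
dir="$(mktemp -d)"
touch "$dir/ckpt_step_100.pt" "$dir/ckpt_step_500.pt" "$dir/ckpt_step_2000.pt"
latest="$(ls "$dir"/ckpt_step_*.pt | sort -V | tail -n 1)"
echo "$latest"
```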

RL: train trigger policy (policy gradient)

Ensure your SFT checkpoint exists at outputs_sft/checkpoints_control/ckpt_latest.pt.
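That prerequisite can be verified with a small guard before launching (the path is copied from above):

```shell
# Fail fast if the SFT checkpoint is missing
ckpt=outputs_sft/checkpoints_control/ckpt_latest.pt
if [ -f "$ckpt" ]; then
  echo "found $ckpt"
else
  echo "missing $ckpt: run sft_train.sh first"
fi
```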

bash rl_train.sh

Outputs:

  • outputs/rl_result/ckpt_latest.pt
  • outputs/rl_result/ckpt_step_*.pt
  • outputs/rl_result/logs/

📝 Citation

Please consider citing our paper if you find LatentMorph useful:

@article{chen2026show,
  title={Show, Don't Tell: Morphing Latent Reasoning into Image Generation},
  author={Chen, Harold Haodong and Yin, Xinxiang and Shu, Wen-Jie and Zhang, Hongfei and Zhang, Zixin and Liao, Chenfei and Guo, Litao and Chen, Qifeng and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2602.02227},
  year={2026}
}

🍗 Acknowledgement

LatentMorph is developed based on the codebases of Janus-Pro, Janus-Pro-R1, and DanceGRPO; we thank their developers.


📪 Contact

For any questions, feel free to open an issue or email haroldchen328@gmail.com.
