FloSophoraeX/VADE


VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal Reinforcement Learning

Installation

Train Environment

This environment is used for training our models and for evaluating on the MathVista and MathVerse benchmarks.

cd requirements
python -m venv vade_train
source vade_train/bin/activate
pip install -r train_requirements.txt
cd ../train
pip install --no-deps -e .

lmms-eval environment for evaluation

This environment is used for evaluating on the MathVision, ChartQA and ScienceQA benchmarks.

cd requirements
python -m venv lmms-eval
source lmms-eval/bin/activate
pip install -r lmms_eval_requirements.txt
cd ../eval/lmms-eval
pip install -e .
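Because training and evaluation live in two separate virtual environments, it is easy to launch a script from the wrong one. The following is a minimal sketch of a guard, assuming the environments keep the names `vade_train` and `lmms-eval` used in the commands above. `require_venv` is a hypothetical helper and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the VADE repository).
# Succeeds only if the directory name of the active virtual environment
# matches $1. Relies on VIRTUAL_ENV, which the venv `activate` script exports.
require_venv() {
    local expected="$1"
    local active="${VIRTUAL_ENV:-}"
    if [ -z "$active" ]; then
        echo "no virtual environment is active (expected: $expected)" >&2
        return 1
    fi
    if [ "$(basename "$active")" != "$expected" ]; then
        echo "active environment is $(basename "$active"), expected $expected" >&2
        return 1
    fi
    return 0
}
```

For example, `require_venv vade_train || exit 1` could be placed at the top of a wrapper around the training scripts, and `require_venv lmms-eval || exit 1` before the lmms-eval benchmarks.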

Train

cd train
bash recipe/vade/scripts/7b_grpo.sh
## or bash recipe/vade/scripts/7b_gspo.sh
## or bash recipe/vade/scripts/3b_grpo.sh
## or bash recipe/vade/scripts/3b_gspo.sh
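The four scripts above follow a `<size>_<algorithm>.sh` naming pattern, so the choice can be parameterized. This is a sketch only: `vade_recipe` is a hypothetical helper, and it merely prints the script path for the combinations listed above (3b or 7b, grpo or gspo) without launching anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the VADE repository).
# Maps a model size and an algorithm to one of the four recipe scripts
# listed above and prints its path relative to the train/ directory.
vade_recipe() {
    local size="$1"
    local algo="$2"
    case "$size" in
        3b|7b) ;;
        *) echo "unknown model size: $size (expected 3b or 7b)" >&2; return 1 ;;
    esac
    case "$algo" in
        grpo|gspo) ;;
        *) echo "unknown algorithm: $algo (expected grpo or gspo)" >&2; return 1 ;;
    esac
    echo "recipe/vade/scripts/${size}_${algo}.sh"
}

# Example, run from the train/ directory:
#   bash "$(vade_recipe 3b gspo)"
```

Rejecting unknown arguments up front avoids starting a long job with a mistyped script name.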

Evaluation

All test scripts are located in eval/scripts/.

Start LLM-as-a-judge Model (Optional)

In our experiments, Qwen2.5-72B-Instruct serves as the LLM-as-a-Judge; other models can be used instead.

cd eval/scripts
bash vllm_72b.sh
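A 72B model can take a while to load, so evaluation should not start until the judge is actually serving requests. The sketch below polls a URL until it responds. It assumes that `vllm_72b.sh` starts vLLM's OpenAI-compatible server and that it is reachable at `http://localhost:8000/v1/models`; the contents of that script are not shown here, so the host, port and path are assumptions to adjust. `wait_for_http` is a hypothetical helper and requires `curl`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the VADE repository).
# Polls a URL until it answers with an HTTP success status.
# Usage: wait_for_http URL [TRIES] [DELAY_SECONDS]
wait_for_http() {
    local url="$1"
    local tries="${2:-60}"
    local delay="${3:-10}"
    local i=1
    while [ "$i" -le "$tries" ]; do
        # --fail makes curl return non-zero on HTTP errors (status >= 400).
        if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "no response from $url after $tries attempts" >&2
    return 1
}

# Example (URL is an assumption, see above):
#   wait_for_http http://localhost:8000/v1/models && echo "judge is ready"
```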

Start Evaluation

For MathVision, ChartQA and ScienceQA, you can directly run the following scripts:

## run one of the following
cd mathvison && bash mathvision.sh
## or cd chartqa && bash chartqa.sh
## or cd scienceqa && bash scienceqa.sh

For MathVista and MathVerse, run the inference script first and then the evaluation script:

cd mathvista
bash mathvista_inferece.sh
bash mathvista_eval.sh

## or for MathVerse
cd mathverse
bash mathverse_inferece.sh
bash mathverse_eval.sh
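The two-stage benchmarks can be wrapped so that evaluation only runs when inference succeeds. This is a minimal sketch: `run_two_stage` is a hypothetical helper, and it assumes the directory layout and script names exactly as written in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the VADE repository).
# Runs <bench>_inferece.sh and then <bench>_eval.sh inside the <bench>
# directory. Evaluation is skipped if inference fails. The subshell keeps
# the caller's working directory unchanged.
run_two_stage() {
    local bench="$1"
    (
        cd "$bench" || exit 1
        bash "${bench}_inferece.sh" && bash "${bench}_eval.sh"
    )
}

# Example, run from eval/scripts:
#   run_two_stage mathvista && run_two_stage mathverse
```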
