VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal Reinforcement Learning

Installation

Train Environment

This environment is used for training our models and for evaluating them on the MathVista and MathVerse benchmarks.

cd requirements
python -m venv vade_train
source vade_train/bin/activate
pip install -r train_requirements.txt
cd ../train
pip install --no-deps -e .

lmms-eval Environment for Evaluation

This environment is used for evaluating on the MathVision, ChartQA and ScienceQA benchmarks.

cd requirements
python -m venv lmms-eval
source lmms-eval/bin/activate
pip install -r lmms_eval_requirements.txt
cd ../eval/lmms-eval
pip install -e .
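
Because training and evaluation use two separate virtual environments, it is easy to launch a script from the wrong one. The sketch below is one possible guard, not part of the repository: `in_venv` is a hypothetical helper name, and it relies only on the `VIRTUAL_ENV` variable that the standard venv `activate` script exports.

```shell
# Hypothetical helper (not in the repository): succeed only if the
# currently active virtual environment's directory is named "$1".
in_venv() {
  # VIRTUAL_ENV is exported by the venv activate script.
  [ -n "${VIRTUAL_ENV:-}" ] && [ "$(basename "$VIRTUAL_ENV")" = "$1" ]
}
```

For example, `in_venv vade_train || echo "activate vade_train first" >&2` could be placed at the top of a training wrapper, and `in_venv lmms-eval` before an lmms-eval run.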

Train

cd train
bash recipe/vade/scripts/7b_grpo.sh
## or bash recipe/vade/scripts/7b_gspo.sh
## or bash recipe/vade/scripts/3b_grpo.sh
## or bash recipe/vade/scripts/3b_gspo.sh
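
The four scripts above follow a `<size>_<algorithm>.sh` naming pattern. As a minimal sketch under that assumption, the path can be built from two arguments; `vade_script` is a hypothetical helper name and only the four combinations listed above are accepted.

```shell
# Hypothetical helper (not in the repository): print the recipe script path
# for a model size (3b or 7b) and an algorithm (grpo or gspo).
vade_script() {
  size="$1"
  algo="$2"
  case "$size" in
    3b|7b) ;;
    *) echo "unknown model size: $size" >&2; return 1 ;;
  esac
  case "$algo" in
    grpo|gspo) ;;
    *) echo "unknown algorithm: $algo" >&2; return 1 ;;
  esac
  echo "recipe/vade/scripts/${size}_${algo}.sh"
}
```

From inside `train`, a run could then be started with `bash "$(vade_script 7b grpo)"`.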

Evaluation

All test scripts are located in eval/scripts/.

Start LLM-as-a-judge Model (Optional)

In our experiments, Qwen2.5-72B-Instruct serves as the LLM-as-a-judge. Other models can be used instead.

cd eval/scripts
bash vllm_72b.sh
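
A 72B model can take a while to load, so it may help to wait until the judge server responds before starting evaluation. The sketch below assumes `vllm_72b.sh` starts vLLM's OpenAI-compatible server, which answers `GET /v1/models` once it is ready. The host and port are assumptions (vLLM defaults to port 8000; adjust them if the script sets a different one), and `judge_models_url` and `wait_for_judge` are hypothetical helper names.

```shell
# Assumed endpoint; override JUDGE_HOST / JUDGE_PORT to match vllm_72b.sh.
JUDGE_HOST="${JUDGE_HOST:-localhost}"
JUDGE_PORT="${JUDGE_PORT:-8000}"

# Print the URL of the model-listing endpoint of the judge server.
judge_models_url() {
  echo "http://${JUDGE_HOST}:${JUDGE_PORT}/v1/models"
}

# Poll the endpoint up to $1 times (default 60), sleeping $2 seconds
# (default 10) between attempts. Returns 0 once the server answers,
# 1 if it never does.
wait_for_judge() {
  attempts="${1:-60}"
  delay="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -sf "$(judge_models_url)" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Calling `wait_for_judge && echo "judge is up"` in a second terminal blocks until the server is reachable or the attempts run out.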

Start Evaluation

For MathVision, ChartQA and ScienceQA, you can directly run the following scripts:

cd mathvison && bash mathvision.sh
## or: cd chartqa && bash chartqa.sh
## or: cd scienceqa && bash scienceqa.sh

For MathVista and MathVerse, you can run the following scripts:

cd mathvista
bash mathvista_inferece.sh
bash mathvista_eval.sh

## or for MathVerse
cd mathverse
bash mathverse_inferece.sh
bash mathverse_eval.sh
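
MathVista and MathVerse are evaluated in two stages: inference first, then scoring. The sketch below chains the two stages and stops if inference fails. `run_two_stage` is a hypothetical wrapper, not part of the repository; the script names are copied exactly as they appear in the commands above, and setting `DRY_RUN=1` only prints the commands instead of running them.

```shell
# Hypothetical wrapper (not in the repository): run inference and then
# scoring for one two-stage benchmark from inside eval/scripts.
run_two_stage() {
  bench="$1"
  case "$bench" in
    mathvista|mathverse) ;;
    *) echo "unsupported benchmark: $bench" >&2; return 1 ;;
  esac
  # Stage names are spelled as in the script file names shown above.
  for stage in inferece eval; do
    script="${bench}_${stage}.sh"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "(cd $bench && bash $script)"
    else
      # Run in a subshell so the caller's working directory is unchanged;
      # abort before scoring if inference fails.
      (cd "$bench" && bash "$script") || return 1
    fi
  done
}
```

For example, `DRY_RUN=1; run_two_stage mathverse` previews the two commands, and `DRY_RUN=0; run_two_stage mathverse` executes them.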

Repository: FloSophoraeX/VADE
