VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal Reinforcement Learning
This environment is used for training our models and for evaluating them on the MathVista and MathVerse benchmarks.
cd requirements
python -m venv vade_train
source vade_train/bin/activate
pip install -r train_requirements.txt
cd ../train
pip install --no-deps -e .
This environment is used for evaluating on the MathVision, ChartQA and ScienceQA benchmarks.
cd requirements
python -m venv lmms-eval
source lmms-eval/bin/activate
pip install -r lmms_eval_requirements.txt
cd ../eval/lmms-eval
pip install -e .
cd train
bash recipe/vade/scripts/7b_grpo.sh
## or bash recipe/vade/scripts/7b_gspo.sh
## or bash recipe/vade/scripts/3b_grpo.sh
## or bash recipe/vade/scripts/3b_gspo.sh
All test scripts are located in eval/scripts/.
In our experiments, we use Qwen2.5-72B-Instruct as the LLM-as-a-Judge; other models can be used as well.
cd eval/scripts
bash vllm_72b.sh
For MathVision, ChartQA and ScienceQA, you can directly run the following scripts:
cd mathvison / chartqa / scienceqa
bash mathvision.sh / chartqa.sh / scienceqa.sh
For MathVista and MathVerse, you can run the following scripts:
cd mathvista
bash mathvista_inferece.sh
bash mathvista_eval.sh
## or for MathVerse
cd mathverse
bash mathverse_inferece.sh
bash mathverse_eval.sh
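For MathVision, ChartQA and ScienceQA, the `cd ... / bash ...` shorthand above stands for one directory and one script per benchmark. A minimal sketch of running all three in sequence from eval/scripts might look like the following. The directory and script names are copied verbatim from the commands above, so adjust them if your checkout differs. `run_lmms_benchmarks` and `DRY_RUN` are hypothetical helpers introduced here for illustration; they are not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: run the three lmms-eval benchmarks one after another.
# Directory/script names are taken as written in this README; verify them locally.
# DRY_RUN is a hypothetical switch (not part of the repository) that prints
# the commands instead of executing them.

run_lmms_benchmarks() {
    local pair dir script
    for pair in "mathvison:mathvision.sh" "chartqa:chartqa.sh" "scienceqa:scienceqa.sh"; do
        dir="${pair%%:*}"      # text before the colon: benchmark directory
        script="${pair##*:}"   # text after the colon: script to run
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "(cd ${dir} && bash ${script})"
        else
            # The subshell keeps the caller's working directory unchanged;
            # stop at the first benchmark that fails.
            (cd "${dir}" && bash "${script}") || return 1
        fi
    done
}

# Preview the commands without running anything.
DRY_RUN=1 run_lmms_benchmarks
```

To actually launch the evaluations, call `run_lmms_benchmarks` without `DRY_RUN` from inside eval/scripts, with the lmms-eval environment activated and the judge server already running.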