johnson111788/SpatialReasoner

SpatialReasoner

Official implementation of SpatialReasoner, from the following paper:

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning.
Wufei Ma*, Yu-Cheng Chou*, Qihao Liu*, Xingrui Wang, Celso de Melo†, Jieneng Chen, Jianwen Xie^, and Alan Yuille
Johns Hopkins University, †DEVCOM Army Research Laboratory, ^Lambda Inc
[arXiv] [Project Page]


Comparison of 3D spatial reasoning between SpatialReasoner and previous state-of-the-art models. SpatialReasoner builds on explicit 3D representations, performs 3D computation, and reasons about the final answer. Although Gemini 2.0 can also break complex 3D spatial reasoning questions into small, tractable steps, it lacks the reliable 3D computation needed to reach the correct answer.

Installation

Set up Python dependencies.

conda create -n spatial_reasoner python=3.11 -y && conda activate spatial_reasoner
pip3 install -e ".[dev]"
pip3 install flash-attn --no-build-isolation
pip3 install qwen_vl_utils xlsxwriter
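
After installation, it can help to confirm that the packages import cleanly before starting a long training run. The following is a minimal sketch, not part of the repository: `check_python_modules` is a hypothetical helper, and the module names in the usage comment (`torch`, `flash_attn`, `qwen_vl_utils`, `xlsxwriter`) are assumed import names for the packages installed above.

```shell
# Hypothetical helper: report which Python modules import in the active
# environment. Returns non-zero if at least one module fails to import.
check_python_modules() {
  missing=0
  for module in "$@"; do
    if python3 -c "import ${module}" >/dev/null 2>&1; then
      echo "ok: ${module}"
    else
      echo "missing: ${module}"
      missing=1
    fi
  done
  return "${missing}"
}

# Example (assumed import names, run inside the conda environment):
# check_python_modules torch flash_attn qwen_vl_utils xlsxwriter
```

A `missing:` line for `flash_attn` usually means the build step failed and is worth resolving before launching any of the scripts below.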

Set up the evaluation environment.

git submodule update --init --recursive
cd VLMEvalKit
pip install -e .
cd ../

Training

Download Training Data

mkdir -p ./data && cd ./data

# OpenImages
wget https://huggingface.co/datasets/ccvl/SpatialReasonerTrain/resolve/main/openimages.tar
tar -xvf openimages.tar

# LLaVA
wget https://huggingface.co/datasets/ccvl/SpatialReasonerTrain/resolve/main/llava.tar
tar -xvf llava.tar

cd ../
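
Large downloads can be interrupted, which leaves a truncated archive behind. As a sketch under that assumption, the hypothetical `verify_tar` helper below checks that an archive's table of contents can be read with `tar -tf`; it is not part of the repository.

```shell
# Hypothetical helper: succeed only if tar can list the archive's contents.
verify_tar() {
  if tar -tf "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "unreadable: $1" >&2
    return 1
  fi
}

# Example, run from the repository root after the downloads above:
# verify_tar data/openimages.tar && verify_tar data/llava.tar
```

If an archive is reported as unreadable, re-running the same `wget` command with `-c` resumes the partial download rather than starting over.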

Training

  • SpatialReasoner-SFT
bash local_scripts/spatialreasoner-sft.sh
  • SpatialReasoner-Zero
bash local_scripts/spatialreasoner-zero.sh
  • SpatialReasoner
bash local_scripts/spatialreasoner.sh

Evaluation


Comparison with previous state-of-the-art methods on 3DSRBench. Our SpatialReasoner outperforms previous open-source and proprietary methods on challenging 3D spatial reasoning problems in 3DSRBench.

Download Evaluation Data

cd ./data

# 3DSRBench
wget https://huggingface.co/datasets/ccvl/3DSRBench/resolve/main/3dsrbench_v1_vlmevalkit_circular.tsv

# CV-Bench-3D
wget https://huggingface.co/datasets/ccvl/SpatialReasonerEval/resolve/main/CV-Bench-3D.tsv

cd ../
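
A quick way to confirm that both benchmark files downloaded completely is to count their rows and columns. The sketch below assumes each TSV has a header line and one record per line; `tsv_summary` is a hypothetical helper, and no particular column names are assumed.

```shell
# Hypothetical helper: print the number of data rows (excluding the header)
# and the number of tab-separated columns in the header line.
tsv_summary() {
  awk -F '\t' '
    NR == 1 { cols = NF }
    END     { printf "rows=%d cols=%d\n", (NR > 0 ? NR - 1 : 0), cols }
  ' "$1"
}

# Example, using paths relative to the repository root:
# tsv_summary ./data/3dsrbench_v1_vlmevalkit_circular.tsv
# tsv_summary ./data/CV-Bench-3D.tsv
```

A row count of zero, or a column count of one, suggests that the download returned an error page instead of the dataset.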

Inference

  • SpatialReasoner-SFT
bash local_scripts/infer_spatialreasoner-sft.sh
  • SpatialReasoner-Zero
bash local_scripts/infer_spatialreasoner-zero.sh
  • SpatialReasoner
bash local_scripts/infer_spatialreasoner.sh

Results for CV-Bench-3D are printed to the terminal (stdout); the final results for 3DSRBench are saved to results_3DSRBench.csv.
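
Because the CV-Bench-3D numbers only appear on stdout, it may be convenient to keep a copy of the terminal output. The following sketch wraps any command with `tee`; the helper name `run_logged` and the `logs/` directory are illustrative assumptions, not part of the repository.

```shell
# Hypothetical helper: run a command, show its output live, and save a copy.
# Usage: run_logged <log-file> <command> [args...]
# Without `pipefail`, the return status is that of tee, not of the command.
run_logged() {
  log_file="$1"
  shift
  mkdir -p "$(dirname "$log_file")"
  "$@" 2>&1 | tee "$log_file"
}

# Example, run from the repository root:
# run_logged logs/infer_spatialreasoner.log bash local_scripts/infer_spatialreasoner.sh
```

The saved log can then be searched for the CV-Bench-3D results after the run has finished.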

Citation

If you find this repository helpful, please consider citing:

@article{ma2025spatialreasoner,
  title={SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning},
  author={Ma, Wufei and Chou, Yu-Cheng and Liu, Qihao and Wang, Xingrui and de Melo, Celso and Xie, Jianwen and Yuille, Alan},
  journal={arXiv preprint arXiv:2504.20024},
  year={2025}
}
