
GeoAIBench-TopoDirection

Evaluating Geospatial Reasoning Capabilities in Large Language Models: A Benchmark on Geometry Classification, Topological Relations and Direction Estimation

Abstract

This study investigates the geospatial reasoning abilities of large language models (LLMs) by evaluating their performance on three spatial tasks: geometry type classification, topological relation prediction based on the Dimensionally Extended 9-Intersection Model (DE-9IM), and directional inference using centroid-based azimuths. Specifically, this research examines four state-of-the-art LLMs: DeepSeek-V3-0324, Claude-Sonnet-4-20250514, Gemini-2.0-Flash, and Llama-4-Maverick-17B. Using a benchmark of 4,100 subject-object geometry pairs encoded in GeoJSON format, we assess each model’s ability to identify geometry types, infer topological spatial relationships, and estimate direction. The evaluation reveals significant variation in performance across LLMs and spatial configurations, offering insights into the spatial reasoning capabilities of foundation models and their potential applications in GeoAI and spatial intelligence.
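The direction task above can be sketched as follows. This is a minimal illustration of a centroid-based azimuth and an 8-sector compass labeling; the coordinates and the exact binning are assumptions for illustration, not necessarily the paper's configuration.

```python
import math

def azimuth_deg(subject_centroid, object_centroid):
    """Clockwise angle from north, in degrees, from subject to object centroid."""
    dx = object_centroid[0] - subject_centroid[0]
    dy = object_centroid[1] - subject_centroid[1]
    # atan2(dx, dy) measures from the +y axis (north), clockwise
    return math.degrees(math.atan2(dx, dy)) % 360

def direction_label(az):
    """Map an azimuth to one of 8 compass sectors (illustrative 45-degree bins)."""
    labels = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return labels[int((az + 22.5) // 45) % 8]

az = azimuth_deg((0.0, 0.0), (1.0, 1.0))
print(az, direction_label(az))  # a 45-degree bearing falls in the NE sector
```

The geometry-type and topology labels would come from a geometry library (e.g. computing a DE-9IM matrix between the two shapes) rather than from this hand-rolled trigonometry.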

Workflow

[Workflow diagram]

Reference

If you find our code or ideas useful for your research, please cite our paper:

Yu Chin Huang, Yuhan Ji, and Song Gao. 2025. Evaluating Geospatial Reasoning Capabilities in Large Language Models: A Benchmark on Geometry Classification, Topological Relations and Direction Estimation. In The 4th ACM SIGSPATIAL International Workshop on Spatial Big Data and AI for Industrial Applications (GeoIndustry ’25), November 3–6, 2025, Minneapolis, MN, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3764919.3770881

Code Usage

1. Test Dataset Generator

Generate a labeled synthetic test dataset of geometry pairs for spatial reasoning tasks.
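One benchmark record might look like the sketch below: a subject-object geometry pair in GeoJSON plus ground-truth labels. The field names and label vocabulary here are assumptions for illustration, not the repository's actual schema.

```python
import json

# Hypothetical record structure: GeoJSON geometries plus ground-truth labels.
record = {
    "subject": {"type": "Point", "coordinates": [0.0, 0.0]},
    "object": {
        "type": "Polygon",
        "coordinates": [[[1.0, 1.0], [3.0, 1.0], [3.0, 3.0], [1.0, 3.0], [1.0, 1.0]]],
    },
    "labels": {
        "subject_type": "Point",
        "object_type": "Polygon",
        "topology": "disjoint",  # the point lies outside the polygon
        "direction": "NE",       # object centroid is northeast of the subject
    },
}

print(json.dumps(record))
```

A generator would loop over geometry-type combinations and spatial configurations, computing each label programmatically so the dataset stays internally consistent.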

2. Geospatial Reasoning Test

Automate prediction of geometry type, topological relations and directions for geometry pairs using large language models.
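A prompt for the three tasks could be assembled as below. The wording is an illustrative assumption; the exact prompts used in the paper are not reproduced here.

```python
import json

def build_prompt(subject_geojson, object_geojson):
    """Assemble a three-task query from a subject-object GeoJSON pair."""
    return (
        "Given two geometries in GeoJSON:\n"
        f"Subject: {json.dumps(subject_geojson)}\n"
        f"Object: {json.dumps(object_geojson)}\n"
        "1. State the geometry type of each.\n"
        "2. Name their topological relation (DE-9IM based).\n"
        "3. Give the compass direction from subject to object.\n"
    )

prompt = build_prompt(
    {"type": "Point", "coordinates": [0, 0]},
    {"type": "Point", "coordinates": [1, 1]},
)
print(prompt)
```

The same prompt would then be sent to each model's API, with the free-text answers parsed back into the benchmark's label vocabulary for scoring.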

3. Performance Evaluation (Cross-Model)

Evaluate model performance on spatial understanding tasks across Claude, DeepSeek, Gemini, and Llama.

4. Performance Evaluation & Outcome Visualization

Compute metrics and produce diagnostic visualizations for relation, type, and direction tasks.
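A minimal per-model, per-task accuracy computation might look like the sketch below; the record structure (`model`, `task`, `pred`, `gold`) is an assumption for illustration.

```python
from collections import defaultdict

def accuracy_by_task(records):
    """records: iterable of dicts with 'model', 'task', 'pred', 'gold' keys."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["model"], r["task"])
        total[key] += 1
        correct[key] += int(r["pred"] == r["gold"])
    return {key: correct[key] / total[key] for key in total}

demo = [
    {"model": "claude", "task": "topology", "pred": "overlaps", "gold": "overlaps"},
    {"model": "claude", "task": "topology", "pred": "touches", "gold": "within"},
]
print(accuracy_by_task(demo))  # {('claude', 'topology'): 0.5}
```

Per-(model, task) accuracies like these feed naturally into confusion matrices and bar charts for the diagnostic visualizations described above.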
