
GeoAIBench-TopoDirection

Evaluating Geospatial Reasoning Capabilities in Large Language Models: A Benchmark on Geometry Classification, Topological Relations and Direction Estimation

Abstract

This study investigates the geospatial reasoning abilities of large language models (LLMs) by evaluating their performance on three spatial tasks: geometry type classification, topological relation prediction based on the Dimensionally Extended 9-Intersection Model (DE-9IM), and directional inference using centroid-based azimuths. Specifically, this research examines four state-of-the-art LLMs: DeepSeek-V3-0324, Claude-Sonnet-4-20250514, Gemini-2.0-Flash, and Llama-4-Maverick-17B. Using a benchmark of 4,100 subject-object geometry pairs encoded in GeoJSON format, we assess each model’s ability to identify geometry types, infer topological spatial relationships, and estimate direction. The evaluation reveals significant variation in performance across LLMs and spatial configurations, offering insights into the spatial reasoning capabilities of foundation models and their potential applications in GeoAI and spatial intelligence.
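The direction task above can be sketched as follows. This is a minimal illustration of a centroid-based azimuth and an 8-sector compass labeling; the coordinates and the exact binning are assumptions for illustration, not necessarily the paper's configuration.

```python
import math

def azimuth_deg(subject_centroid, object_centroid):
    """Clockwise angle from north, in degrees, from subject to object centroid."""
    dx = object_centroid[0] - subject_centroid[0]
    dy = object_centroid[1] - subject_centroid[1]
    # atan2(dx, dy) measures from the +y axis (north), clockwise
    return math.degrees(math.atan2(dx, dy)) % 360

def direction_label(az):
    """Map an azimuth to one of 8 compass sectors (illustrative 45-degree bins)."""
    labels = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return labels[int((az + 22.5) // 45) % 8]

az = azimuth_deg((0.0, 0.0), (1.0, 1.0))
print(az, direction_label(az))  # a 45-degree bearing falls in the NE sector
```

The geometry-type and topology labels would come from a geometry library (e.g. computing a DE-9IM matrix between the two shapes) rather than from this hand-rolled trigonometry.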

Workflow

[Workflow diagram]

Reference

If you find our code or ideas useful for your research, please cite our paper:

Yu Chin Huang, Yuhan Ji, and Song Gao. 2025. Evaluating Geospatial Reasoning Capabilities in Large Language Models: A Benchmark on Geometry Classification, Topological Relations and Direction Estimation. In The 4th ACM SIGSPATIAL International Workshop on Spatial Big Data and AI for Industrial Applications (GeoIndustry ’25), November 3–6, 2025, Minneapolis, MN, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3764919.3770881

Code Usage

1. Test Dataset Generator

Generate a labeled synthetic test dataset of geometry pairs for spatial reasoning tasks.
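One benchmark record might look like the sketch below: a subject-object geometry pair in GeoJSON plus ground-truth labels. The field names and label vocabulary here are assumptions for illustration, not the repository's actual schema.

```python
import json

# Hypothetical record structure: GeoJSON geometries plus ground-truth labels.
record = {
    "subject": {"type": "Point", "coordinates": [0.0, 0.0]},
    "object": {
        "type": "Polygon",
        "coordinates": [[[1.0, 1.0], [3.0, 1.0], [3.0, 3.0], [1.0, 3.0], [1.0, 1.0]]],
    },
    "labels": {
        "subject_type": "Point",
        "object_type": "Polygon",
        "topology": "disjoint",  # the point lies outside the polygon
        "direction": "NE",       # object centroid is northeast of the subject
    },
}

print(json.dumps(record))
```

A generator would loop over geometry-type combinations and spatial configurations, computing each label programmatically so the dataset stays internally consistent.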

2. Geospatial Reasoning Test

Automate prediction of geometry type, topological relations and directions for geometry pairs using large language models.
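A prompt for the three tasks could be assembled as below. The wording is an illustrative assumption; the exact prompts used in the paper are not reproduced here.

```python
import json

def build_prompt(subject_geojson, object_geojson):
    """Assemble a three-task query from a subject-object GeoJSON pair."""
    return (
        "Given two geometries in GeoJSON:\n"
        f"Subject: {json.dumps(subject_geojson)}\n"
        f"Object: {json.dumps(object_geojson)}\n"
        "1. State the geometry type of each.\n"
        "2. Name their topological relation (DE-9IM based).\n"
        "3. Give the compass direction from subject to object.\n"
    )

prompt = build_prompt(
    {"type": "Point", "coordinates": [0, 0]},
    {"type": "Point", "coordinates": [1, 1]},
)
print(prompt)
```

The same prompt would then be sent to each model's API, with the free-text answers parsed back into the benchmark's label vocabulary for scoring.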

3. Performance Evaluation (Cross-Model)

Evaluate model performance on spatial understanding tasks across Claude, DeepSeek, Gemini, and Llama.

4. Performance Evaluation & Outcome Visualization

Compute metrics and produce diagnostic visualizations for relation, type, and direction tasks.
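A minimal per-model, per-task accuracy computation might look like the sketch below; the record structure (`model`, `task`, `pred`, `gold`) is an assumption for illustration.

```python
from collections import defaultdict

def accuracy_by_task(records):
    """records: iterable of dicts with 'model', 'task', 'pred', 'gold' keys."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["model"], r["task"])
        total[key] += 1
        correct[key] += int(r["pred"] == r["gold"])
    return {key: correct[key] / total[key] for key in total}

demo = [
    {"model": "claude", "task": "topology", "pred": "overlaps", "gold": "overlaps"},
    {"model": "claude", "task": "topology", "pred": "touches", "gold": "within"},
]
print(accuracy_by_task(demo))  # {('claude', 'topology'): 0.5}
```

Per-(model, task) accuracies like these feed naturally into confusion matrices and bar charts for the diagnostic visualizations described above.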
