Related papers: MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

URL: http://arxiv.org/abs/2501.00316v2
Date: Fri, 06 Jun 2025 08:14:05 GMT
Title: MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Authors: Mahir Labib Dihan, Md Tanvir Hassan, Md Tanvir Parvez, Md Hasebul Hasan, Md Almash Alam, Muhammad Aamir Cheema, Mohammed Eunus Ali, Md Rizwan Parvez
Abstract summary: MapEval is a benchmark designed to assess foundation models across three distinct tasks (textual, API-based, and visual reasoning). It covers spatial relationships, navigation, travel planning, and real-world map interactions. It requires models to handle long-context reasoning, API interactions, and visual map analysis.
Score: 7.422346909538787
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in foundation models have improved autonomous tool usage and reasoning, but their capabilities in map-based reasoning remain underexplored. To address this, we introduce MapEval, a benchmark designed to assess foundation models across three distinct tasks (textual, API-based, and visual reasoning) through 700 multiple-choice questions spanning 180 cities and 54 countries, covering spatial relationships, navigation, travel planning, and real-world map interactions. Unlike prior benchmarks that focus on simple location queries, MapEval requires models to handle long-context reasoning, API interactions, and visual map analysis, making it the most comprehensive evaluation framework for geospatial AI. On evaluation of 30 foundation models, including Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro, none surpass 67% accuracy, with open-source models performing significantly worse and all models lagging over 20% behind human performance. These results expose critical gaps in spatial inference, as models struggle with distances, directions, route planning, and place-specific reasoning, highlighting the need for better geospatial AI to bridge the gap between foundation models and real-world navigation. All the resources are available at: https://mapeval.github.io/.
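Because MapEval reports results as multiple-choice accuracy across task types and compares models against human performance, a minimal scoring sketch may help make those numbers concrete. It assumes a hypothetical per-question record (task label, chosen option index, correct option index); the `Answer` record, the task labels, and both functions are illustrative only and are not MapEval's released evaluation code. The gap is reported in percentage points, which is also an assumption about how "behind human performance" is measured.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Answer:
    """One model response to a hypothetical MapEval-style multiple-choice item."""
    task: str       # hypothetical task label, e.g. "textual", "api", "visual"
    predicted: int  # index of the option the model chose
    gold: int       # index of the correct option


def accuracy_by_task(answers):
    """Return {task: accuracy in percent}, plus an 'overall' entry over all items."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for a in answers:
        # Count each answer both under its own task and under the overall bucket.
        for key in (a.task, "overall"):
            total[key] += 1
            correct[key] += int(a.predicted == a.gold)
    return {k: 100.0 * correct[k] / total[k] for k in total}


def gap_to_human(model_acc, human_acc):
    """Gap in percentage points; a positive value means the model trails humans."""
    return human_acc - model_acc


answers = [
    Answer("textual", 0, 0),
    Answer("textual", 2, 1),
    Answer("api", 3, 3),
    Answer("visual", 1, 1),
]
scores = accuracy_by_task(answers)
print(scores)  # {'textual': 50.0, 'overall': 75.0, 'api': 100.0, 'visual': 100.0}
```

Reporting per-task accuracy alongside the overall figure matters here because the benchmark's three settings (textual, API-based, visual) stress different capabilities, and a single aggregate can hide a weak setting.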

Related papers

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology [87.65242416688146]
TreeBench is a diagnostic benchmark for visual grounded reasoning. TreeVGR is a training paradigm to supervise localization and reasoning jointly with reinforcement learning.
arXiv Detail & Related papers (2025-07-10T17:59:58Z)
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
GeoJEPA: Towards Eliminating Augmentation- and Sampling Bias in Multimodal Geospatial Learning [0.0]
We present GeoJEPA, a versatile multimodal fusion model for geospatial data built on the self-supervised Joint-Embedding Predictive Architecture. We aim to eliminate the widely accepted augmentation- and sampling biases found in self-supervised geospatial representation learning. The results are multimodal semantic representations of urban regions and map entities that we evaluate both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-02-25T22:03:28Z)
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs [64.58959634712215]
Geologic maps, as fundamental diagrams in geological science, provide critical insights into the structure and composition of Earth's subsurface and surface. Despite their significance, current Multimodal Large Language Models (MLLMs) often fall short in geologic map understanding. To quantify this gap, we construct GeoMap-Bench, the first-ever benchmark for evaluating MLLMs in geologic map understanding.
arXiv Detail & Related papers (2025-01-10T18:59:42Z)
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks. Our benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale. We evaluate several state-of-the-art VLMs to assess their accuracy within the geospatial context.
arXiv Detail & Related papers (2024-11-28T18:59:56Z)
TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard definition maps (SDMaps). We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information. Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
arXiv Detail & Related papers (2024-11-22T06:13:42Z)
MAPWise: Evaluating Vision-Language Models for Advanced Map Queries [47.15503716894445]
This study investigates the efficacy of vision-language models (VLMs) in answering questions based on maps. We introduce a novel map-based question-answering benchmark, consisting of maps from three geographical regions (United States, India, China). Our benchmark incorporates 43 diverse question templates, requiring nuanced understanding of relative spatial relationships, intricate map features, and complex reasoning.
arXiv Detail & Related papers (2024-08-30T20:57:34Z)
Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping [19.307294875969827]
This paper introduces AI foundation models and their defining characteristics. We evaluate the performance of large AI vision models, especially Meta's Segment Anything Model (SAM). The results show that although promising, SAM still has room for improvement to support AI-augmented terrain mapping.
arXiv Detail & Related papers (2024-01-16T19:10:09Z)
MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation [73.81268591484198]
Embodied agents equipped with GPT have exhibited extraordinary decision-making and generalization abilities across various tasks. We present a novel map-guided GPT-based agent, dubbed MapGPT, which introduces an online linguistic-formed map to encourage global exploration. Benefiting from this design, we propose an adaptive planning mechanism to assist the agent in performing multi-step path planning based on a map.
arXiv Detail & Related papers (2024-01-14T15:34:48Z)
Distortions in Judged Spatial Relations in Large Language Models [45.875801135769585]
GPT-4 exhibited superior performance with 55 percent accuracy, followed by GPT-3.5 at 47 percent, and Llama-2 at 45 percent. The models identified the nearest cardinal direction in most cases, reflecting their associative learning mechanism.
arXiv Detail & Related papers (2024-01-08T20:08:04Z)
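The summary above notes that models tended to answer with the nearest cardinal direction. To make "nearest cardinal direction" concrete, a minimal sketch computes the standard initial great-circle bearing between two latitude/longitude points (assuming a spherical Earth) and snaps it to the closest of N, E, S, W. The function names and the four-point compass are assumptions for illustration; this is not the evaluation procedure used in the paper.

```python
import math

# Four cardinal directions, ordered clockwise from north.
CARDINALS = ["N", "E", "S", "W"]


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    # atan2 returns (-180, 180]; the modulo maps it onto a compass bearing.
    return math.degrees(math.atan2(x, y)) % 360.0


def nearest_cardinal(bearing_deg):
    """Snap a bearing to the closest of N/E/S/W; an exact 45-degree tie goes clockwise."""
    sector = int(((bearing_deg % 360.0) + 45.0) // 90.0) % 4
    return CARDINALS[sector]


# Paris to Berlin: the bearing lies between north and east but closer to east.
b = initial_bearing(48.8566, 2.3522, 52.52, 13.405)
print(nearest_cardinal(b))  # E
```

Snapping to four sectors discards the intercardinal component (here, the northward part of a roughly north-east bearing), which is the kind of coarsening the summary attributes to the models.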
GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models. We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
Assessment of a new GeoAI foundation model for flood inundation mapping [4.312965283062856]
This paper evaluates the performance of the first-of-its-kind geospatial foundation model, IBM-NASA's Prithvi, to support a crucial geospatial analysis task: flood inundation mapping. A benchmark dataset, Sen1Floods11, is used in the experiments, and the models' predictability, generalizability, and transferability are evaluated. Results show the good transferability of the Prithvi model, highlighting its performance advantages in segmenting flooded areas in previously unseen regions.
arXiv Detail & Related papers (2023-09-25T19:50:47Z)
Neural Map Prior for Autonomous Driving [17.198729798817094]
High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments. The traditional method of creating offline HD maps involves labor-intensive manual annotation processes. Recent studies have proposed an alternative approach that generates local maps using online sensor observations. In this study, we propose Neural Map Prior (NMP), a neural representation of global maps.
arXiv Detail & Related papers (2023-04-17T17:58:40Z)
A General Purpose Neural Architecture for Geospatial Systems [142.43454584836812]
We present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias. We envision how such a model may facilitate cooperation between members of the community.
arXiv Detail & Related papers (2022-11-04T09:58:57Z)
OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping [15.419052489797775]
OpenEarthMap is a benchmark dataset for global high-resolution land cover mapping. It consists of 2.2 million segments of 5000 aerial and satellite images covering 97 regions from 44 countries across 6 continents.
arXiv Detail & Related papers (2022-10-19T17:20:16Z)
ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability. We propose a learning-based approach that integrates learning and planning. ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets. We leverage a largely ignored source of information: the behavior of the model on individual instances during training. Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
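Data Maps characterize each training instance by the model's behavior on it across training epochs. A minimal sketch, assuming the probability assigned to the gold label and a per-epoch correct/incorrect flag have been logged for each instance, computes three statistics as we understand them from the Data Maps work: confidence (mean gold-label probability), variability (its standard deviation across epochs), and correctness (fraction of epochs predicted correctly). The `region` helper and its thresholds are hypothetical, chosen only to illustrate the easy-to-learn, hard-to-learn, and ambiguous groupings.

```python
import math


def data_map_coordinates(gold_probs, correct):
    """Per-instance training-dynamics statistics.

    gold_probs: per-epoch probability the model assigned to the gold label.
    correct:    per-epoch booleans, True if the prediction matched the gold label.
    Returns (confidence, variability, correctness).
    """
    epochs = len(gold_probs)
    confidence = sum(gold_probs) / epochs
    # Population standard deviation of the gold-label probability across epochs.
    variability = math.sqrt(sum((p - confidence) ** 2 for p in gold_probs) / epochs)
    correctness = sum(correct) / epochs
    return confidence, variability, correctness


def region(confidence, variability, var_threshold=0.2, conf_threshold=0.5):
    """Hypothetical thresholds: high variability is 'ambiguous', else split on confidence."""
    if variability >= var_threshold:
        return "ambiguous"
    return "easy-to-learn" if confidence >= conf_threshold else "hard-to-learn"


easy = data_map_coordinates([0.9, 0.95, 1.0, 0.95], [True, True, True, True])
flip = data_map_coordinates([0.1, 0.9, 0.1, 0.9], [False, True, False, True])
hard = data_map_coordinates([0.05, 0.1, 0.05, 0.1], [False, False, False, False])
print(region(*easy[:2]), region(*flip[:2]), region(*hard[:2]))
```

Plotting confidence against variability for every instance yields the "map"; the thresholds above simply stand in for reading regions off that plot.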

This list is automatically generated from the titles and abstracts of the papers on this site.