MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars
- URL: http://arxiv.org/abs/2602.13961v1
- Date: Sun, 15 Feb 2026 02:41:56 GMT
- Title: MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars
- Authors: Shuoyuan Wang, Yiran Wang, Hongxin Wei
- Abstract summary: We introduce MarsRetrieval, a retrieval benchmark for evaluating vision-language models for Martian geospatial discovery. We propose a unified retrieval-centric protocol to benchmark multimodal embedding architectures. Our evaluation shows that MarsRetrieval is challenging: even strong foundation models often fail to capture domain-specific geomorphic distinctions.
- Score: 21.01507072531742
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Data-driven approaches like deep learning are rapidly advancing planetary science, particularly in Mars exploration. Despite recent progress, most existing benchmarks remain confined to closed-set supervised visual tasks and do not support text-guided retrieval for geospatial discovery. We introduce MarsRetrieval, a retrieval benchmark for evaluating vision-language models for Martian geospatial discovery. MarsRetrieval includes three tasks: (1) paired image-text retrieval, (2) landform retrieval, and (3) global geo-localization, covering multiple spatial scales and diverse geomorphic origins. We propose a unified retrieval-centric protocol to benchmark multimodal embedding architectures, including contrastive dual-tower encoders and generative vision-language models. Our evaluation shows MarsRetrieval is challenging: even strong foundation models often fail to capture domain-specific geomorphic distinctions. We further show that domain-specific fine-tuning is critical for generalizable geospatial discovery in planetary settings. Our code is available at https://github.com/ml-stat-Sustech/MarsRetrieval
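As a rough illustration of the retrieval-centric protocol described above, the sketch below scores text-to-image retrieval with an off-the-shelf contrastive dual-tower encoder and reports Recall@K. The checkpoint, file names, and captions are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a retrieval-centric evaluation with a contrastive
# dual-tower (CLIP-style) encoder. Checkpoint, gallery, and captions are
# hypothetical placeholders, not the benchmark's actual data.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical gallery of Mars orbital image crops and their paired descriptions.
image_paths = ["crater_field.png", "dune_field.png", "outflow_channel.png"]
captions = [
    "a cluster of impact craters with degraded rims",
    "dark barchan dunes on a crater floor",
    "an outflow channel with streamlined islands",
]

images = [Image.open(p).convert("RGB") for p in image_paths]
inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)
    img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)

# Text-to-image retrieval: rank gallery images for each caption by cosine similarity.
sim = txt_emb @ img_emb.T                      # (num_texts, num_images)
ranks = sim.argsort(dim=-1, descending=True)   # best match first

def recall_at_k(ranks: torch.Tensor, k: int = 1) -> float:
    """Fraction of queries whose paired image appears in the top-k results."""
    targets = torch.arange(ranks.size(0)).unsqueeze(1)
    return (ranks[:, :k] == targets).any(dim=1).float().mean().item()

print("Recall@1:", recall_at_k(ranks, k=1))
```

In the same protocol, a generative vision-language model could replace only the embedding step; the similarity ranking and Recall@K computation stay unchanged.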
Related papers
- Natural Language-Driven Global Mapping of Martian Landforms [25.54158424879149]
MarScope is a vision-language framework enabling natural language-driven, label-free mapping of Martian landforms. It aligns planetary images and text in a shared semantic space, trained on over 200,000 curated image-text pairs. This framework transforms global geomorphic mapping on Mars by replacing pre-defined classifications with flexible semantic retrieval.
arXiv Detail & Related papers (2026-01-22T13:38:13Z)
- Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks [7.399515278460871]
A key enabler of progress in other domains has been the availability of standardized benchmarks that support systematic evaluation. We introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks. We provide standardized, ready-to-use datasets and baseline evaluations using models pre-trained on natural images, Earth satellite data, and state-of-the-art vision-language models.
arXiv Detail & Related papers (2025-10-28T02:34:08Z)
- Inpainting the Red Planet: Diffusion Models for the Reconstruction of Martian Environments in Virtual Reality [0.0]
Training was conducted on an augmented dataset of 12,000 Martian heightmaps derived from NASA's HiRISE survey. A non-homogeneous rescaling strategy captures terrain features across multiple scales before resizing to a fixed 128x128 model resolution. Results show that the approach consistently outperforms baseline methods in terms of reconstruction accuracy (4-15% on RMSE) and perceptual similarity to the original data (29-81% on LPIPS); a minimal sketch of these two metrics appears after this list.
arXiv Detail & Related papers (2025-10-16T15:02:05Z)
- Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales [61.03549470159347]
Vision-language models (VLMs) have advanced rapidly, yet their capacity for image-grounded geolocation in open-world conditions has not been comprehensively evaluated. We present EarthWhere, a comprehensive benchmark for VLM image geolocation that evaluates visual recognition, step-by-step reasoning, and evidence use.
arXiv Detail & Related papers (2025-10-13T01:12:21Z)
- Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions [116.56517155163716]
We propose a data curation pipeline that reconstructs 3D Martian environments from real stereo navigation images. A Martian terrain video generator, MarsGen, synthesizes novel videos that are visually realistic and geometrically consistent with the 3D structure encoded in the data. Our approach outperforms video synthesis models trained on terrestrial datasets, achieving superior visual fidelity and 3D structural consistency.
arXiv Detail & Related papers (2025-07-10T17:54:27Z)
- TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
- EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
- Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
- TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning [36.725822223732635]
We propose TorchSpatial, a learning framework and benchmark for location (point) encoding. TorchSpatial contains three key components: 1) a unified location encoding framework that consolidates 15 commonly recognized location encoders; 2) the LocBench benchmark, encompassing 7 geo-aware image classification and 10 geo-aware image regression datasets; and 3) a comprehensive suite of evaluation metrics that quantify a geo-aware model's overall performance as well as its geographic bias, including a novel Geo-Bias Score metric. A minimal location-encoder sketch appears after this list.
arXiv Detail & Related papers (2024-06-21T21:33:16Z)
- ConeQuest: A Benchmark for Cone Segmentation on Mars [9.036303895516745]
ConeQuest is the first expert-annotated public dataset to identify cones on Mars.
We propose two benchmark tasks using ConeQuest: (i) Spatial Generalization and (ii) Cone-size Generalization.
arXiv Detail & Related papers (2023-11-15T02:33:08Z)
- Towards Robust Monocular Visual Odometry for Flying Robots on Planetary Missions [49.79068659889639]
Ingenuity, which recently landed on Mars, will mark the beginning of a new era of exploration unhindered by traversability constraints.
We present an advanced robust monocular odometry algorithm that uses efficient optical flow tracking.
We also present a novel approach to estimate the current risk of scale drift, based on a principal component analysis of the relative translation information matrix; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-09-12T12:52:20Z)
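For the heightmap inpainting entry above, the following sketch shows how the two reported reconstruction metrics, RMSE and LPIPS, could be computed for a single tile. The file names, the normalization to [-1, 1], and the use of the `lpips` package with an AlexNet backbone are assumptions made for illustration, not details from that paper.

```python
# Minimal sketch of RMSE and LPIPS between an original and an inpainted
# Martian heightmap tile. File names and preprocessing are hypothetical.
import numpy as np
import torch
import lpips

def to_lpips_tensor(hmap: np.ndarray) -> torch.Tensor:
    """Normalize a single-channel heightmap to [-1, 1] and repeat to 3 channels."""
    lo, hi = float(hmap.min()), float(hmap.max())
    norm = 2.0 * (hmap - lo) / max(hi - lo, 1e-8) - 1.0
    t = torch.from_numpy(norm).float()[None, None]   # (1, 1, H, W)
    return t.repeat(1, 3, 1, 1)                      # (1, 3, H, W)

original = np.load("heightmap_original.npy")         # hypothetical 128x128 tile
reconstructed = np.load("heightmap_inpainted.npy")   # hypothetical model output

rmse = float(np.sqrt(np.mean((original - reconstructed) ** 2)))

loss_fn = lpips.LPIPS(net="alex")
with torch.no_grad():
    perceptual = loss_fn(to_lpips_tensor(original),
                         to_lpips_tensor(reconstructed)).item()

print(f"RMSE: {rmse:.4f}  LPIPS: {perceptual:.4f}")
```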
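For the TorchSpatial entry above, the sketch below implements one simple kind of location (point) encoder: multi-scale sinusoidal features of longitude and latitude. This particular encoder and its frequency schedule are generic illustrations, not TorchSpatial's API.

```python
# Minimal sketch of a multi-scale sinusoidal location encoder for (lon, lat).
import numpy as np

def sinusoidal_location_encoding(lon_deg: float, lat_deg: float,
                                 num_scales: int = 4,
                                 min_wavelength: float = 1.0,
                                 max_wavelength: float = 360.0) -> np.ndarray:
    """Encode a point as sin/cos features of lon/lat at geometrically spaced scales."""
    coords = np.array([lon_deg, lat_deg], dtype=np.float64)
    # Geometric progression of wavelengths between the min and max scales.
    scales = min_wavelength * (max_wavelength / min_wavelength) ** (
        np.arange(num_scales) / max(num_scales - 1, 1)
    )
    feats = []
    for wl in scales:
        angle = 2.0 * np.pi * coords / wl
        feats.extend([np.sin(angle), np.cos(angle)])
    return np.concatenate(feats)   # shape: (2 coords * 2 trig * num_scales,)

# Example: encode the approximate center of Jezero crater on Mars.
vec = sinusoidal_location_encoding(lon_deg=77.58, lat_deg=18.38)
print(vec.shape)   # (16,)
```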
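For the monocular visual odometry entry above, the following sketch illustrates the scale-drift-risk idea: inspect the eigen-spectrum (a principal component analysis) of the 3x3 information matrix of the estimated relative translation and flag poorly constrained directions. The concrete risk score below is an illustrative assumption, not that paper's exact formulation.

```python
# Minimal sketch: eigen-analysis of a relative-translation information matrix
# as a proxy for scale-drift risk. The scoring rule is a hypothetical example.
import numpy as np

def scale_drift_risk(info_matrix: np.ndarray) -> float:
    """Return a value in (0, 1]; higher means the translation (and hence scale)
    is poorly constrained along its weakest direction."""
    # Eigenvalues of the information matrix measure how well each direction
    # of the relative translation is constrained by the feature tracks.
    eigvals = np.linalg.eigvalsh(info_matrix)          # ascending order
    weakest, strongest = eigvals[0], eigvals[-1]
    # Anisotropy of the constraint: near 1 when the weakest direction is
    # barely observed relative to the strongest one.
    return float(1.0 - max(weakest, 0.0) / max(strongest, 1e-12))

# Hypothetical information matrix accumulated from feature-track residuals.
Lambda = np.array([[420.0,  15.0,   5.0],
                   [ 15.0, 380.0,  12.0],
                   [  5.0,  12.0,   2.5]])   # weak constraint along one axis
print(f"scale drift risk: {scale_drift_risk(Lambda):.3f}")
```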
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.