Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks
- URL: http://arxiv.org/abs/2510.24010v1
- Date: Tue, 28 Oct 2025 02:34:08 GMT
- Title: Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks
- Authors: Mirali Purohit, Bimal Gajera, Vatsal Malaviya, Irish Mehta, Kunal Kasodekar, Jacob Adler, Steven Lu, Umaa Rebbapragada, Hannah Kerner
- Abstract summary: A key enabler of progress in other domains has been the availability of standardized benchmarks that support systematic evaluation. We introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks. We provide standardized, ready-to-use datasets and baseline evaluations using models pre-trained on natural images, Earth satellite data, and state-of-the-art vision-language models.
- Score: 7.399515278460871
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models have enabled rapid progress across many specialized domains by leveraging large-scale pre-training on unlabeled data, demonstrating strong generalization to a variety of downstream tasks. While such models have gained significant attention in fields like Earth Observation, their application to Mars science remains limited. A key enabler of progress in other domains has been the availability of standardized benchmarks that support systematic evaluation. In contrast, Mars science lacks such benchmarks and standardized evaluation frameworks, a gap that has limited progress toward developing foundation models for Martian tasks. To address this gap, we introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks using both orbital and surface imagery. Mars-Bench comprises 20 datasets spanning classification, segmentation, and object detection, focused on key geologic features such as craters, cones, boulders, and frost. We provide standardized, ready-to-use datasets and baseline evaluations using models pre-trained on natural images, Earth satellite data, and state-of-the-art vision-language models. Results from all analyses suggest that Mars-specific foundation models may offer advantages over general-domain counterparts, motivating further exploration of domain-adapted pre-training. Mars-Bench aims to establish a standardized foundation for developing and comparing machine learning models for Mars science. Our data, models, and code are available at: https://mars-bench.github.io/.
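Segmentation tasks of the kind Mars-Bench includes are conventionally scored with intersection-over-union (IoU); the sketch below is a minimal numpy illustration of that standard metric, not the Mars-Bench evaluation code.

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection-over-union between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:                      # both masks empty: define as perfect
        return 1.0
    inter = np.logical_and(pred, target).sum()
    return float(inter / union)

# Two 4x4 masks with 3 labeled pixels each, overlapping in 2
a = np.zeros((4, 4)); a[0, :3] = 1
b = np.zeros((4, 4)); b[0, 1:4] = 1
print(iou(a, b))  # 0.5 (intersection 2, union 4)
```

Per-class IoU averaged over classes (mIoU) is the usual headline number for benchmarks like this.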
Related papers
- MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars [21.01507072531742]
We introduce MarsRetrieval, a retrieval benchmark for evaluating vision-language models for Martian geospatial discovery. We propose a unified retrieval-centric protocol to benchmark multimodal embedding architectures. Our evaluation shows MarsRetrieval is challenging: even strong foundation models often fail to capture domain-specific geomorphic distinctions.
arXiv Detail & Related papers (2026-02-15T02:41:56Z)
- Inpainting the Red Planet: Diffusion Models for the Reconstruction of Martian Environments in Virtual Reality [0.0]
Training was conducted on an augmented dataset of 12,000 Martian heightmaps derived from NASA's HiRISE survey. A non-homogeneous rescaling strategy captures terrain features across multiple scales before resizing to a fixed 128x128 model resolution. Results show that our approach consistently outperforms these methods in terms of reconstruction accuracy (4-15% on RMSE) and perceptual similarity (29-81% on LPIPS) with the original data.
arXiv Detail & Related papers (2025-10-16T15:02:05Z)
- Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition [0.0]
We investigate whether foundation models pretrained on remote sensing and general vision datasets can be effectively combined to improve performance. The results show that feature-level ensembling of smaller pretrained models can match or exceed the performance of much larger models. The study highlights the potential of applying knowledge distillation to transfer the strengths of ensembles into more compact models.
arXiv Detail & Related papers (2025-06-25T07:02:42Z)
- TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
- Open World Object Detection in the Era of Foundation Models [53.683963161370585]
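The feature-level ensembling this entry describes can be pictured as concatenating embeddings from frozen encoders before training only a lightweight head; the toy numpy sketch below uses random stand-in backbones, not the paper's actual pretrained models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for two frozen pretrained backbones (hypothetical;
# in practice these would be, e.g., a remote-sensing encoder and a
# general-vision encoder with fixed weights).
W_a = rng.standard_normal((16, 64))
W_b = rng.standard_normal((16, 32))

def backbone_a(x):
    return np.tanh(x @ W_a)   # 64-dim features

def backbone_b(x):
    return np.tanh(x @ W_b)   # 32-dim features

x = rng.standard_normal((8, 16))   # batch of 8 inputs
# Feature-level ensemble: concatenate the two embeddings per sample,
# then fit only a small head (e.g. a linear probe) on top.
feats = np.concatenate([backbone_a(x), backbone_b(x)], axis=1)
print(feats.shape)   # (8, 96)
```

The backbones stay frozen, so only the head is trained, which is what makes the ensemble cheap relative to one much larger model.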
We introduce a new benchmark that includes five real-world application-driven datasets.
We introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
arXiv Detail & Related papers (2023-12-10T03:56:06Z)
- ConeQuest: A Benchmark for Cone Segmentation on Mars [9.036303895516745]
ConeQuest is the first expert-annotated public dataset to identify cones on Mars.
We propose two benchmark tasks using ConeQuest: (i) Spatial Generalization and (ii) Cone-size Generalization.
arXiv Detail & Related papers (2023-11-15T02:33:08Z)
- GEO-Bench: Toward Foundation Models for Earth Monitoring [139.77907168809085]
We propose a benchmark comprising six classification and six segmentation tasks.
This benchmark will be a driver of progress across a variety of Earth monitoring tasks.
arXiv Detail & Related papers (2023-06-06T16:16:05Z)
- S$^{5}$Mars: Semi-Supervised Learning for Mars Semantic Segmentation [18.92602724896845]
Mars semantic segmentation is an important Martian vision task that underpins rover autonomous planning and safe driving.
There is a lack of the detailed, high-confidence data annotations that most deep learning methods require to obtain a good model.
We propose our solution from the perspective of joint data and method design.
Experimental results show that our method can outperform state-of-the-art SSL approaches remarkably.
arXiv Detail & Related papers (2022-07-04T05:03:10Z) - Embedding Earth: Self-supervised contrastive pre-training for dense land
cover classification [61.44538721707377]
We present Embedding Earth a self-supervised contrastive pre-training method for leveraging the large availability of satellite imagery.
We observe significant improvements up to 25% absolute mIoU when pre-trained with our proposed method.
We find that learnt features can generalize between disparate regions opening up the possibility of using the proposed pre-training scheme.
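Self-supervised contrastive pre-training of this kind commonly optimizes an InfoNCE-style objective that pulls two views of the same tile together and pushes other tiles apart; the numpy sketch below is an illustrative formulation of that standard loss, not the paper's exact implementation.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE loss over paired embeddings: row i of z1 should match
    row i of z2 (positive pair) and repel every other row (negatives)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                        # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))      # -log p(positive)

z = np.eye(4, 8)                       # 4 orthonormal toy embeddings
aligned = info_nce(z, z)               # positives identical: near-zero loss
shuffled = info_nce(z, np.roll(z, 1, axis=0))  # positives misaligned
print(aligned < shuffled)  # True
```

In practice z1 and z2 would be encoder outputs for two augmented views of each satellite tile, and tau is a tuned temperature.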
arXiv Detail & Related papers (2022-03-11T16:14:14Z)
- SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning [63.192289553021816]
Progress toward the United Nations Sustainable Development Goals has been hindered by a lack of data on key environmental and socioeconomic indicators.
Recent advances in machine learning have made it possible to utilize abundant, frequently updated, and globally available data, such as from satellites or social media.
In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs.
arXiv Detail & Related papers (2021-11-08T18:59:04Z)
- Towards Robust Monocular Visual Odometry for Flying Robots on Planetary Missions [49.79068659889639]
Ingenuity, which just landed on Mars, will mark the beginning of a new era of exploration unhindered by traversability.
We present an advanced robust monocular odometry algorithm that uses efficient optical flow tracking.
We also present a novel approach to estimate the current risk of scale drift based on a principal component analysis of the relative translation information matrix.
arXiv Detail & Related papers (2021-09-12T12:52:20Z)
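The scale-drift idea in the last entry can be read as an eigen-analysis (PCA) of the relative translation information matrix: a direction with a small eigenvalue is weakly constrained, signaling higher drift risk. The numpy sketch below is a hypothetical proxy illustrating that reading, not the paper's actual estimator.

```python
import numpy as np

def scale_drift_risk(info: np.ndarray) -> float:
    """Proxy for scale-drift risk: inverse of the smallest eigenvalue of
    a symmetric positive-definite translation information matrix. A small
    eigenvalue means one translation direction is poorly constrained by
    the optical-flow measurements, hence higher risk. Illustrative only."""
    eigvals = np.linalg.eigvalsh(info)   # ascending order
    return float(1.0 / eigvals[0])

well = np.diag([4.0, 4.0, 4.0])   # translation well constrained everywhere
weak = np.diag([4.0, 4.0, 0.1])   # one direction poorly constrained
print(scale_drift_risk(well) < scale_drift_risk(weak))  # True
```

A planner could threshold such a risk score to decide when to trigger extra scale-recovery measures.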
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.