MSEarth: A Benchmark for Multimodal Scientific Comprehension of Earth Science
- URL: http://arxiv.org/abs/2505.20740v1
- Date: Tue, 27 May 2025 05:30:35 GMT
- Title: MSEarth: A Benchmark for Multimodal Scientific Comprehension of Earth Science
- Authors: Xiangyu Zhao, Wanghan Xu, Bo Liu, Yuhao Zhou, Fenghua Ling, Ben Fei, Xiaoyu Yue, Lei Bai, Wenlong Zhang, Xiao-Ming Wu
- Abstract summary: We introduce MSEarth, a multimodal scientific benchmark curated from high-quality, open-access scientific publications. MSEarth encompasses the five major spheres of Earth science: atmosphere, cryosphere, hydrosphere, lithosphere, and biosphere. MSEarth supports a variety of tasks, including scientific figure captioning, multiple-choice questions, and open-ended reasoning challenges.
- Score: 39.224598950224454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of multimodal large language models (MLLMs) has unlocked new opportunities to tackle complex scientific challenges. Despite this progress, their application to Earth science problems, especially at the graduate level, remains underexplored. A significant barrier is the absence of benchmarks that capture the depth and contextual complexity of geoscientific reasoning. Current benchmarks often rely on synthetic datasets or simplistic figure-caption pairs, which do not adequately reflect the intricate reasoning and domain-specific insights required for real-world scientific applications. To address these gaps, we introduce MSEarth, a multimodal scientific benchmark curated from high-quality, open-access scientific publications. MSEarth encompasses the five major spheres of Earth science: atmosphere, cryosphere, hydrosphere, lithosphere, and biosphere, featuring over 7K figures with refined captions. These captions are crafted from the original figure captions and enriched with discussions and reasoning from the papers, ensuring the benchmark captures the nuanced reasoning and knowledge-intensive content essential for advanced scientific tasks. MSEarth supports a variety of tasks, including scientific figure captioning, multiple-choice questions, and open-ended reasoning challenges. By bridging the gap in graduate-level benchmarks, MSEarth provides a scalable and high-fidelity resource to enhance the development and evaluation of MLLMs in scientific reasoning. The benchmark is publicly available to foster further research and innovation in this field. Resources related to this benchmark can be found at https://huggingface.co/MSEarth and https://github.com/xiangyu-mm/MSEarth.
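For orientation, below is a minimal sketch of how the benchmark might be loaded with the Hugging Face `datasets` library. The repository id, split name, and field names are assumptions, not confirmed by the paper; consult https://huggingface.co/MSEarth for the actual dataset identifiers.

```python
# Minimal sketch: loading MSEarth from the Hugging Face Hub with the
# `datasets` library. The repository id, split, and field names below are
# assumptions; see https://huggingface.co/MSEarth for the real identifiers.
from datasets import load_dataset

dataset = load_dataset("MSEarth/MSEarth", split="test")  # hypothetical repo id / split

print(dataset.features)               # inspect the actual schema (question, image, ...)
for example in dataset.select(range(3)):
    print(example)                    # a few raw examples
```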
Related papers
- EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models [96.18182289276649]
We present EarthMind, a novel vision-language framework for multi-granular and multi-sensor Earth Observation (EO) data understanding. EarthMind features two core components: (1) Spatial Attention Prompting (SAP), which reallocates attention within the LLM to enhance pixel-level understanding; and (2) Cross-modal Fusion, which aligns heterogeneous modalities into a shared space. To facilitate multi-sensor fusion evaluation, we propose EarthMind-Bench, a comprehensive benchmark with over 2,000 human-annotated multi-sensor image-question pairs.
arXiv Detail & Related papers (2025-06-02T13:36:05Z)
- OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data [42.73179312287478]
We introduce OmniEarth-Bench, the first comprehensive multimodal benchmark spanning all six Earth science spheres. It integrates 29,779 annotations across four tiers: perception, general reasoning, scientific knowledge reasoning, and chain-of-thought reasoning. Experiments reveal that even the most advanced models struggle with our benchmarks, with none reaching 35% accuracy.
arXiv Detail & Related papers (2025-05-29T15:02:27Z)
- EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models [36.72915099998998]
We present a professional benchmark for the Earth sciences, designed to evaluate the capabilities of Large Language Models (LLMs) in scientific exploration. Leveraging a corpus of 100,000 research papers, we first construct two Question Answering (QA) datasets: Earth-Iron and Earth-Silver. These datasets encompass five Earth spheres, 114 disciplines, and 11 task categories, assessing knowledge crucial for scientific exploration.
arXiv Detail & Related papers (2025-05-22T06:46:08Z)
- OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
Multimodal large language models (MLLMs) have opened new frontiers in artificial intelligence. We propose an MLLM (OmniGeo) tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model improves instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
- Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning [51.11965014462375]
Multimodal Large Language Models (MLLMs) integrate text, images, and other modalities. This paper argues that MLLMs can significantly advance scientific reasoning across disciplines such as mathematics, physics, chemistry, and biology.
arXiv Detail & Related papers (2025-02-05T04:05:27Z)
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
- SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models [35.98892300665275]
We introduce the SciKnowEval benchmark, a framework that evaluates large language models (LLMs) across five progressive levels of scientific knowledge.
These levels aim to assess the breadth and depth of scientific knowledge in LLMs, including memory, comprehension, reasoning, discernment, and application.
We benchmark 26 advanced open-source and proprietary LLMs using zero-shot and few-shot prompting strategies (a minimal prompt-construction sketch follows this list).
arXiv Detail & Related papers (2024-06-13T13:27:52Z)
- Scientific Large Language Models: A Survey on Biological & Chemical Domains [47.97810890521825]
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension.
The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines.
As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration.
arXiv Detail & Related papers (2024-01-26T05:33:34Z)
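The multiple-choice task described in the MSEarth abstract and the zero-/few-shot prompting strategies mentioned in the SciKnowEval entry above follow a common evaluation pattern: each item is rendered as a single prompt and the model's letter choice is compared with the gold answer. The sketch below is illustrative only; the item wording and option fields are assumptions, not either benchmark's actual schema.

```python
# Minimal, hypothetical sketch of zero-shot multiple-choice prompting for an
# Earth science benchmark item. Field names and wording are assumptions.

def build_zero_shot_prompt(question: str, options: list[str]) -> str:
    """Format one multiple-choice item as a single zero-shot prompt."""
    letters = "ABCDEFGH"
    lines = [
        "Answer the following Earth science question.",
        "Reply with the letter of the single best option.",
        "",
        f"Question: {question}",
    ]
    for letter, option in zip(letters, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative item only; not taken from MSEarth or SciKnowEval.
    prompt = build_zero_shot_prompt(
        "Which sphere of the Earth system does a mountain glacier belong to?",
        ["Atmosphere", "Cryosphere", "Lithosphere", "Biosphere"],
    )
    print(prompt)
```

A few-shot variant would simply prepend a handful of solved items in the same format before the target question.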
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.