Related papers: OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

URL: http://arxiv.org/abs/2511.13655v1
Date: Mon, 17 Nov 2025 18:06:26 GMT
Title: OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation
Authors: Henry Herzog, Favyen Bastani, Yawen Zhang, Gabriel Tseng, Joseph Redmon, Hadrien Sablon, Ryan Park, Jacob Morrison, Alexandra Buraczynski, Karen Farley, Joshua Hansen, Andrew Howe, Patrick Alan Johnson, Mark Otterlee, Ted Schmitt, Hunter Pitelka, Stephen Daspit, Rachel Ratner, Christopher Wilhelm, Sebastian Wood, Mike Jacobi, Hannah Kerner, Evan Shelhamer, Ali Farhadi, Ranjay Krishna, Patrick Beukema
Abstract summary: OlmoEarth is a multimodal, spatio-temporal foundation model designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models.
Score: 68.10925029626709
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal foundation model that employs a novel self-supervised learning formulation, masking strategy, and loss all designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models across a variety of research benchmarks and real-world tasks from external partners. When evaluating embeddings OlmoEarth achieves the best performance on 15 out of 24 tasks, and with full fine-tuning it is the best on 19 of 29 tasks. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models. The OlmoEarth Platform puts frontier foundation models and powerful data management tools into the hands of non-profits and NGOs working to solve the world's biggest problems. OlmoEarth source code, training data, and pre-trained weights are available at https://github.com/allenai/olmoearth_pretrain.

Related papers

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data [72.98496934729245]
Existing benchmarks for multimodal learning in Earth science offer limited, siloed coverage of Earth's spheres and their cross-sphere interactions. We introduce OmniEarth-Bench, the first multimodal benchmark that systematically spans all six spheres. Built with a scalable, modular-topology data inference framework and native multi-observation sources, OmniEarth-Bench produces 29,855 standardized, expert-curated annotations.
arXiv Detail & Related papers (2025-05-29T15:02:27Z)
Towards LLM Agents for Earth Observation [63.163707376462405]
We introduce a benchmark of 140 yes/no questions from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using the Google Earth Engine API as a tool, LLM agents achieve an accuracy of only 33% because the code fails to run over 58% of the time. We improve the failure rate for open models by fine-tuning on synthetic data, allowing much smaller models to achieve accuracy comparable to much larger ones.
arXiv Detail & Related papers (2025-04-16T14:19:25Z)
TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data [3.674991996196602]
TerraMesh is a new globally diverse, multimodal dataset combining optical, synthetic aperture radar, elevation, and land-cover modalities in an Analysis-Ready Data format. We provide detailed data processing steps, comprehensive statistics, and empirical evidence demonstrating improved model performance when pre-trained on TerraMesh.
arXiv Detail & Related papers (2025-04-15T13:20:35Z)
TerraMind: Large-Scale Generative Multimodality for Earth Observation [9.1127434195526]
We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation. Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data.
arXiv Detail & Related papers (2025-04-15T13:17:39Z)
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control [122.65089441381741]
We present GEM, a Generalizable Ego-vision Multimodal world model. It predicts future frames using a reference frame, sparse features, human poses, and ego-trajectories. Our dataset is comprised of 4000+ hours of multimodal data across domains like autonomous driving, egocentric human activities, and drone flights.
arXiv Detail & Related papers (2024-12-15T14:21:19Z)
Ben-ge: Extending BigEarthNet with Geographical and Environmental Data [1.1377027568901037]
We present the ben-ge dataset, which supplements the BigEarthNet-MM dataset by compiling freely and globally available geographical and environmental data. Based on this dataset, we showcase the value of combining different data modalities for the downstream tasks of patch-based land-use/land-cover classification and land-use/land-cover segmentation.
arXiv Detail & Related papers (2023-07-04T14:17:54Z)
SSL4EO-L: Datasets and Foundation Models for Landsat Imagery [8.34029977985994]
The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis. This paper introduces SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites.
arXiv Detail & Related papers (2023-06-15T18:11:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.