Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications
- URL: http://arxiv.org/abs/2412.02732v2
- Date: Mon, 03 Feb 2025 15:21:58 GMT
- Title: Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications
- Authors: Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, Joao Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, Srija Chakraborty, Sizhe Wang, Carlos Gomes, Ankur Kumar, Myscon Truong, Denys Godwin, Hyunho Lee, Chia-Yu Hsu, Ata Akbari Asanjan, Besart Mujeci, Disha Shidham, Trevor Keenan, Paulo Arevalo, Wenwen Li, Hamed Alemohammad, Pontus Olofsson, Christopher Hain, Robert Kennedy, Bianca Zadrozny, David Bell, Gabriele Cavallaro, Campbell Watson, Manil Maskey, Rahul Ramachandran, Juan Bernabe Moreno
- Abstract summary: Prithvi-EO-2.0 is a new geospatial foundation model that offers significant improvements over its predecessor. It is trained on 4.2M global time series samples from NASA's Harmonized Landsat and Sentinel-2 data archive at 30m resolution. The 600M version outperforms the previous Prithvi-EO model by 8% across a range of tasks.
- Score: 5.1875922375491585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This technical report presents Prithvi-EO-2.0, a new geospatial foundation model that offers significant improvements over its predecessor, Prithvi-EO-1.0. Trained on 4.2M global time series samples from NASA's Harmonized Landsat and Sentinel-2 data archive at 30m resolution, the new 300M and 600M parameter models incorporate temporal and location embeddings for enhanced performance across various geospatial tasks. Through extensive benchmarking with GEO-Bench, the 600M version outperforms the previous Prithvi-EO model by 8% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks from different domains and resolutions (i.e., from 0.1m to 15m). The results demonstrate the versatility of the model in both classical earth observation and high-resolution applications. Early involvement of end-users and subject matter experts (SMEs) was among the key factors that contributed to the project's success. In particular, SME involvement allowed for constant feedback on model and dataset design, as well as successful customization for diverse SME-led applications in disaster response, land use and crop mapping, and ecosystem dynamics monitoring. Prithvi-EO-2.0 is available on Hugging Face and IBM terratorch, with additional resources on GitHub. The project exemplifies the Trusted Open Science approach embraced by all involved organizations.
Related papers
- Comparative Assessment of Multimodal Earth Observation Data for Soil Moisture Estimation [0.9674544640949528]
We present a high-resolution (10m) soil moisture estimation framework for vegetated areas across Europe. We compare modality combinations with temporal parameterizations, using spatial cross-validation, to ensure geographic generalization. We also evaluate whether foundation model embeddings from IBM-NASA's Prithvi model improve upon traditional hand-crafted spectral features.
arXiv Detail & Related papers (2026-02-20T09:17:12Z)
- GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI [52.13138825802668]
GeoFMs are transforming Earth Observation, but evaluation lacks standardized protocols. GEO-Bench-2 addresses this with a comprehensive framework spanning classification, segmentation, regression, object detection, and instance segmentation. Code, data, and leaderboard for GEO-Bench-2 are publicly released under a permissive license.
arXiv Detail & Related papers (2025-11-19T17:45:02Z)
- OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation [68.10925029626709]
OlmoEarth is a multimodal, sequential-temporal foundation model designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models.
arXiv Detail & Related papers (2025-11-17T18:06:26Z)
- TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis [0.2479153065703935]
We present TESSERA, an open, global, land-oriented remote sensing foundation model. We use two parallel Transformer-based encoders to combine optical data from ten Sentinel-2 spectral bands at 10-60m spatial resolution and two Sentinel-1 synthetic aperture radar backscatter coefficients at 10m resolution. The resulting embeddings are then fused with a multilayer perceptron to create annual global embedding maps.
arXiv Detail & Related papers (2025-06-25T12:46:26Z)
- Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition [0.0]
We investigate whether foundation models pretrained on remote sensing and general vision datasets can be effectively combined to improve performance. The results show that feature-level ensembling of smaller pretrained models can match or exceed the performance of much larger models. The study highlights the potential of applying knowledge distillation to transfer the strengths of ensembles into more compact models.
arXiv Detail & Related papers (2025-06-25T07:02:42Z)
- WorldPM: Scaling Human Preference Modeling [130.23230492612214]
We propose World Preference Modeling (WorldPM) to emphasize this scaling potential. We collect preference data from public forums covering diverse user communities. We conduct extensive training using 15M-scale data across models ranging from 1.5B to 72B parameters.
arXiv Detail & Related papers (2025-05-15T17:38:37Z)
- Towards a Unified Copernicus Foundation Model for Earth Vision [39.500074980218926]
We take a step towards next-generation Earth observation foundation models with three key components.
Copernicus-Pretrain is a massive-scale pretraining dataset that integrates 18.7M aligned images from all major Copernicus Sentinel missions.
Copernicus-FM is a unified foundation model capable of processing any spectral or non-spectral sensor modality.
arXiv Detail & Related papers (2025-03-14T20:16:48Z)
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
- MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models [7.422346909538787]
We introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning.
MapEval consists of 700 unique multiple-choice questions about locations across 180 cities and 54 countries.
Our detailed analyses provide insights into the strengths and weaknesses of current models, though all models still fall short of human performance by more than 20% on average.
This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.
arXiv Detail & Related papers (2024-12-31T07:20:32Z)
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks. Our benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale. We evaluate several state-of-the-art VLMs to assess their accuracy within the geospatial context.
arXiv Detail & Related papers (2024-11-28T18:59:56Z)
- SpectralEarth: Training Hyperspectral Foundation Models at Scale [47.93167977587301]
We introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models.
We pretrain a series of foundation models on SpectralEarth using state-of-the-art self-supervised learning (SSL) algorithms.
We construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation.
arXiv Detail & Related papers (2024-08-15T22:55:59Z)
- ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability [10.88886669820126]
We introduce the Oak Ridge Base Foundation Model for Earth System Predictability (ORBIT).
ORBIT is the largest model of its kind and surpasses the current climate AI foundation model size by a thousandfold.
Performance scaling tests on the Frontier supercomputer have demonstrated that ORBIT achieves 684 petaFLOPS to 1.6 exaFLOPS sustained throughput.
arXiv Detail & Related papers (2024-04-23T03:39:57Z)
- PhilEO Bench: Evaluating Geo-Spatial Foundation Models [30.02962498304698]
This paper introduces the PhilEO Bench, a novel evaluation framework for EO Foundation Models.
The framework comprises a testbed and a novel 400 GB Sentinel-2 dataset.
We present experiments using our framework evaluating different Foundation Models, including Prithvi and SatMAE.
arXiv Detail & Related papers (2024-01-09T09:58:42Z)
- Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.
Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- Foundation Models for Generalist Geospatial Artificial Intelligence [3.7002058945990415]
This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive data.
We have utilized this framework to create Prithvi, a transformer-based foundational model pre-trained on more than 1TB of multispectral satellite imagery.
arXiv Detail & Related papers (2023-10-28T10:19:55Z)
- SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation [83.18930314027254]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X) with up to ViT-Huge as the backbone.
With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments.
arXiv Detail & Related papers (2023-09-29T17:58:06Z)
- GEO-Bench: Toward Foundation Models for Earth Monitoring [139.77907168809085]
We propose a benchmark comprised of six classification and six segmentation tasks.
This benchmark will be a driver of progress across a variety of Earth monitoring tasks.
arXiv Detail & Related papers (2023-06-06T16:16:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.