SpectralEarth: Training Hyperspectral Foundation Models at Scale
- URL: http://arxiv.org/abs/2408.08447v1
- Date: Thu, 15 Aug 2024 22:55:59 GMT
- Title: SpectralEarth: Training Hyperspectral Foundation Models at Scale
- Authors: Nassim Ait Ali Braham, Conrad M Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, Xiao Xiang Zhu
- Abstract summary: We introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models.
We pretrain a series of foundation models on SpectralEarth using state-of-the-art self-supervised learning (SSL) algorithms.
We construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation.
- Score: 47.93167977587301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models have triggered a paradigm shift in computer vision and are increasingly being adopted in remote sensing, particularly for multispectral imagery. Yet, their potential in hyperspectral imaging (HSI) remains untapped due to the absence of comprehensive and globally representative hyperspectral datasets. To close this gap, we introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models, leveraging data from the Environmental Mapping and Analysis Program (EnMAP). SpectralEarth comprises 538,974 image patches covering 415,153 unique locations from more than 11,636 globally distributed EnMAP scenes spanning two years of archival data. Additionally, 17.5% of these locations include multiple timestamps, enabling multi-temporal HSI analysis. Utilizing state-of-the-art self-supervised learning (SSL) algorithms, we pretrain a series of foundation models on SpectralEarth. We integrate a spectral adapter into classical vision backbones to accommodate the unique characteristics of HSI. In tandem, we construct four downstream datasets for land-cover and crop-type mapping, providing benchmarks for model evaluation. Experimental results support the versatility of our models, showcasing their generalizability across different tasks and sensors. We also highlight computational efficiency during model fine-tuning. The dataset, models, and source code will be made publicly available.
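The abstract mentions a spectral adapter that lets classical vision backbones accommodate HSI, but does not specify its design. A minimal sketch of the general idea, assuming a simple learned linear mixing of hyperspectral bands down to the 3 channels a standard backbone expects (the function name, shapes, and band count here are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def spectral_adapter(patch, weights):
    """Project a hyperspectral patch of shape (bands, H, W) down to the
    channel count a standard vision backbone expects, via a learnable
    linear map over the spectral axis. `weights` has shape (out_ch, bands)."""
    bands, h, w = patch.shape
    flat = patch.reshape(bands, h * w)            # flatten spatial dims
    mixed = weights @ flat                        # mix spectral bands
    return mixed.reshape(weights.shape[0], h, w)  # restore spatial layout

# Toy example: 224 EnMAP-like bands -> 3 backbone channels.
rng = np.random.default_rng(0)
patch = rng.standard_normal((224, 8, 8))
w = rng.standard_normal((3, 224)) / np.sqrt(224)
out = spectral_adapter(patch, w)
print(out.shape)  # (3, 8, 8)
```

In practice such an adapter would be a trainable layer (e.g. a 1x1 convolution over bands) placed in front of the backbone, so the rest of the network can stay unchanged.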
Related papers
- EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks.
The dataset spans 15 terapixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic.
Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
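EarthMAE is described only as a tailored Masked Autoencoder; its specifics are not given in the abstract. The core MAE pretraining step, randomly masking a large fraction of image patches and training the model to reconstruct them, can be sketched as follows (the function and the 75% ratio are illustrative of MAEs in general, not EarthMAE's actual API):

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return (keep_idx, mask_idx): indices of visible patches fed to the
    encoder and of masked patches the decoder must reconstruct."""
    rng = np.random.default_rng(seed)
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])

# A 14x14 grid of patches with a 75% masking ratio typical of MAEs.
keep, masked = random_patch_mask(14 * 14, mask_ratio=0.75, seed=0)
print(len(keep), len(masked))  # 49 147
```

The encoder only ever sees the small visible subset, which is what makes MAE pretraining cheap at this data scale.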
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
- AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities [5.767156832161819]
We propose AnySat, a multimodal model based on joint embedding predictive architecture (JEPA) and resolution-adaptive spatial encoders.
To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of 5 multimodal datasets.
We then train a single powerful model on these diverse datasets simultaneously.
arXiv Detail & Related papers (2024-12-18T18:11:53Z)
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks.
Our benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale.
We evaluate several state-of-the-art VLMs to assess their accuracy within the geospatial context.
arXiv Detail & Related papers (2024-11-28T18:59:56Z)
- Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation [48.66623377464203]
Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging the concept of neural plasticity in brain science.
This dynamic hypernetwork, adjusting to different wavelengths, enables a single versatile Transformer jointly trained on data from five sensors to excel across 12 distinct Earth observation tasks.
arXiv Detail & Related papers (2024-03-22T17:11:47Z)
- SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery [35.550999964460466]
We present SkySense, a generic billion-scale model, pre-trained on a curated multi-modal Remote Sensing dataset with 21.5 million temporal sequences.
To the best of our knowledge, SkySense is the largest multi-modal remote sensing foundation model to date, whose modules can be flexibly combined or used individually to accommodate various tasks.
arXiv Detail & Related papers (2023-12-15T09:57:21Z)
- SpectralGPT: Spectral Remote Sensing Foundation Model [60.023956954916414]
A universal RS foundation model, named SpectralGPT, is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT).
Compared to existing foundation models, SpectralGPT accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data.
Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience.
arXiv Detail & Related papers (2023-11-13T07:09:30Z)
- Deep Autoregressive Models with Spectral Attention [74.08846528440024]
We propose a forecasting architecture that combines deep autoregressive models with a Spectral Attention (SA) module.
By characterizing in the spectral domain the embedding of the time series as occurrences of a random process, our method can identify global trends and seasonality patterns.
Two spectral attention models, one global and one local to the time series, integrate this information into the forecast and perform spectral filtering to remove noise from the time series.
arXiv Detail & Related papers (2021-07-13T11:08:47Z)
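The spectral-filtering idea in the entry above, identifying dominant frequency components of a time series and suppressing the rest, can be illustrated independently of the paper's actual attention architecture with a simple FFT-based filter (a toy sketch; the function and the choice of keeping the 4 strongest components are assumptions for the example):

```python
import numpy as np

def spectral_denoise(x, keep=4):
    """Keep the `keep` largest-magnitude frequency components of a real
    time series, zero the rest, and transform back."""
    spec = np.fft.rfft(x)
    weak = np.argsort(np.abs(spec))[:-keep]  # indices of weaker components
    spec[weak] = 0.0
    return np.fft.irfft(spec, n=len(x))

# A noisy seasonal signal: the filter recovers the underlying sinusoid.
t = np.arange(256)
clean = np.sin(2 * np.pi * t / 32)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(256)
denoised = spectral_denoise(noisy, keep=4)
print(np.abs(denoised - clean).mean() < np.abs(noisy - clean).mean())  # True
```

A learned spectral attention module plays a similar role, except that which components to keep is inferred from data rather than fixed by a hand-chosen threshold.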
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.