Related papers: SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

URL: http://arxiv.org/abs/2306.09424v2
Date: Sun, 22 Oct 2023 13:59:26 GMT
Title: SSL4EO-L: Datasets and Foundation Models for Landsat Imagery
Authors: Adam J. Stewart, Nils Lehmann, Isaac A. Corley, Yi Wang, Yi-Chia Chang, Nassim Ait Ali Braham, Shradha Sehgal, Caleb Robinson, Arindam Banerjee
Abstract summary: The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis. This paper introduces SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites.
Score: 8.34029977985994
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites. The multispectral imagery captured by sensors onboard these satellites is critical for a wide range of scientific fields. Despite the increasing popularity of deep learning and remote sensing, the majority of researchers still use decision trees and random forests for Landsat image analysis due to the prevalence of small labeled datasets and lack of foundation models. In this paper, we introduce SSL4EO-L, the first ever dataset designed for Self-Supervised Learning for Earth Observation for the Landsat family of satellites (including 3 sensors and 2 product levels) and the largest Landsat dataset in history (5M image patches). Additionally, we modernize and re-release the L7 Irish and L8 Biome cloud detection datasets, and introduce the first ML benchmark datasets for Landsats 4-5 TM and Landsat 7 ETM+ SR. Finally, we pre-train the first foundation models for Landsat imagery using SSL4EO-L and evaluate their performance on multiple semantic segmentation tasks. All datasets and model weights are available via the TorchGeo (https://github.com/microsoft/torchgeo) library, making reproducibility and experimentation easy, and enabling scientific advancements in the burgeoning field of remote sensing for a multitude of downstream applications.

Related papers

Using Multiple Input Modalities Can Improve Data-Efficiency and O.O.D. Generalization for ML with Satellite Imagery [3.3964392722361785]
A large majority of machine learning models trained on satellite imagery (SatML) are designed primarily for optical input modalities such as multi-spectral satellite imagery.<n>We generate augmented versions of SatML benchmark tasks by appending additional geographic data layers to datasets spanning classification, regression, and segmentation.<n>We find that fusing additional geographic inputs with optical imagery can significantly improve SatML model performance.
arXiv Detail & Related papers (2025-07-15T22:57:29Z)
Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models [0.6144680854063939]
We introduce Landsat-Bench, a suite of three benchmarks with Landsat imagery that adapt from existing remote sensing datasets.<n>We establish baseline and standardized evaluation methods across both common architectures and Landsat foundation models pretrained on the SSL4EO-L dataset.
arXiv Detail & Related papers (2025-06-10T13:24:55Z)
Towards LLM Agents for Earth Observation [49.92444022073444]
We introduce datasetnamenospace, a benchmark of 140 yes/no questions from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using Google Earth Engine API as a tool, LLM agents can only achieve an accuracy of 33% because the code fails to run over 58% of the time. We improve the failure rate for open models by fine-tuning synthetic data, allowing much smaller models to achieve comparable accuracy to much larger ones.
arXiv Detail & Related papers (2025-04-16T14:19:25Z)
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community [50.16478515591924]
We propose and train the novel LAE-DINO Model, the first open-vocabulary foundation object detector for the LAE task. We conduct experiments on established remote sensing benchmark DIOR, DOTAv2.0, as well as our newly introduced 80-class LAE-80C benchmark. Results demonstrate the advantages of the LAE-1M dataset and the effectiveness of the LAE-DINO method.
arXiv Detail & Related papers (2024-08-17T06:24:43Z)
Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives. MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes. This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
DiffusionSat: A Generative Foundation Model for Satellite Imagery [63.2807119794691]
We present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets. Our method produces realistic samples and can be used to solve multiple generative tasks including temporal generation, superresolution given multi-spectral inputs and in-painting.
arXiv Detail & Related papers (2023-12-06T16:53:17Z)
A Self-Supervised Approach to Land Cover Segmentation [1.0878040851638]
Land use/land cover change (LULC) maps are integral resources in earth science and agricultural research. Due to the nature of such maps, the creation of LULC maps is often constrained by the time and human resources necessary to accurately annotate satellite imagery and remote sensing data. Here, we demonstrate a self-supervised method of land cover segmentation that has no need for high-quality ground truth labels.
arXiv Detail & Related papers (2023-10-27T16:37:36Z)
GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models. We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
An Open Hyperspectral Dataset with Sea-Land-Cloud Ground-Truth from the HYPSO-1 Satellite [0.0]
The HYPSO-1 Sea-Land-Cloud-Labeled dataset is an open dataset with 200 diverse hyperspectral images from the HYPSO-1 mission. 38 of these images from different countries include ground-truth labels at pixel-level totaling about 25 million spectral signatures labeled for sea/land/cloud categories.
arXiv Detail & Related papers (2023-08-25T21:35:22Z)
SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation [20.94411133447731]
Self-supervised pre-training bears potential to generate expressive representations without human annotation. We share an unlabeled RS dataset SSL4EO-S12 to assemble a global, multimodal, and multi-seasonal corpus of satellite imagery.
arXiv Detail & Related papers (2022-11-13T23:38:27Z)
EarthNets: Empowering AI in Earth Observation [24.160463837610074]
Earth observation (EO) aims at monitoring the state of planet Earth using remote sensing data. This paper presents a comprehensive review of more than 500 publicly published datasets. We propose to measure, rank, and select datasets to build a new benchmark for model evaluation.
arXiv Detail & Related papers (2022-10-10T18:09:35Z)
Satellite Image Time Series Analysis for Big Earth Observation Data [50.591267188664666]
This paper describes sits, an open-source R package for satellite image time series analysis using machine learning. We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome.
arXiv Detail & Related papers (2022-04-24T15:23:25Z)
Embedding Earth: Self-supervised contrastive pre-training for dense land cover classification [61.44538721707377]
We present Embedding Earth a self-supervised contrastive pre-training method for leveraging the large availability of satellite imagery. We observe significant improvements up to 25% absolute mIoU when pre-trained with our proposed method. We find that learnt features can generalize between disparate regions opening up the possibility of using the proposed pre-training scheme.
arXiv Detail & Related papers (2022-03-11T16:14:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.