Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation
- URL: http://arxiv.org/abs/2504.06962v2
- Date: Mon, 28 Apr 2025 15:32:50 GMT
- Title: Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation
- Authors: Thomas Kerdreux, Alexandre Tuel, Quentin Febvre, Alexis Mouche, Bertrand Chapron
- Abstract summary: Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO), but the role of dataset curation in balancing and diversifying pre-training data remains underexplored. In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery. We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
- Score: 67.23953699167274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO), demonstrating strong transferability across diverse remote sensing tasks. While prior work has focused on network architectures and training strategies, the role of dataset curation, especially in balancing and diversifying pre-training datasets, remains underexplored. In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery, which can lead to biased representations and inefficient training. In this work, we propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance. Our method iteratively refines the training set without requiring a pre-existing feature extractor, making it well-suited for domains where curated datasets are limited or unavailable. We demonstrate our approach on the Sentinel-1 Wave Mode (WV) Synthetic Aperture Radar (SAR) archive, a challenging dataset dominated by ocean observations. We train models from scratch on the entire Sentinel-1 WV archive spanning 10 years. Across three downstream tasks, our results show that dynamic pruning improves both computational efficiency and representation quality, leading to stronger transferability. We also release the weights of OceanSAR-1, the first model in the OceanSAR family, a series of foundation models for ocean observation and analysis using SAR imagery, at github.com/galeio-research/OceanSAR-models/.
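The abstract outlines the key idea: iteratively re-score the training pool with the encoder being trained and keep a diverse, balanced subset, so no pre-existing feature extractor is required. The paper's exact pruning criterion is not reproduced here; the sketch below is a minimal illustration under assumed details (k-means as the diversity proxy, near-equal per-cluster sampling; names such as prune_for_diversity, embed_all, and train_one_epoch are hypothetical, not the released code's API).

```python
# Hypothetical sketch of dynamic dataset pruning for SSL pre-training;
# not the paper's implementation. Assumes a k-means diversity proxy.
import numpy as np
from sklearn.cluster import KMeans

def prune_for_diversity(embeddings: np.ndarray, keep: int,
                        n_clusters: int = 64, seed: int = 0) -> np.ndarray:
    """Return indices of a diverse, balanced subset of the pool.

    Clusters the current embeddings and samples (near-)equally from each
    cluster, so over-represented imagery (e.g. open-ocean scenes in the
    Sentinel-1 WV archive) is down-weighted relative to rare content.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters,
                    random_state=seed).fit_predict(embeddings)
    per_cluster = max(1, keep // n_clusters)
    kept = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        take = min(per_cluster, members.size)
        kept.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(kept)

# Pre-training alternates SSL updates with re-pruning, so the in-training
# encoder itself supplies the features used to rebalance the dataset:
#
#   for epoch in range(num_epochs):
#       if epoch % prune_every == 0:
#           emb = embed_all(encoder, archive)            # current features
#           idx = prune_for_diversity(emb, keep=budget)  # refresh subset
#       train_one_epoch(encoder, archive, idx)           # SSL on the subset
```

The alternation matters because the embeddings improve as pre-training progresses, so the selected subset is refined along with the representation, consistent with the abstract's "iteratively refines the training set".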
Related papers
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
- Advancing ALS Applications with Large-Scale Pre-training: Dataset Development and Downstream Assessment [6.606615641354963]
The pre-training and fine-tuning paradigm has revolutionized satellite remote sensing applications.
We construct a large-scale ALS point cloud dataset and evaluate its impact on downstream applications.
Our results show that the pre-trained models significantly outperform their scratch counterparts across all downstream tasks.
arXiv Detail & Related papers (2025-01-09T09:21:09Z)
- Scale-Translation Equivariant Network for Oceanic Internal Solitary Wave Localization [7.444865250744234]
Internal solitary waves (ISWs) are gravity waves that are often observed in the ocean interior rather than at the surface.
Cloud cover in optical remote sensing images variably obscures ground information, leading to blurred or missing surface observations.
This paper develops altimeter-based machine learning solutions that automatically locate ISWs.
arXiv Detail & Related papers (2024-06-18T21:09:56Z)
- Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining [19.143105229950976]
Land-cover-land-use products provide free global semantic information, and vision foundation models hold strong knowledge of the natural world.
We show these free additional resources not only help resolve common contrastive learning bottlenecks, but also significantly boost the efficiency and effectiveness of EO pretraining.
We produce multispectral and SAR foundation models that outperform most existing SOTA models on 10 out of 11 downstream tasks.
arXiv Detail & Related papers (2024-05-30T20:19:42Z)
- Foundation Models for Generalist Geospatial Artificial Intelligence [3.7002058945990415]
This paper introduces a first-of-its-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive data.
We have utilized this framework to create Prithvi, a transformer-based foundational model pre-trained on more than 1TB of multispectral satellite imagery.
arXiv Detail & Related papers (2023-10-28T10:19:55Z)
- Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We use 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, simple imitation learning pushes an existing agent to a new best single-run success rate of 80% on the R2R test split (+11% absolute over the previous SoTA).
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID).
However, due to the domain gap between ImageNet and ReID datasets, it usually requires a larger pre-training dataset to boost performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.