SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing
- URL: http://arxiv.org/abs/2508.21402v1
- Date: Fri, 29 Aug 2025 08:19:16 GMT
- Title: SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing
- Authors: Jakub Straka, Ivan Gruber
- Abstract summary: Self-supervised learning has emerged as a powerful tool for remote sensing, where large amounts of unlabeled data are available. In this work, we investigate the use of DINO, a contrastive self-supervised method, for pretraining on remote sensing imagery. We introduce SatDINO, a model tailored for representation learning in satellite imagery.
- Score: 0.3437656066916039
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning has emerged as a powerful tool for remote sensing, where large amounts of unlabeled data are available. In this work, we investigate the use of DINO, a contrastive self-supervised method, for pretraining on remote sensing imagery. We introduce SatDINO, a model tailored for representation learning in satellite imagery. Through extensive experiments on multiple datasets in multiple testing setups, we demonstrate that SatDINO outperforms other state-of-the-art methods based on much more common masked autoencoders (MAE) and achieves competitive results in multiple benchmarks. We also provide a rigorous ablation study evaluating SatDINO's individual components. Finally, we propose a few novel enhancements, such as a new way to incorporate ground sample distance (GSD) encoding and adaptive view sampling. These enhancements can be used independently on our SatDINO model. Our code and trained models are available at: https://github.com/strakaj/SatDINO.
Related papers
- Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data. We introduce a novel principled method for DP synthetic image embedding generation. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery [5.671254904219855]
We develop a new encoder architecture called USat that can input multi-spectral data from multiple sensors for self-supervised pre-training.
We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art MAE models trained on remote sensing data.
arXiv Detail & Related papers (2023-12-02T19:17:04Z)
- Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery [5.057850174013128]
This study delves into the emergent characteristics of the Self-Distillation with No Labels (DINO) algorithm and its application to Synthetic Aperture Radar (SAR) imagery.
We pre-train a vision transformer (ViT)-based DINO model using unlabeled SAR data, and later fine-tune the model to predict high-resolution land cover maps.
We show that ViT attention maps hold great intrinsic value for remote sensing, and could provide useful inputs to other algorithms.
arXiv Detail & Related papers (2023-10-05T12:48:12Z)
- Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- SEnSeI: A Deep Learning Module for Creating Sensor Independent Cloud Masks [0.7340845393655052]
We introduce a novel neural network architecture: Spectral ENcoder for SEnsor Independence (SEnSeI).
We focus on the problem of cloud masking, using several pre-existing datasets, and a new, freely available dataset for Sentinel-2.
Our model is shown to achieve state-of-the-art performance on the satellites it was trained on (Sentinel-2 and Landsat 8), and is able to extrapolate to sensors it has not seen during training such as Landsat 7, PerúSat-1, and Sentinel-3 SLSTR.
arXiv Detail & Related papers (2021-11-16T10:47:10Z)
- Representation Learning for Remote Sensing: An Unsupervised Sensor Fusion Approach [0.0]
We propose Contrastive Sensor Fusion, which exploits coterminous data from multiple sources to learn useful representations of every possible combination of those sources.
Using a dataset of 47 million unlabeled coterminous image triplets, we train an encoder to produce meaningful representations from any possible combination of channels from the input sensors.
These representations outperform fully supervised ImageNet weights on a remote sensing classification task and improve as more sensors are fused.
arXiv Detail & Related papers (2021-08-11T08:32:58Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)