USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite
Imagery
- URL: http://arxiv.org/abs/2312.02199v1
- Date: Sat, 2 Dec 2023 19:17:04 GMT
- Title: USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite
Imagery
- Authors: Jeremy Irvin, Lucas Tao, Joanne Zhou, Yuntao Ma, Langston Nashold,
Benjamin Liu, Andrew Y. Ng
- Abstract summary: We develop a new encoder architecture called USat that can input multi-spectral data from multiple sensors for self-supervised pre-training.
We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art MAE models trained on remote sensing data.
- Score: 5.671254904219855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large, self-supervised vision models have led to substantial advancements for
automatically interpreting natural images. Recent works have begun tailoring
these methods to remote sensing data, whose rich multi-sensor, multi-spectral,
and temporal structure provides massive amounts of self-labeled data that can
be used for self-supervised pre-training.
In this work, we develop a new encoder architecture called USat that can input
multi-spectral data from multiple sensors for self-supervised pre-training.
USat is a vision transformer with modified patch projection layers and
positional encodings to model spectral bands with varying spatial scales from
multiple sensors. We integrate USat into a Masked Autoencoder (MAE)
self-supervised pre-training procedure and find that a pre-trained USat
outperforms state-of-the-art self-supervised MAE models trained on remote
sensing data on multiple remote sensing benchmark datasets (by up to 8%) and
leads to improvements in low-data regimes (by up to 7%). Code and pre-trained
weights
are available at https://github.com/stanfordmlgroup/USat .
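The core architectural idea, a separate patch projection layer per spectral band group with the patch size tied to each group's ground sample distance (GSD) so that all groups land on the same token grid, can be sketched as follows. This is a minimal PyTorch illustration, not the released implementation: the band groups, image sizes, and patch sizes are assumptions (see the linked repository for the actual code).

```python
import torch
import torch.nn as nn

class MultiSensorPatchEmbed(nn.Module):
    """Sketch of a USat-style patch embedding: each spectral band group
    gets its own projection, with patch size scaled to its ground sample
    distance (GSD) so every group produces the same token grid."""

    def __init__(self, embed_dim=768):
        super().__init__()
        # Hypothetical band groups. A 10 m RGB group at 96x96 px and a
        # 20 m SWIR group at 48x48 px both yield an 8x8 token grid when
        # the patch sizes are 12 and 6 respectively.
        self.groups = nn.ModuleDict({
            "s2_rgb":  nn.Conv2d(3, embed_dim, kernel_size=12, stride=12),
            "s2_swir": nn.Conv2d(2, embed_dim, kernel_size=6, stride=6),
        })

    def forward(self, inputs):
        tokens = []
        for name, proj in self.groups.items():
            x = proj(inputs[name])                       # (B, D, 8, 8)
            tokens.append(x.flatten(2).transpose(1, 2))  # (B, 64, D)
        return torch.cat(tokens, dim=1)                  # (B, 128, D)

batch = {
    "s2_rgb":  torch.randn(2, 3, 96, 96),
    "s2_swir": torch.randn(2, 2, 48, 48),
}
print(MultiSensorPatchEmbed()(batch).shape)  # torch.Size([2, 128, 768])
```

Because every group lands on the same 8x8 grid, a shared positional encoding can tie tokens from different sensors to the same ground location, which is what lets a single transformer (and an MAE masking scheme) consume all bands jointly.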
Related papers
- Cross-sensor self-supervised training and alignment for remote sensing [2.1178416840822027]
We introduce cross-sensor self-supervised training and alignment for remote sensing (X-STARS).
X-STARS can be applied to train models from scratch, or to adapt large models pretrained on, e.g., low-resolution data to new high-resolution sensors.
We demonstrate that X-STARS outperforms the state-of-the-art by a significant margin with less data across various conditions of data availability and resolutions.
arXiv Detail & Related papers (2024-05-16T09:25:45Z)
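The X-STARS summary does not spell out the alignment objective. One plausible reading is a contrastive loss that pulls together embeddings of co-located scenes seen by a new sensor and by a pretrained encoder; the sketch below assumes an InfoNCE formulation with matching locations as positives, and the paper's actual loss may differ.

```python
import torch
import torch.nn.functional as F

def alignment_loss(z_new_sensor, z_pretrained, temperature=0.07):
    """Hypothetical cross-sensor alignment: InfoNCE between embeddings of
    co-located scenes from a new sensor and from a pretrained encoder.
    Row i of each batch is the same location (positive pair); all other
    rows in the batch serve as negatives."""
    z_a = F.normalize(z_new_sensor, dim=-1)
    z_b = F.normalize(z_pretrained, dim=-1)
    logits = z_a @ z_b.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0))    # diagonal entries are positives
    return F.cross_entropy(logits, targets)

loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```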
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information, which is effectively utilized across multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery [74.82821342249039]
We present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on the Masked Autoencoder (MAE).
To leverage temporal information, we include a temporal embedding and independently mask image patches across time.
arXiv Detail & Related papers (2022-07-17T01:35:29Z)
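The two SatMAE mechanisms named above, a temporal embedding and per-timestep independent masking, can be illustrated roughly as follows. The tensor layout and the learned (rather than sinusoidal) temporal embedding are assumptions for the sketch.

```python
import torch
import torch.nn as nn

def mask_temporal_tokens(tokens, mask_ratio=0.75):
    """Mask patch tokens independently at each timestep, rather than
    reusing one mask across time. `tokens` is (B, T, N, D): batch,
    timesteps, patches per image, embedding dim."""
    B, T, N, D = tokens.shape
    keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, T, N)                   # fresh noise per (B, T)
    ids_keep = noise.argsort(dim=-1)[..., :keep]  # (B, T, keep)
    visible = torch.gather(
        tokens, 2, ids_keep.unsqueeze(-1).expand(-1, -1, -1, D))
    return visible                                # (B, T, keep, D)

B, T, N, D = 2, 3, 64, 768
# Assumed temporal embedding: one learned vector per timestep (zero-init
# here), broadcast over that timestep's patches before masking.
temporal_embed = nn.Parameter(torch.zeros(1, T, 1, D))
tokens = torch.randn(B, T, N, D) + temporal_embed
print(mask_temporal_tokens(tokens).shape)  # torch.Size([2, 3, 16, 768])
```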
- Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection [78.2325219839805]
imTED improves the state-of-the-art of few-shot object detection by up to 7.6% AP.
Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by 2.8%.
arXiv Detail & Related papers (2022-05-19T15:11:20Z)
- SEnSeI: A Deep Learning Module for Creating Sensor Independent Cloud Masks [0.7340845393655052]
We introduce a novel neural network architecture, the Spectral ENcoder for SEnsor Independence (SEnSeI).
We focus on the problem of cloud masking, using several pre-existing datasets, and a new, freely available dataset for Sentinel-2.
Our model is shown to achieve state-of-the-art performance on the satellites it was trained on (Sentinel-2 and Landsat 8), and is able to extrapolate to sensors it has not seen during training, such as Landsat 7, PerúSat-1, and Sentinel-3 SLSTR.
arXiv Detail & Related papers (2021-11-16T10:47:10Z)
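SEnSeI's stated goal, a module that makes the downstream model independent of which sensor produced the bands, suggests pairing each band with a descriptor of its spectral properties and learning to map any described band set to a fixed set of channels. The sketch below is an assumed, simplified rendering of that idea: the two-number wavelength descriptor and the softmax channel mixing are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpectralEncoder(nn.Module):
    """Rough sketch of the SEnSeI idea: each input band is paired with a
    descriptor of its spectral properties (here just min/max wavelength,
    an assumption), so one model can ingest any sensor whose bands can be
    described, emitting a fixed number of sensor-independent channels."""

    def __init__(self, out_channels=8, hidden=32):
        super().__init__()
        # Maps a band descriptor to a weight vector over output channels.
        self.descriptor_mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, out_channels),
        )

    def forward(self, bands, descriptors):
        # bands: (B, C, H, W); descriptors: (C, 2), one row per band.
        weights = self.descriptor_mlp(descriptors).softmax(dim=0)  # (C, out)
        # Mix the C sensor-specific bands into fixed output channels.
        return torch.einsum("bchw,co->bohw", bands, weights)

enc = SpectralEncoder()
s2 = enc(torch.randn(1, 4, 64, 64), torch.rand(4, 2))  # 4-band sensor
l8 = enc(torch.randn(1, 7, 64, 64), torch.rand(7, 2))  # 7-band sensor
print(s2.shape, l8.shape)  # both torch.Size([1, 8, 64, 64])
```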
- Representation Learning for Remote Sensing: An Unsupervised Sensor Fusion Approach [0.0]
We propose Contrastive Sensor Fusion, which exploits coterminous data from multiple sources to learn useful representations of every possible combination of those sources.
Using a dataset of 47 million unlabeled coterminous image triplets, we train an encoder to produce meaningful representations from any possible combination of channels from the input sensors.
These representations outperform fully supervised ImageNet weights on a remote sensing classification task and improve as more sensors are fused.
arXiv Detail & Related papers (2021-08-11T08:32:58Z)
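Training an encoder on "any possible combination of channels," as Contrastive Sensor Fusion describes, implies sampling random channel subsets of the same coterminous stack and treating them as positive pairs. A minimal sketch under that assumption; the toy channel-agnostic encoder and the InfoNCE loss are placeholders.

```python
import torch
import torch.nn.functional as F

def random_channel_subset(stack, max_channels=4):
    """Sample a random combination of channels from a coterminous image
    stack, standing in for 'any possible combination of sensors'."""
    n = int(torch.randint(1, max_channels + 1, ()).item())
    idx = torch.randperm(stack.size(1))[:n]
    return stack[:, idx]

def csf_loss(encoder, stack, temperature=0.1):
    # Two different channel combinations of the same scenes are positives;
    # other scenes in the batch are negatives.
    z1 = F.normalize(encoder(random_channel_subset(stack)), dim=-1)
    z2 = F.normalize(encoder(random_channel_subset(stack)), dim=-1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(stack.size(0)))

# Toy channel-count-agnostic encoder: average over channels, then flatten.
toy_encoder = lambda x: x.mean(dim=1).flatten(1)
loss = csf_loss(toy_encoder, torch.randn(8, 6, 16, 16))
```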
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery, pre-training is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
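A natural way to exploit the audio-image correspondence described above is a symmetric cross-modal contrastive loss over co-located pairs. The following is an assumed sketch, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def audiovisual_loss(image_emb, audio_emb, temperature=0.07):
    """Sketch of cross-modal correspondence learning: embeddings of an
    aerial image and an audio clip recorded at the same location form a
    positive pair; the loss is applied symmetrically in both directions."""
    img = F.normalize(image_emb, dim=-1)
    aud = F.normalize(audio_emb, dim=-1)
    logits = img @ aud.t() / temperature
    targets = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = audiovisual_loss(torch.randn(16, 128), torch.randn(16, 128))
```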
- Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data [64.40187171234838]
Seasonal Contrast (SeCo) is an effective pipeline to leverage unlabeled data for in-domain pre-training of remote sensing representations.
SeCo will be made public to facilitate transfer learning and enable rapid progress in remote sensing applications.
arXiv Detail & Related papers (2021-03-30T18:26:39Z)
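SeCo's central observation is that images of the same location at different times of year act as natural augmentations. Below is a sketch of how such positive pairs could be drawn from a per-location time series (the tensor layout is an assumption); in the paper these pairs then feed a contrastive objective.

```python
import torch

def seasonal_pairs(location_series):
    """Two images of the same location taken in different seasons act as
    a natural 'augmentation' positive pair. `location_series` has shape
    (B, S, C, H, W): S seasonal snapshots per location."""
    B, S = location_series.shape[:2]
    # Draw two distinct season indices per location.
    first = torch.randint(S, (B,))
    second = (first + torch.randint(1, S, (B,))) % S
    batch = torch.arange(B)
    return location_series[batch, first], location_series[batch, second]

q, k = seasonal_pairs(torch.randn(4, 5, 3, 64, 64))
print(q.shape, k.shape)  # torch.Size([4, 3, 64, 64]) each
```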
- Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera [83.31666463259849]
We propose a method to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors.
We show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained using manual annotations.
Our method is an effective way to improve person detectors during deployment without any additional labeling effort.
arXiv Detail & Related papers (2020-12-16T12:10:04Z)
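The pseudo-label pipeline above hinges on a calibrated camera: person detections localized in the camera frame can be mapped through the camera-to-laser extrinsics into the range scanner's frame. A rough NumPy sketch, where the function names, the homogeneous-transform convention, and the (range, bearing) label format are all assumptions.

```python
import numpy as np

def project_to_scan(person_xyz_cam, T_laser_from_cam):
    """Map person positions detected in the camera frame through the
    camera-to-laser calibration into the 2D range-data frame, yielding
    (range, bearing) pseudo-labels for the LiDAR detector."""
    ones = np.ones((len(person_xyz_cam), 1))
    xyz_h = np.concatenate([person_xyz_cam, ones], axis=1)  # homogeneous
    xyz_laser = (T_laser_from_cam @ xyz_h.T).T[:, :3]
    rng = np.hypot(xyz_laser[:, 0], xyz_laser[:, 1])        # planar range
    bearing = np.arctan2(xyz_laser[:, 1], xyz_laser[:, 0])
    return np.stack([rng, bearing], axis=1)

T_laser_from_cam = np.eye(4)  # identity extrinsics, for illustration only
labels = project_to_scan(np.array([[2.0, 0.0, 1.0], [0.5, 3.0, 0.9]]),
                         T_laser_from_cam)
```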