Feature Guided Masked Autoencoder for Self-supervised Learning in Remote
Sensing
- URL: http://arxiv.org/abs/2310.18653v1
- Date: Sat, 28 Oct 2023 09:43:13 GMT
- Title: Feature Guided Masked Autoencoder for Self-supervised Learning in Remote
Sensing
- Authors: Yi Wang, Hugo Hernández Hernández, Conrad M Albrecht, Xiao Xiang
Zhu
- Abstract summary: Masked AutoEncoder (MAE) has attracted wide attention for pretraining vision transformers in remote sensing.
We propose Feature Guided Masked Autoencoder (FG-MAE): reconstructing a combination of Histograms of Oriented Gradients (HOG) and Normalized Difference Indices (NDI) for multispectral images, and reconstructing HOG for SAR images.
- Score: 16.683132793313693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning guided by masked image modelling, such as Masked
AutoEncoder (MAE), has attracted wide attention for pretraining vision
transformers in remote sensing. However, MAE tends to excessively focus on
pixel details, thereby limiting the model's capacity for semantic
understanding, in particular for noisy SAR images. In this paper, we explore
spectral and spatial remote sensing image features as improved
MAE-reconstruction targets. We first conduct a study on reconstructing various
image features, all performing comparably well or better than raw pixels. Based
on such observations, we propose Feature Guided Masked Autoencoder (FG-MAE):
reconstructing a combination of Histograms of Oriented Gradients (HOG) and
Normalized Difference Indices (NDI) for multispectral images, and
reconstructing HOG for SAR images. Experimental results on three downstream
tasks illustrate the effectiveness of FG-MAE with a particular boost for SAR
imagery. Furthermore, we demonstrate the well-inherited scalability of FG-MAE
and release a first series of pretrained vision transformers for medium
resolution SAR and multispectral images.
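
The feature targets named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (normalized_difference, hog_per_cell, patchify, masked_mse), the 8-pixel patch size, the 9 orientation bins, the 75% mask ratio, and the concatenation of HOG and NDI into one target vector are all assumptions made for illustration, and the HOG here is a simplified per-cell variant without block normalisation.

```python
import numpy as np


def normalized_difference(band_a, band_b, eps=1e-6):
    """Normalized difference index (a - b) / (a + b), e.g. NDVI with a=NIR, b=red."""
    a = np.asarray(band_a, dtype=np.float64)
    b = np.asarray(band_b, dtype=np.float64)
    return (a - b) / (a + b + eps)


def hog_per_cell(image, cell=8, bins=9):
    """Simplified HOG: magnitude-weighted histogram of unsigned gradient
    orientations per cell, L2-normalised per cell (no block normalisation)."""
    img = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(img)                 # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(idx[sl].ravel(), weights=mag[sl].ravel(),
                                     minlength=bins)
    norm = np.linalg.norm(hist, axis=-1, keepdims=True)
    return hist / (norm + 1e-6)


def patchify(x, patch=8):
    """Split a 2-D array into non-overlapping patches, one flattened patch per row."""
    rows, cols = x.shape[0] // patch, x.shape[1] // patch
    x = x[:rows * patch, :cols * patch]
    x = x.reshape(rows, patch, cols, patch).transpose(0, 2, 1, 3)
    return x.reshape(rows * cols, patch * patch)


def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only (mask == 1 is hidden)."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / mask.sum())


# Toy multispectral example with two synthetic bands (random data, not real imagery).
rng = np.random.default_rng(0)
nir = rng.uniform(0.1, 1.0, (32, 32))
red = rng.uniform(0.1, 1.0, (32, 32))

ndvi = normalized_difference(nir, red)             # (32, 32)
hog = hog_per_cell(nir, cell=8, bins=9)            # (4, 4, 9)

# Assumed combination: concatenate per-patch HOG and per-patch NDI values.
target = np.concatenate([hog.reshape(16, 9), patchify(ndvi, 8)], axis=1)

# Assumed mask ratio of 75%: 12 of the 16 patches are hidden.
mask = np.zeros(16)
mask[rng.choice(16, size=12, replace=False)] = 1.0

pred = np.zeros_like(target)                       # stand-in for decoder output
print("target shape:", target.shape)
print("masked loss:", masked_mse(pred, target, mask))
```

In a real pipeline the stand-in prediction would come from the decoder of a vision transformer, and for SAR input only the HOG part of the target would be used, as the abstract states.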
Related papers
- RS-Mamba for Large Remote Sensing Image Dense Prediction [58.12667617617306]
We propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images.
RSM is specifically designed to capture the global context of remote sensing images with linear complexity.
Our model achieves better efficiency and accuracy than transformer-based models on large remote sensing images.
arXiv Detail & Related papers (2024-04-03T12:06:01Z)
- Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing [5.325585142755542]
We present Cross-Scale MAE, a self-supervised model built upon the Masked Auto-Encoder (MAE). During pre-training, Cross-Scale MAE employs scale augmentation techniques and enforces cross-scale constraints through both contrastive and generative losses.
Experimental evaluations demonstrate that Cross-Scale MAE exhibits superior performance compared to standard MAE and other state-of-the-art remote sensing MAE methods.
arXiv Detail & Related papers (2024-01-29T03:06:19Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Adapting Segment Anything Model for Change Detection in HR Remote
Sensing Images [18.371087310792287]
This work aims to utilize the strong visual recognition capabilities of Vision Foundation Models (VFMs) to improve the change detection of high-resolution Remote Sensing Images (RSIs).
We employ the visual encoder of FastSAM, an efficient variant of the SAM, to extract visual representations in RS scenes.
To utilize the semantic representations that are inherent to SAM features, we introduce a task-agnostic semantic learning branch to model the semantic latent in bi-temporal RSIs.
The resulting method, SAMCD, obtains superior accuracy compared to the SOTA methods and exhibits a sample-efficient learning ability that is comparable to semi-
arXiv Detail & Related papers (2023-09-04T08:23:31Z)
- GH-Feat: Learning Versatile Generative Hierarchical Features from GANs [61.208757845344074]
We show that a generative feature learned from image synthesis exhibits great potential in solving a wide range of computer vision tasks.
We first train an encoder by considering the pretrained StyleGAN generator as a learned loss function.
The visual features produced by our encoder, termed as Generative Hierarchical Features (GH-Feat), highly align with the layer-wise GAN representations.
arXiv Detail & Related papers (2023-01-12T21:59:46Z)
- Exploring The Role of Mean Teachers in Self-supervised Masked
Auto-Encoders [64.03000385267339]
Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
We present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE) by adding an EMA teacher to MAE.
RC-MAE converges faster and requires less memory usage than state-of-the-art self-distillation methods during pre-training.
arXiv Detail & Related papers (2022-10-05T08:08:55Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- The Devil is in the Frequency: Geminated Gestalt Autoencoder for
Self-Supervised Visual Pre-Training [13.087987450384036]
We present a new Masked Image Modeling (MIM) method, termed Geminated Gestalt Autoencoder (Ge$^2$-AE), for visual pre-training.
Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space.
arXiv Detail & Related papers (2022-04-18T09:22:55Z)
- Learning Efficient Representations for Enhanced Object Detection on
Large-scene SAR Images [16.602738933183865]
It is a challenging problem to detect and recognize targets on complex large-scene Synthetic Aperture Radar (SAR) images.
Recently developed deep learning algorithms can automatically learn the intrinsic features of SAR images.
We propose an efficient and robust deep learning based target detection method.
arXiv Detail & Related papers (2022-01-22T03:25:24Z)
- Contrastive Multiview Coding with Electro-optics for SAR Semantic
Segmentation [0.6445605125467573]
We propose multi-modal representation learning for SAR semantic segmentation.
Unlike previous studies, our method jointly uses EO imagery, SAR imagery, and a label mask.
Several experiments show that our approach is superior to the existing methods in model performance, sample efficiency, and convergence speed.
arXiv Detail & Related papers (2021-08-31T23:55:41Z)
- Generative Hierarchical Features from Synthesizing Images [65.66756821069124]
We show that learning to synthesize images can bring remarkable hierarchical visual features that are generalizable across a wide range of applications.
The visual feature produced by our encoder, termed as Generative Hierarchical Feature (GH-Feat), has strong transferability to both generative and discriminative tasks.
arXiv Detail & Related papers (2020-07-20T18:04:14Z)