An Empirical Study of Remote Sensing Pretraining
- URL: http://arxiv.org/abs/2204.02825v4
- Date: Thu, 4 May 2023 16:53:19 GMT
- Title: An Empirical Study of Remote Sensing Pretraining
- Authors: Di Wang, Jing Zhang, Bo Du, Gui-Song Xia and Dacheng Tao
- Abstract summary: We conduct an empirical study of remote sensing pretraining (RSP) on aerial images.
RSP can help deliver distinctive performances in scene recognition tasks.
RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
- Score: 117.90699699469639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has largely reshaped remote sensing (RS) research for aerial
image understanding and achieved great success. Nevertheless, most of the
existing deep models are initialized with ImageNet pretrained weights.
Since natural images inevitably present a large domain gap relative to aerial
images, the finetuning performance on downstream aerial scene
tasks is probably limited. This issue motivates us to conduct an empirical study of remote sensing
pretraining (RSP) on aerial images. To this end, we train different networks
from scratch on the largest RS scene recognition dataset to date, MillionAID,
to obtain a series of RS pretrained backbones, including
both convolutional neural networks (CNN) and vision transformers such as Swin
and ViTAE, which have shown promising performance on computer vision tasks.
Then, we investigate the impact of RSP on representative downstream tasks
including scene recognition, semantic segmentation, object detection, and
change detection using these CNN and vision transformer backbones. Empirical
study shows that RSP can help deliver distinctive performance in scene
recognition tasks and in perceiving RS-related semantics such as "Bridge" and
"Airplane". We also find that, although RSP mitigates the data discrepancies of
traditional ImageNet pretraining on RS images, it may still suffer from task
discrepancies, where downstream tasks require different representations from
scene recognition tasks. These findings call for further research efforts on
both large-scale pretraining datasets and effective pretraining methods. The
codes and pretrained models will be released at
https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing.
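The pretrain-then-finetune workflow the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' released code: the toy backbone, the 10-class downstream dataset, and the head dimensions are assumptions for demonstration, while the 51 scene categories correspond to MillionAID's leaf-level taxonomy.

```python
import torch
import torch.nn as nn

# Toy backbone standing in for a CNN or vision transformer (e.g. Swin, ViTAE)
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# RSP stage: train backbone + scene-recognition head from scratch on
# MillionAID (51 leaf-level scene categories) instead of ImageNet.
pretrain_head = nn.Linear(16, 51)

# Downstream stage: reuse the RS-pretrained backbone and attach a new head
# for the target task (here a hypothetical 10-class aerial scene dataset),
# then finetune the whole model.
downstream_head = nn.Linear(16, 10)
model = nn.Sequential(backbone, downstream_head)

x = torch.randn(2, 3, 64, 64)
print(model(x).shape)  # torch.Size([2, 10])
```

The same swap applies to dense tasks (segmentation, detection, change detection) by replacing the classification head with the corresponding task decoder, which is where the abstract's "task discrepancy" can appear.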
Related papers
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy, since pretraining is formulated as an image classification or object discrimination task.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Generic Knowledge Boosted Pre-training For Remote Sensing Images [46.071496675604884]
Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP) is a novel remote sensing pre-training framework.
GeRSP learns robust representations from remote sensing and natural images for remote sensing understanding tasks.
We show that GeRSP can effectively learn robust representations in a unified manner, improving the performance of remote sensing downstream tasks.
arXiv Detail & Related papers (2024-01-09T15:36:07Z) - UAVs and Neural Networks for search and rescue missions [0.0]
We present a method for detecting objects of interest, including cars, humans, and fire, in aerial images captured by unmanned aerial vehicles (UAVs).
To achieve this, we use artificial neural networks and create a dataset for supervised learning.
arXiv Detail & Related papers (2023-10-09T08:27:35Z) - Supervised and Contrastive Self-Supervised In-Domain Representation Learning for Dense Prediction Problems in Remote Sensing [0.0]
This paper explores the effectiveness of in-domain representations in both supervised and self-supervised forms to solve the domain difference between remote sensing and the ImageNet dataset.
For self-supervised pre-training, we have utilized the SimSiam algorithm as it is simple and does not need huge computational resources.
Our results have demonstrated that using datasets with a high spatial resolution for self-supervised representation learning leads to high performance in downstream tasks.
arXiv Detail & Related papers (2023-01-29T20:56:51Z) - Is Deep Image Prior in Need of a Good Education? [57.3399060347311]
Deep image prior was introduced as an effective prior for image reconstruction.
Despite its impressive reconstructive properties, the approach is slow when compared to learned or traditional reconstruction techniques.
We develop a two-stage learning paradigm to address the computational challenge.
arXiv Detail & Related papers (2021-11-23T15:08:26Z) - Homography augumented momentum constrastive learning for SAR image retrieval [3.9743795764085545]
We propose a deep learning-based image retrieval approach using homography transformation augmented contrastive learning.
We also propose a training method for the DNNs induced by contrastive learning that does not require any labeling procedure.
arXiv Detail & Related papers (2021-09-21T17:27:07Z) - Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z) - Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z) - RDAnet: A Deep Learning Based Approach for Synthetic Aperture Radar
Image Formation [0.0]
We train a deep neural network that performs both the image formation and image processing tasks, integrating the SAR processing pipeline.
Results show that our integrated pipeline can output accurately classified SAR imagery with image quality comparable to those formed using a traditional algorithm.
arXiv Detail & Related papers (2020-01-22T18:44:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.