Consecutive Pretraining: A Knowledge Transfer Learning Strategy with
Relevant Unlabeled Data for Remote Sensing Domain
- URL: http://arxiv.org/abs/2207.03860v1
- Date: Fri, 8 Jul 2022 12:32:09 GMT
- Title: Consecutive Pretraining: A Knowledge Transfer Learning Strategy with
Relevant Unlabeled Data for Remote Sensing Domain
- Authors: Tong Zhang, Peng Gao, Hao Dong, Yin Zhuang, Guanqun Wang, Wei Zhang,
He Chen
- Abstract summary: ConSecutive PreTraining (CSPT) is proposed, inspired by the idea of not stopping pretraining in natural language processing (NLP).
The proposed CSPT also unlocks the large potential of unlabeled data for task-aware model training.
The results show that with CSPT for task-aware model training, almost all downstream tasks in RSD outperform the previous supervised pretraining-then-fine-tuning method.
- Score: 25.84756140221655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Under supervised learning, the dominant knowledge transfer paradigm is
to pretrain a model on a large-scale natural scene dataset and then fine-tune it
on a small amount of task-specific labeled data; this has become the consensus
solution for task-aware model training in the remote sensing domain (RSD).
Unfortunately, because of the diverse categories of imaging data and the
difficulty of data annotation, there is no remote sensing dataset large and
uniform enough to support large-scale pretraining in RSD. Moreover, pretraining
models on large-scale natural scene datasets by supervised learning and then
fine-tuning them directly on diverse downstream tasks is a crude approach that
is easily affected by inevitable labeling noise, severe domain gaps, and
task-aware discrepancies. Thus, in this paper, building on self-supervised
pretraining and the powerful vision transformer (ViT) architecture, a concise
and effective knowledge transfer learning strategy called ConSecutive
PreTraining (CSPT) is proposed, inspired by the idea of not stopping pretraining
in natural language processing (NLP); it gradually bridges the domain gap and
transfers knowledge from the natural scene domain to the RSD. The proposed CSPT
also unlocks the large potential of unlabeled data for task-aware model
training. Finally, extensive experiments are carried out on twelve RSD datasets
covering three types of downstream tasks (scene classification, object
detection, and land cover classification) and two types of imaging data
(optical and SAR). The results show that with CSPT for task-aware model
training, almost all downstream tasks in RSD outperform the previous supervised
pretraining-then-fine-tuning method and even surpass state-of-the-art (SOTA)
performance without any expensive labeling cost or careful model design.
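
To make the consecutive-pretraining idea concrete, below is a minimal, hypothetical sketch of the three-stage pipeline, not the authors' implementation: it assumes an MAE-style masked-reconstruction objective, a toy ViT-like encoder (TinyPatchEncoder), and stand-in data (unlabeled_rs_loader, labeled_task_loader) in place of real natural scene, unlabeled remote sensing, and labeled downstream datasets.

```python
# Minimal sketch of consecutive pretraining (hypothetical, not the paper's code).
import torch
import torch.nn as nn

class TinyPatchEncoder(nn.Module):
    """Toy patch encoder standing in for a ViT backbone."""
    def __init__(self, dim=256, patch=16, depth=4, heads=4):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.blocks(tokens)

def masked_recon_loss(encoder, decoder, images, patch=16, mask_ratio=0.75):
    """MAE-flavoured objective: reconstruct pixel patches from corrupted tokens.
    Simplified: masked tokens are zeroed rather than dropped from the encoder."""
    B = images.shape[0]
    feats = encoder(images)                                    # (B, N, dim)
    N = feats.shape[1]
    mask = torch.rand(B, N, device=images.device) < mask_ratio
    feats = feats.masked_fill(mask.unsqueeze(-1), 0.0)
    pred = decoder(feats)                                      # (B, N, patch*patch*3)
    target = images.unfold(2, patch, patch).unfold(3, patch, patch)
    target = target.permute(0, 2, 3, 1, 4, 5).reshape(B, N, -1)
    return ((pred - target) ** 2)[mask].mean()

# Hypothetical stand-in data (random tensors); replace with real datasets.
unlabeled_rs_loader = [torch.randn(2, 3, 224, 224) for _ in range(2)]
num_classes = 10
labeled_task_loader = [(torch.randn(2, 3, 224, 224),
                        torch.randint(0, num_classes, (2,))) for _ in range(2)]

# Stage 1: encoder assumed already self-supervised-pretrained on natural scenes.
encoder = TinyPatchEncoder()
# encoder.load_state_dict(torch.load("natural_scene_ssl.pt"))  # hypothetical checkpoint

# Stage 2: consecutive pretraining on *unlabeled* remote sensing imagery.
decoder = nn.Linear(256, 16 * 16 * 3)
opt = torch.optim.AdamW(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
for images in unlabeled_rs_loader:
    loss = masked_recon_loss(encoder, decoder, images)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 3: fine-tune on the labeled downstream task (e.g. scene classification).
head = nn.Linear(256, num_classes)
opt = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=1e-5)
for images, labels in labeled_task_loader:
    logits = head(encoder(images).mean(dim=1))
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the middle stage is that the self-supervised objective keeps running on unlabeled, domain-relevant imagery before any labels are used, so fine-tuning starts from features already adapted toward the remote sensing domain.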
Related papers
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the subset of the pre-training data most relevant to the downstream task.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z) - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations [76.45009891152178]
The pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone on various downstream datasets and tasks.
We show, for the first time, that general representation learning can be achieved through the task of occupancy prediction.
Our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
arXiv Detail & Related papers (2023-09-19T11:13:01Z) - In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene
Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Task-Customized Self-Supervised Pre-training with Scalable Dynamic
Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degrade the downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
arXiv Detail & Related papers (2022-05-26T10:49:43Z) - Knowledge Distillation as Efficient Pre-training: Faster Convergence,
Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z) - Self-Supervised Pre-Training for Transformer-Based Person
Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z) - Unified Instance and Knowledge Alignment Pretraining for Aspect-based
Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists a severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)