Task-Customized Self-Supervised Pre-training with Scalable Dynamic
Routing
- URL: http://arxiv.org/abs/2205.13267v1
- Date: Thu, 26 May 2022 10:49:43 GMT
- Title: Task-Customized Self-Supervised Pre-training with Scalable Dynamic
Routing
- Authors: Zhili Liu, Jianhua Han, Lanqing Hong, Hang Xu, Kai Chen, Chunjing Xu,
Zhenguo Li
- Abstract summary: A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degrade the downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
- Score: 76.78772372631623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL), especially contrastive methods, has attracted
increasing attention recently, as it learns effective transferable representations
without semantic annotations. A common practice for self-supervised pre-training is to
use as much data as possible. For a specific downstream task, however, involving
irrelevant data in pre-training may degrade the downstream performance, as observed
in our extensive experiments. On the other hand, for existing SSL methods, it is
burdensome and infeasible to use different downstream-task-customized datasets in
pre-training for different tasks. To address this issue, we propose a novel SSL
paradigm called Scalable Dynamic Routing (SDR), which can be trained once and deployed
efficiently to different downstream tasks with task-customized pre-trained models.
Specifically, we construct the SDRnet with various sub-nets and train each sub-net with
only one subset of the data by data-aware progressive training. When a downstream task
arrives, we route among all the pre-trained sub-nets to select the best one along with
its corresponding weights. Experimental results show that our SDR can train 256
sub-nets on ImageNet simultaneously, which provides better transfer performance
than a unified model trained on the full ImageNet, achieving state-of-the-art
(SOTA) average accuracy over 11 downstream classification tasks and AP on the
PASCAL VOC detection task.
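The routing step described in the abstract can be pictured with a small toy example: given several pre-trained sub-nets, score each one on a labelled sample from the downstream task and keep the best-scoring sub-net together with its weights. The sketch below is illustrative only; the sub-net interface and the nearest-centroid proxy score are assumptions, not the paper's actual routing criterion or code.

```python
# A minimal sketch (not the authors' code) of the routing step: given several
# pre-trained sub-nets, score each one on a small labelled sample from the
# downstream task and keep the best-scoring sub-net along with its weights.
# The sub-net interface and the nearest-centroid proxy score are assumptions.
import numpy as np

def centroid_proxy_score(features: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose nearest class centroid matches their label."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Distance from every sample to every class centroid.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    preds = classes[np.argmin(dists, axis=1)]
    return float(np.mean(preds == labels))

def route(subnets: dict, images: np.ndarray, labels: np.ndarray):
    """Return (best_name, score); each sub-net maps images to a feature matrix."""
    scores = {name: centroid_proxy_score(extract(images), labels)
              for name, extract in subnets.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for pre-trained sub-nets: fixed random projections of the input.
    subnets = {f"subnet_{i}": (lambda W: (lambda x: x @ W))(rng.normal(size=(64, 16)))
               for i in range(4)}
    images = rng.normal(size=(200, 64))    # toy "downstream" samples
    labels = rng.integers(0, 5, size=200)  # toy downstream labels
    print(route(subnets, images, labels))
```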
Related papers
- Self-supervised visual learning in the low-data regime: a comparative evaluation [40.27083924454058]
Self-Supervised Learning (SSL) is a robust training methodology for contemporary Deep Neural Networks (DNNs).
This work introduces a taxonomy of modern visual SSL methods, accompanied by detailed explanations and insights regarding the main categories of approaches.
For domain-specific downstream tasks, in-domain low-data SSL pretraining outperforms the common approach of large-scale pretraining.
arXiv Detail & Related papers (2024-04-26T07:23:14Z)
- Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio [19.865050806327147]
Self-supervised learning has proven vital in speech and audio-related applications.
This paper provides the first empirical study of SSL pre-training for different specified sequence lengths.
We find that training on short sequences can dramatically reduce resource costs while retaining satisfactory performance for all tasks.
arXiv Detail & Related papers (2022-09-30T16:35:42Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain [25.84756140221655]
ConSecutive PreTraining (CSPT) is proposed based on the idea of not stopping pretraining, borrowed from natural language processing (NLP).
The proposed CSPT also can release the huge potential of unlabeled data for task-aware model training.
The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in RSD can outperform the previous method of supervised pretraining-then-fine-tuning.
arXiv Detail & Related papers (2022-07-08T12:32:09Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time (a toy sketch of the distillation idea follows this entry).
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
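The distillation idea summarized above can be illustrated with a minimal toy sketch: a student network is trained to match the feature representation of a frozen, pre-trained teacher on unlabeled images, and only afterwards is fine-tuned on downstream tasks. The tiny networks and the plain MSE matching loss below are placeholder assumptions for illustration; they are not the paper's architecture or objective.

```python
# A toy sketch (not the paper's code) of feature distillation as pre-training:
# a student is trained to match the feature representation of a frozen,
# pre-trained teacher on unlabeled images. The tiny conv nets and the plain
# MSE matching loss are placeholder assumptions for illustration only.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())   # frozen "pre-trained" net
student = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32))
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=0.1)
images = torch.randn(8, 3, 32, 32)          # a batch of unlabeled images

with torch.no_grad():
    target = teacher(images)                # teacher features, shape (8, 32)
opt.zero_grad()
loss = nn.functional.mse_loss(student(images), target)
loss.backward()
opt.step()
print(f"distillation loss: {loss.item():.4f}")
```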
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- Rethinking supervised pre-training for better downstream transferring [46.09030708111374]
We propose a new supervised pre-training method based on Leave-One-Out K-Nearest-Neighbor, or LOOK.
It relieves the problem of overfitting to upstream tasks by only requiring each image to share its class label with most of its k nearest neighbors (a toy sketch of this criterion follows this entry).
We developed an efficient implementation of the proposed method that scales well to large datasets.
arXiv Detail & Related papers (2021-10-12T13:57:38Z)
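The leave-one-out k-nearest-neighbor criterion mentioned in the LOOK entry above can be restated as a small toy computation: for each image's feature vector, check how many of its k nearest neighbors (itself excluded) share its class label. The numpy sketch below is an illustrative restatement under that assumption, not the authors' training objective or implementation.

```python
# A rough numpy sketch of the leave-one-out k-nearest-neighbour criterion the
# LOOK entry describes: each feature vector is only asked to agree in label
# with the majority of its k nearest neighbours (itself excluded). This is an
# illustrative re-statement, not the authors' training objective or code.
import numpy as np

def loo_knn_agreement(features: np.ndarray, labels: np.ndarray, k: int = 5) -> np.ndarray:
    """For each sample, the fraction of its k nearest neighbours sharing its label."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)              # leave-one-out: ignore self-distance
    knn_idx = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbours
    return (labels[knn_idx] == labels[:, None]).mean(axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))                # toy feature vectors
labs = rng.integers(0, 4, size=100)              # toy class labels
agreement = loo_knn_agreement(feats, labs, k=5)
print("fraction satisfying the majority criterion:", np.mean(agreement > 0.5))
```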
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Robust Transfer Learning with Pretrained Language Models through Adapters [40.45102278979193]
Transfer learning with large pretrained language models like BERT has become a dominating approach for most NLP tasks.
We propose a simple yet effective adapter-based approach to mitigate these issues.
Our experiments demonstrate that such a training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
arXiv Detail & Related papers (2021-08-05T02:30:13Z)
- How Well Self-Supervised Pre-Training Performs with Streaming Data? [73.5362286533602]
In real-world scenarios where data are collected in a streaming fashion, the joint training scheme is usually storage-heavy and time-consuming.
It is unclear how well sequential self-supervised pre-training performs with streaming data.
We find that sequential self-supervised learning exhibits almost the same performance as joint training when the distribution shifts within the streaming data are mild (a schematic sketch of the two regimes follows this entry).
arXiv Detail & Related papers (2021-04-25T06:56:48Z)
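The streaming-data entry above contrasts joint training on all collected data with sequential self-supervised pre-training on chunks as they arrive. The sketch below is a schematic of the sequential regime only; the tiny model and the reconstruction-style placeholder loss are assumptions standing in for any real self-supervised objective.

```python
# A schematic sketch (assumptions only, not the paper's protocol) of sequential
# self-supervised pre-training on streaming data: keep updating one model as
# each data chunk arrives, instead of storing everything for joint training.
# `ssl_step` uses a reconstruction-style placeholder loss as a stand-in for a
# real self-supervised objective such as a contrastive loss.
import torch
import torch.nn as nn

def ssl_step(model, batch, opt):
    # Placeholder objective: reconstruct the input (stand-in for a real SSL loss).
    loss = nn.functional.mse_loss(model(batch), batch)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

streaming_chunks = [torch.randn(256, 32) for _ in range(4)]  # data arriving over time

# Sequential pre-training: one pass per chunk as it arrives.
for t, chunk in enumerate(streaming_chunks):
    loss = ssl_step(model, chunk, opt)
    print(f"chunk {t}: loss {loss:.4f}")
```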