Rethinking Pre-training and Self-training
- URL: http://arxiv.org/abs/2006.06882v2
- Date: Sun, 15 Nov 2020 19:41:27 GMT
- Title: Rethinking Pre-training and Self-training
- Authors: Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin
D. Cubuk, Quoc V. Le
- Abstract summary: We investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training.
Our study reveals the generality and flexibility of self-training with three additional insights.
For example, on the COCO object detection dataset, pre-training helps when we use one fifth of the labeled data, but hurts accuracy when we use all labeled data.
- Score: 105.27954735761678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training is a dominant paradigm in computer vision. For example,
supervised ImageNet pre-training is commonly used to initialize the backbones
of object detection and segmentation models. He et al., however, show a
surprising result that ImageNet pre-training has limited impact on COCO object
detection. Here we investigate self-training as another method to utilize
additional data on the same setup and contrast it against ImageNet
pre-training. Our study reveals the generality and flexibility of self-training
with three additional insights: 1) stronger data augmentation and more labeled
data further diminish the value of pre-training, 2) unlike pre-training,
self-training is always helpful when using stronger data augmentation, in both
low-data and high-data regimes, and 3) in the case that pre-training is
helpful, self-training improves upon pre-training. For example, on the COCO
object detection dataset, pre-training helps when we use one fifth of the
labeled data, but hurts accuracy when we use all labeled data. Self-training,
on the other hand, shows positive improvements of +1.3AP to +3.4AP across all
dataset sizes. In other words, self-training works well exactly in the setup
where pre-training does not (using ImageNet to help COCO). On the PASCAL
segmentation dataset, which is much smaller than COCO, pre-training does help
significantly, yet self-training still improves upon the pre-trained model. On
COCO object detection, we achieve 54.3AP, an improvement
of +1.5AP over the strongest SpineNet model. On PASCAL segmentation, we achieve
90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art
result by DeepLabv3+.
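The abstract describes self-training only at a high level, so the following is a minimal sketch of the generic pseudo-labeling loop it refers to: train a teacher on the labeled target data, let the teacher label the extra unlabeled data (ImageNet images used without their labels), and train a student on the union, which is the stage where the paper applies strong data augmentation. A toy classifier on random tensors stands in for the detection models; the confidence threshold, model sizes, and training schedule are placeholders, not the authors' code.
```python
# Minimal sketch of the pseudo-labeling loop behind self-training. A tiny
# classifier on random tensors stands in for the paper's detection models;
# names, sizes, and the confidence threshold are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, DIM, THRESHOLD = 10, 32, 0.5

# Stand-ins for the labeled target data (e.g., COCO) and the extra unlabeled
# data (e.g., ImageNet images used without their labels).
labeled_x = torch.randn(256, DIM)
labeled_y = torch.randint(0, NUM_CLASSES, (256,))
unlabeled_x = torch.randn(1024, DIM)

def make_model():
    return nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, NUM_CLASSES))

def train(model, x, y, steps=200, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model

# 1) Train a teacher on the labeled target data only (random initialization).
teacher = train(make_model(), labeled_x, labeled_y)

# 2) Pseudo-label the unlabeled data with the teacher, keeping only confident
#    predictions.
with torch.no_grad():
    probs = F.softmax(teacher(unlabeled_x), dim=-1)
conf, pseudo_y = probs.max(dim=-1)
keep = conf > THRESHOLD

# 3) Train a student on human labels plus pseudo labels; in the paper this is
#    the stage where strong data augmentation is applied.
student_x = torch.cat([labeled_x, unlabeled_x[keep]])
student_y = torch.cat([labeled_y, pseudo_y[keep]])
student = train(make_model(), student_x, student_y)
```
In this framing, the extra data reaches the student through task-specific pseudo-labels rather than through initial weights, which is the contrast with pre-training that the abstract draws.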
Related papers
- Better with Less: A Data-Active Perspective on Pre-Training Graph Neural
Networks [39.71761440499148]
Pre-training of graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks from unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model.
Experimental results show that the proposed APT obtains an efficient pre-training model with less training data and better downstream performance.
arXiv Detail & Related papers (2023-11-02T07:09:59Z)
- The Role of Pre-training Data in Transfer Learning [20.768366728182997]
We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance.
We find that the choice of pre-training data source is essential for few-shot transfer, but its role decreases as more data is made available for fine-tuning.
arXiv Detail & Related papers (2023-02-27T09:10:08Z)
- Downstream Datasets Make Surprisingly Good Pretraining Corpora [39.77171117174906]
This paper introduces a large-scale study of self-pretraining, where the same (downstream) training data is used for both pretraining and finetuning.
In experiments addressing both ELECTRA and RoBERTa models and 10 distinct downstream classification datasets, we observe that self-pretraining rivals standard pretraining on the BookWiki corpus.
Our results hint that performance gains from pretraining are driven primarily by the pretraining objective itself and are not always attributable to the use of massive amounts of external pretraining data.
arXiv Detail & Related papers (2022-09-28T19:28:43Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time (see the sketch after this list).
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [29.49873710927313]
We consider a self-supervised pre-training scenario that only leverages the target task data.
Our study shows that denoising autoencoders, such as BEiT, are more robust to the type and size of the pre-training data.
On COCO, when pre-training uses only COCO images, detection and instance segmentation performance surpasses supervised ImageNet pre-training in a comparable setting.
arXiv Detail & Related papers (2021-12-20T18:41:32Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch achieves final performance that is no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset (see the sketch after this list).
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and by pre-training on a dataset filtered from a larger-scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z)
- Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection [86.0580214485104]
We propose a general and efficient pre-training paradigm, Montage pre-training, for object detection.
Montage pre-training needs only the target detection dataset while using only 1/4 of the computational resources of the widely adopted ImageNet pre-training.
The efficiency and effectiveness of Montage pre-training are validated by extensive experiments on the MS-COCO dataset.
arXiv Detail & Related papers (2020-04-25T16:09:46Z)
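The "Knowledge Distillation as Efficient Pre-training" entry above is concrete enough to sketch. Below is a minimal, hypothetical two-stage version: the student backbone is first trained to match a teacher's features, then fine-tuned with a task head on labeled downstream data. Tiny MLPs and random tensors stand in for real backbones and datasets; the feature-matching loss and schedules are assumptions, not the paper's implementation.
```python
# Hypothetical sketch of "knowledge distillation as pre-training": the student
# backbone first learns to match a teacher's features, then is fine-tuned on
# the downstream task. Tiny MLPs and random tensors stand in for real
# backbones and data; this is not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, FEAT, NUM_CLASSES = 32, 16, 5

x_distill = torch.randn(512, DIM)               # unlabeled distillation inputs
x_task = torch.randn(128, DIM)                  # labeled downstream data
y_task = torch.randint(0, NUM_CLASSES, (128,))

teacher = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, FEAT))
student = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, FEAT))
head = nn.Linear(FEAT, NUM_CLASSES)

# Stage 1: feature distillation as the student's "pre-training". The teacher
# (standing in for a genuinely pre-trained backbone) is kept frozen.
with torch.no_grad():
    target_feats = teacher(x_distill)
opt = torch.optim.SGD(student.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    F.mse_loss(student(x_distill), target_feats).backward()
    opt.step()

# Stage 2: fine-tune the distilled student plus a task head on labeled data.
opt = torch.optim.SGD(list(student.parameters()) + list(head.parameters()), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    F.cross_entropy(head(student(x_task)), y_task).backward()
    opt.step()
```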
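Similarly, the "Efficient Conditional Pre-training for Transfer Learning" entry above centers on filtering a large pre-training pool down to a subset relevant to the target task. The sketch below assumes a simple criterion, cosine similarity to the target data's mean embedding under a fixed encoder; the paper's actual filtering methods may differ, and the encoder, pool, and cutoff here are placeholders.
```python
# Hypothetical sketch of filtering a large pre-training pool down to the
# examples most similar to the target-task data. The fixed encoder and the
# cosine-similarity-to-centroid criterion are assumptions for illustration;
# the cited paper defines its own selection methods.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, FEAT, KEEP = 64, 32, 1000

pool_x = torch.randn(10_000, DIM)   # large generic pre-training pool
target_x = torch.randn(500, DIM)    # downstream task data

encoder = nn.Sequential(nn.Linear(DIM, FEAT), nn.ReLU(), nn.Linear(FEAT, FEAT))

with torch.no_grad():
    pool_z = F.normalize(encoder(pool_x), dim=-1)
    target_z = F.normalize(encoder(target_x), dim=-1)

# Rank pool examples by similarity to the target data's mean embedding and
# keep the top-KEEP examples as the filtered pre-training subset.
target_centroid = F.normalize(target_z.mean(dim=0), dim=0)
scores = pool_z @ target_centroid
subset = pool_x[scores.topk(KEEP).indices]
print(subset.shape)  # torch.Size([1000, 64])
```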
This list is automatically generated from the titles and abstracts of the papers on this site.