Efficient Conditional Pre-training for Transfer Learning
- URL: http://arxiv.org/abs/2011.10231v5
- Date: Thu, 18 Nov 2021 20:17:10 GMT
- Title: Efficient Conditional Pre-training for Transfer Learning
- Authors: Shuvam Chakraborty, Burak Uzkent, Kumar Ayush, Kumar Tanmay, Evan
Sheehan, Stefano Ermon
- Abstract summary: We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and pre-training on a dataset filtered from a larger scale dataset.
- Score: 71.01129334495553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Almost all the state-of-the-art neural networks for computer vision tasks are
trained by (1) pre-training on a large-scale dataset and (2) finetuning on the
target dataset. This strategy helps reduce dependence on the target dataset and
improves convergence rate and generalization on the target task. Although
pre-training on large-scale datasets is very useful, its foremost disadvantage
is high training cost. To address this, we propose efficient filtering methods
to select relevant subsets from the pre-training dataset. Additionally, we
discover that lowering image resolutions in the pre-training step offers a
great trade-off between cost and performance. We validate our techniques by
pre-training on ImageNet in both the unsupervised and supervised settings and
finetuning on a diverse collection of target datasets and tasks. Our proposed
methods drastically reduce pre-training cost and provide strong performance
boosts. Finally, we improve standard ImageNet pre-training by 1-3% by tuning
available models on our subsets and pre-training on a dataset filtered from a
larger scale dataset.
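To make the filtering idea concrete, the sketch below is a minimal, assumption-based illustration (not the paper's released code): it scores each candidate pre-training image by its feature similarity to the target domain and keeps only the closest fraction. The encoder choice, the 64x64 resolution, and the keep_fraction budget are illustrative assumptions rather than values taken from the paper.
```python
# Minimal sketch (illustrative assumptions, not the paper's code): score candidate
# pre-training images by cosine similarity to the target-domain centroid and keep
# the most relevant fraction; the selected subset can then be pre-trained at a
# reduced resolution to further cut cost.
import torch
import torchvision.transforms as T
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen feature extractor; ResNet-18 is an arbitrary, lightweight choice here.
encoder = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()          # drop the classification head
encoder = encoder.eval().to(device)

# Hypothetical low-resolution transform for the cost/performance trade-off
# (64x64 is an illustrative value, not one prescribed by the abstract).
low_res_transform = T.Compose([T.Resize(64), T.CenterCrop(64), T.ToTensor()])

@torch.no_grad()
def embed(loader):
    """L2-normalized features for a loader that yields (images, labels) batches."""
    feats = []
    for images, _ in loader:
        feats.append(torch.nn.functional.normalize(encoder(images.to(device)), dim=1))
    return torch.cat(feats)

@torch.no_grad()
def select_relevant_subset(target_loader, pretrain_loader, keep_fraction=0.1):
    """Indices of pre-training samples closest to the target-domain centroid."""
    centroid = torch.nn.functional.normalize(
        embed(target_loader).mean(0, keepdim=True), dim=1)
    scores = (embed(pretrain_loader) @ centroid.T).squeeze(1)   # cosine similarity
    k = max(1, int(keep_fraction * scores.numel()))
    return scores.topk(k).indices                               # use with Subset/Sampler
```
The returned indices could be wrapped in torch.utils.data.Subset with low_res_transform applied to the pre-training dataset, so that the relevant subset is also trained at lower resolution, mirroring the cost/performance trade-off described in the abstract.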
Related papers
- Data Filtering Networks [67.827994353269]
We study the problem of learning a data filtering network (DFN) for the second step of dataset construction: filtering a large uncurated dataset.
Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks.
Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets.
arXiv Detail & Related papers (2023-09-29T17:37:29Z)
- The Role of Pre-training Data in Transfer Learning [20.768366728182997]
We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance.
We find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning.
arXiv Detail & Related papers (2023-02-27T09:10:08Z)
- SEPT: Towards Scalable and Efficient Visual Pre-Training [11.345844145289524]
Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on the simple hypothesis that pre-training on unlabeled samples whose distribution is similar to the target task can bring substantial performance gains.
arXiv Detail & Related papers (2022-12-11T11:02:11Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [29.49873710927313]
We consider a self-supervised pre-training scenario that only leverages the target task data.
Our study shows that denoising autoencoders, such as BEiT, are more robust to the type and size of the pre-training data.
On COCO, when pre-training solely using COCO images, the detection and instance segmentation performance surpasses the supervised ImageNet pre-training in a comparable setting.
arXiv Detail & Related papers (2021-12-20T18:41:32Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually requires a larger pre-training dataset to boost performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Rethinking Pre-training and Self-training [105.27954735761678]
We investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training.
Our study reveals the generality and flexibility of self-training with three additional insights.
For example, on the COCO object detection dataset, pre-training helps when we use one fifth of the labeled data, but hurts accuracy when we use all labeled data.
arXiv Detail & Related papers (2020-06-11T23:59:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.