The Role of Pre-training Data in Transfer Learning
- URL: http://arxiv.org/abs/2302.13602v2
- Date: Wed, 1 Mar 2023 13:48:55 GMT
- Title: The Role of Pre-training Data in Transfer Learning
- Authors: Rahim Entezari, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia,
Hanie Sedghi, Ludwig Schmidt
- Abstract summary: We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance.
We find that the choice of the pre-training data source is essential for the few-shot transfer, but its role decreases as more data is made available for fine-tuning.
- Score: 20.768366728182997
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The transfer learning paradigm of model pre-training and subsequent
fine-tuning produces high-accuracy models. While most studies recommend scaling
up the pre-training data size to benefit most from transfer learning, a question
remains: what data and method should be used for pre-training? We investigate
the impact of pre-training data distribution on the few-shot and full
fine-tuning performance using 3 pre-training methods (supervised, contrastive
language-image and image-image), 7 pre-training datasets, and 9 downstream
datasets. Through extensive controlled experiments, we find that the choice of
the pre-training data source is essential for the few-shot transfer, but its
role decreases as more data is made available for fine-tuning. Additionally, we
explore the role of data curation and examine the trade-offs between label
noise and the size of the pre-training dataset. We find that using 2000X more
pre-training data from LAION can match the performance of supervised ImageNet
pre-training. Furthermore, we investigate the effect of pre-training methods,
comparing language-image contrastive vs. image-image contrastive, and find that
the latter leads to better downstream accuracy.
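To make the transfer protocol concrete, the sketch below fine-tunes a pre-trained backbone on k labelled examples per downstream class and reports test accuracy. The ResNet-50/ImageNet backbone, the CIFAR-10 target dataset, and the hyperparameters are illustrative assumptions, not the paper's actual experimental setup.

```python
# Hypothetical few-shot transfer sketch: an ImageNet-pre-trained ResNet-50
# stands in for "a pre-training data source"; CIFAR-10 stands in for a
# downstream dataset. Neither choice is taken from the paper.
import random
from collections import defaultdict

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
import torchvision
from torchvision import transforms


def k_shot_indices(labels, k, seed=0):
    """Sample k example indices per class."""
    rng = random.Random(seed)
    per_class = defaultdict(list)
    for idx, y in enumerate(labels):
        per_class[y].append(idx)
    picked = []
    for idxs in per_class.values():
        rng.shuffle(idxs)
        picked.extend(idxs[:k])
    return picked


def few_shot_transfer(k=10, epochs=5):
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Pre-trained backbone; swap the classification head for the new task.
    model = torchvision.models.resnet50(
        weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, 10)
    model = model.to(device)

    tf = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=tf)
    test_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=tf)

    # Keep only k labelled examples per class for the few-shot regime.
    train_loader = DataLoader(Subset(train_set, k_shot_indices(train_set.targets, k)),
                              batch_size=32, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=256)

    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    # Downstream accuracy for this (pre-training source, k) pair.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (model(x.to(device)).argmax(1).cpu() == y).sum().item()
            total += y.numel()
    return correct / total
```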
Related papers
- Enhancing pretraining efficiency for medical image segmentation via transferability metrics [0.0]
In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge.
We introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data.
arXiv Detail & Related papers (2024-10-24T12:11:52Z)
- Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks [39.71761440499148]
Pre-training on graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks with unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model.
Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance.
arXiv Detail & Related papers (2023-11-02T07:09:59Z)
- On the Connection between Pre-training Data Diversity and Fine-tuning Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
arXiv Detail & Related papers (2023-07-24T05:36:19Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
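As a rough illustration of distillation-based pre-training in general (a student regresses a pre-trained teacher's features on unlabeled images and is later fine-tuned downstream), the snippet below uses a ResNet-50 teacher and a ResNet-18 student. It is a generic sketch, not the method proposed in the paper above.

```python
# Generic feature-distillation pre-training sketch (not the paper's method):
# a small student mimics a frozen pre-trained teacher's features on unlabeled
# images, after which the student can be fine-tuned downstream as usual.
import torch
import torch.nn as nn
import torchvision


def distillation_step(teacher, student, projector, images, optimizer):
    with torch.no_grad():
        target = teacher(images)           # frozen teacher features (2048-d)
    pred = projector(student(images))      # student features projected to 2048-d
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


teacher = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
teacher.fc = nn.Identity()                 # expose pooled 2048-d features
teacher.eval()

student = torchvision.models.resnet18(weights=None)
student.fc = nn.Identity()                 # pooled 512-d features
projector = nn.Linear(512, 2048)

optimizer = torch.optim.SGD(list(student.parameters()) + list(projector.parameters()),
                            lr=0.1, momentum=0.9)

# Dummy unlabeled batch; in practice this comes from an unlabeled image loader.
images = torch.randn(8, 3, 224, 224)
loss = distillation_step(teacher, student, projector, images, optimizer)
```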
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can reach final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
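One simple, hypothetical form such a subset-selection strategy could take is to rank pre-training examples by feature similarity to the target data and keep the closest ones. The sketch below does this with an off-the-shelf ResNet-18 encoder; it is not the selection strategy from the paper above.

```python
# Illustrative subset selection (not the paper's strategy): rank pre-training
# images by cosine similarity of their features to the mean target-task feature
# and keep the top fraction for an extra fine-tuning stage.
import torch
import torch.nn as nn
import torchvision


@torch.no_grad()
def select_pretraining_subset(encoder, pretrain_images, target_images, keep_fraction=0.1):
    encoder.eval()
    target_feats = nn.functional.normalize(encoder(target_images), dim=1)
    centroid = nn.functional.normalize(target_feats.mean(0, keepdim=True), dim=1)

    pre_feats = nn.functional.normalize(encoder(pretrain_images), dim=1)
    scores = (pre_feats @ centroid.t()).squeeze(1)   # cosine similarity to target centroid
    k = max(1, int(keep_fraction * len(scores)))
    return scores.topk(k).indices                    # indices of the most target-like examples


encoder = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
encoder.fc = nn.Identity()                           # pooled 512-d features

# Dummy tensors stand in for real pre-training / target batches.
pretrain_images = torch.randn(64, 3, 224, 224)
target_images = torch.randn(16, 3, 224, 224)
chosen = select_pretraining_subset(encoder, pretrain_images, target_images, keep_fraction=0.25)
```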
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images [2.030567625639093]
We conduct large-scale pre-training on large source datasets of either natural (ImageNet-21k/1k) or medical chest X-Ray images.
We compare full and few-shot transfer using different target datasets from both natural and medical imaging domains.
Our observations provide evidence that while pre-training and transfer between closely related datasets show a clear benefit from increasing model and data size during pre-training, such benefits are not clearly visible when the source and target datasets are further apart.
arXiv Detail & Related papers (2021-05-31T21:55:56Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and by pre-training on a dataset filtered from a larger-scale dataset.
arXiv Detail & Related papers (2020-11-20T06:16:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.