On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
- URL: http://arxiv.org/abs/2307.12532v1
- Date: Mon, 24 Jul 2023 05:36:19 GMT
- Title: On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
- Authors: Vivek Ramanujan, Thao Nguyen, Sewoong Oh, Ludwig Schmidt, Ali Farhadi
- Abstract summary: We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
- Score: 66.30369048726145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-training has been widely adopted in deep learning to improve model
performance, especially when the training data for a target task is limited. In
our work, we seek to understand the implications of this training strategy on
the generalization properties of downstream models. More specifically, we ask
the following question: how do properties of the pre-training distribution
affect the robustness of a fine-tuned model? The properties we explore include
the label space, label semantics, image diversity, data domains, and data
quantity of the pre-training distribution. We find that the primary factor
influencing downstream effective robustness (Taori et al., 2020) is data
quantity, while other factors have limited significance. For example, reducing
the number of ImageNet pre-training classes by 4x while increasing the number
of images per class by 4x (that is, keeping total data quantity fixed) does not
impact the robustness of fine-tuned models. We demonstrate our findings on
pre-training distributions drawn from various natural and synthetic data
sources, primarily using the iWildCam-WILDS distribution shift as a test for
downstream robustness.
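The two quantities manipulated in the abstract, the pre-training class/image budget and downstream effective robustness, can be made concrete with a short sketch. The snippet below is a minimal illustration rather than the authors' released code: `subsample_fixed_budget` builds a pre-training subset in which the number of classes and the number of images per class are traded off under a fixed total image budget (e.g. 4x fewer classes with 4x more images each), and `effective_robustness` follows the spirit of Taori et al. (2020) by reporting how far a fine-tuned model's out-of-distribution accuracy sits above the value predicted from its in-distribution accuracy via a fit to baseline models. The logit-linear baseline fit, and all function and variable names, are assumptions made for illustration.

```python
import random
from collections import defaultdict

import numpy as np


def subsample_fixed_budget(image_paths, labels, num_classes, images_per_class, seed=0):
    """Build a pre-training subset with `num_classes` classes and
    `images_per_class` images per class, so the total image budget
    num_classes * images_per_class stays fixed across configurations
    (e.g. 1000 classes x 300 images vs. 250 classes x 1200 images)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in zip(image_paths, labels):
        by_class[label].append(path)

    # Only classes with enough images can supply the requested per-class count.
    eligible = [c for c, paths in by_class.items() if len(paths) >= images_per_class]
    chosen_classes = rng.sample(eligible, num_classes)

    subset = []
    for c in chosen_classes:
        subset.extend(rng.sample(by_class[c], images_per_class))
    return subset


def effective_robustness(id_acc, ood_acc, baseline_id_accs, baseline_ood_accs):
    """Effective robustness in the spirit of Taori et al. (2020): the model's
    OOD accuracy minus the OOD accuracy predicted from its ID accuracy by a
    linear fit (here on logit-transformed accuracies, one common convention)
    over baseline models trained without the intervention of interest."""
    logit = lambda p: np.log(p / (1.0 - p))
    slope, intercept = np.polyfit(
        logit(np.asarray(baseline_id_accs)),
        logit(np.asarray(baseline_ood_accs)),
        deg=1,
    )
    predicted_ood_logit = slope * logit(np.asarray(id_acc)) + intercept
    predicted_ood = 1.0 / (1.0 + np.exp(-predicted_ood_logit))
    return float(ood_acc - predicted_ood)
```

Under this sketch, configurations with the same budget (such as 1000 classes x 300 images and 250 classes x 1200 images) can be generated with `subsample_fixed_budget` and then compared by the effective robustness of the resulting fine-tuned models on a shift such as iWildCam-WILDS.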
Related papers
- Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks [39.71761440499148]
Pre-training graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks from unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model.
Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance.
arXiv Detail & Related papers (2023-11-02T07:09:59Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies affine transformations to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining.
Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes from a balance between intra- and inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- The Role of Pre-training Data in Transfer Learning [20.768366728182997]
We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance.
We find that the choice of pre-training data source is essential for few-shot transfer, but its role decreases as more data is made available for fine-tuning.
arXiv Detail & Related papers (2023-02-27T09:10:08Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images [2.030567625639093]
We conduct large-scale pre-training on large source datasets of either natural (ImageNet-21k/1k) or medical chest X-Ray images.
We compare full and few-shot transfer using different target datasets from both natural and medical imaging domains.
Our observations provide evidence that, while pre-training and transfer on closely related datasets show a clear benefit from increasing model and data size during pre-training, such benefits are not clearly visible when the source and target datasets are further apart.
arXiv Detail & Related papers (2021-05-31T21:55:56Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.