On the Trade-off of Intra-/Inter-class Diversity for Supervised
Pre-training
- URL: http://arxiv.org/abs/2305.12224v2
- Date: Fri, 1 Dec 2023 15:56:21 GMT
- Title: On the Trade-off of Intra-/Inter-class Diversity for Supervised
Pre-training
- Authors: Jieyu Zhang, Bohan Wang, Zhengyu Hu, Pang Wei Koh, Alexander Ratner
- Abstract summary: We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance between intra- and inter-class diversity.
- Score: 72.8087629914444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training datasets are critical for building state-of-the-art machine
learning models, motivating rigorous study on their impact on downstream tasks.
In this work, we study the impact of the trade-off between the intra-class
diversity (the number of samples per class) and the inter-class diversity (the
number of classes) of a supervised pre-training dataset. Empirically, we found
that with the size of the pre-training dataset fixed, the best downstream
performance comes with a balance between the intra- and inter-class diversity. To
understand the underlying mechanism, we show theoretically that the downstream
performance depends monotonically on both types of diversity. Notably, our
theory reveals that the optimal class-to-sample ratio (#classes / #samples per
class) is invariant to the size of the pre-training dataset, which motivates an
application of predicting the optimal number of pre-training classes. We
demonstrate the effectiveness of this application through an improvement of
around 2 points on downstream tasks when using ImageNet as the pre-training
dataset.
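As a hedged illustration of the class-count prediction described above: with the budget N = C × m fixed and the optimal class-to-sample ratio r* = C / m invariant to N, substituting m = N / C gives C* = √(r* · N). The sketch below assumes r* has already been estimated on a small pilot subset; the helper name and the value of r* are hypothetical, not taken from the paper.

```python
import math

def predict_optimal_num_classes(r_star: float, budget: int) -> int:
    """Predict the number of pre-training classes for a sample budget N.

    With N = C * m fixed and r* = C / m invariant to N (the paper's
    result), substituting m = N / C gives r* = C**2 / N, hence the
    predicted optimum C* = sqrt(r* * N).
    """
    return round(math.sqrt(r_star * budget))

# Illustrative usage: r_star would be estimated by sweeping the class
# count on a small pilot subset, then reused at larger budgets.
r_star = 0.05  # hypothetical value, not from the paper
for budget in (10_000, 100_000, 1_000_000):
    c = predict_optimal_num_classes(r_star, budget)
    print(f"N={budget:>9,}: ~{c} classes, ~{budget // c} samples/class")
```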
Related papers
- Optimizing importance weighting in the presence of sub-population shifts [0.0]
A distribution shift between the training and test data can severely harm performance of machine learning models.
We argue that existing schemes for determining the weights are suboptimal, as they neglect the increase in the variance of the estimated model caused by the finite sample size of the training data.
We propose a bi-level optimization procedure in which the weights and model parameters are optimized simultaneously; a generic sketch follows this entry.
arXiv Detail & Related papers (2024-10-18T09:21:10Z)
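The summary above names only a joint bi-level procedure, so the following is a generic sketch rather than the authors' algorithm: per-group weights (outer variables) and logistic-regression parameters (inner variables) are updated in alternation on synthetic data, with a learning-to-reweight style gradient-alignment heuristic standing in for the true hypergradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two sub-populations whose mix differs between train and test.
X_a = rng.normal(0.0, 1.0, (200, 2)); y_a = (X_a[:, 0] > 0).astype(float)
X_b = rng.normal(2.0, 1.0, (50, 2));  y_b = (X_b[:, 1] > 2).astype(float)
X = np.vstack([X_a, X_b]); y = np.concatenate([y_a, y_b])
group = np.concatenate([np.zeros(200, int), np.ones(50, int)])

# Validation set drawn from the shifted test distribution (group b only).
Xv = rng.normal(2.0, 1.0, (100, 2)); yv = (Xv[:, 1] > 2).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)          # model parameters (inner variables)
log_alpha = np.zeros(2)  # per-group log-weights (outer variables)

for step in range(500):
    alpha = np.exp(log_alpha)[group]
    p = sigmoid(X @ w)
    grad_val = Xv.T @ (sigmoid(Xv @ w) - yv) / len(yv)
    # Outer update: upweight groups whose training gradient aligns with
    # the validation gradient (a stand-in for the exact hypergradient).
    for g in (0, 1):
        m = group == g
        grad_g = X[m].T @ (p[m] - y[m]) / m.sum()
        log_alpha[g] += 0.1 * float(grad_g @ grad_val)
    # Inner update: gradient step on the weighted training loss.
    w -= 0.1 * X.T @ (alpha * (p - y)) / len(y)

print("learned group weights:", np.exp(log_alpha))
```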
- Efficient Transferability Assessment for Selection of Pre-trained Detectors [63.21514888618542]
This paper studies the efficient transferability assessment of pre-trained object detectors.
We build a detector transferability benchmark containing a large and diverse zoo of pre-trained detectors.
Experimental results demonstrate that our method outperforms other state-of-the-art approaches in assessing transferability; a toy proxy score is sketched after this entry.
arXiv Detail & Related papers (2024-03-14T14:23:23Z)
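The benchmark's actual scoring function is not given in this summary; as a rough illustration of what a cheap transferability proxy looks like, the sketch below ranks models by the nearest-class-mean accuracy of the features they extract from target data (in the spirit of proxies such as LEEP or LogME, not this paper's method).

```python
import numpy as np

def proxy_transferability(features: np.ndarray, labels: np.ndarray) -> float:
    """Score a pre-trained model by nearest-class-mean accuracy of the
    features it produces on target data. A generic stand-in metric."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == labels).mean())

# Hypothetical usage: rank a zoo of detectors without fine-tuning any of them.
# scores = {name: proxy_transferability(extract(name, images), labels)
#           for name in detector_zoo}
```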
- On the Connection between Pre-training Data Diversity and Fine-tuning Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
arXiv Detail & Related papers (2023-07-24T05:36:19Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide an analysis using a theoretical data model and show that, while more diverse pre-training data yield more diverse features for different tasks, they place less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage; a minimal sketch of the idea follows this entry.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
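The summary specifies only that self-distillation regularizes the further pre-training stage; the sketch below is one plausible reading, not the paper's recipe: the student continues pre-training while a frozen copy of its initial weights (the teacher) penalizes drift through a KL term. The function name and the `lam` weight are assumptions.

```python
import torch
import torch.nn.functional as F

def self_distillation_step(student, teacher, batch, optimizer, lam=1.0):
    """One further-pre-training step with a self-distillation penalty.

    `student` and `teacher` start from the same pre-trained weights; the
    teacher stays frozen. `batch` is (inputs, targets) for the pretext
    objective (e.g., masked-token prediction).
    """
    inputs, targets = batch
    student_logits = student(inputs)
    with torch.no_grad():
        teacher_logits = teacher(inputs)

    # Standard pre-training loss on the pretext task.
    task_loss = F.cross_entropy(student_logits, targets)
    # Self-distillation: keep the student close to its initial self.
    distill_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    loss = task_loss + lam * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```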
- An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy that picks a subset of the pre-training data to improve generalization on the target task; a generic heuristic is sketched after this entry.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
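The selection criterion itself is not described in this summary, so the following is a generic nearest-neighbor heuristic, not the paper's theory-derived rule: keep the pre-training examples whose features lie closest to the target data. The feature arrays and sizes are hypothetical.

```python
import numpy as np

def select_pretraining_subset(pretrain_feats, target_feats, k):
    """Return indices of the k pre-training examples whose features lie
    closest to any target example (a stand-in selection rule)."""
    # Squared distances via ||a - b||^2 = a.a + b.b - 2 a.b, avoiding a
    # huge (n_pretrain, n_target, dim) broadcast.
    d2 = (
        (pretrain_feats ** 2).sum(axis=1)[:, None]
        + (target_feats ** 2).sum(axis=1)[None, :]
        - 2.0 * pretrain_feats @ target_feats.T
    )
    return np.argsort(d2.min(axis=1))[:k]

# Hypothetical usage with features from the pre-trained backbone:
rng = np.random.default_rng(0)
pretrain_feats = rng.standard_normal((10_000, 128))
target_feats = rng.standard_normal((500, 128))
subset_idx = select_pretraining_subset(pretrain_feats, target_feats, k=1_000)
```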
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.