On the Generalization Ability of Unsupervised Pretraining
- URL: http://arxiv.org/abs/2403.06871v1
- Date: Mon, 11 Mar 2024 16:23:42 GMT
- Title: On the Generalization Ability of Unsupervised Pretraining
- Authors: Yuyang Deng, Junyuan Hong, Jiayu Zhou, Mehrdad Mahdavi
- Abstract summary: Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
- Score: 53.06175754026037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in unsupervised learning have shown that unsupervised
pre-training, followed by fine-tuning, can improve model generalization.
However, a rigorous understanding of how the representation function learned on
an unlabeled dataset affects the generalization of the fine-tuned model is
lacking. Existing theoretical research does not adequately account for the
heterogeneity of the distributions and tasks across the pre-training and
fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework
that illuminates the critical factor influencing the transferability of
knowledge acquired during unsupervised pre-training to the subsequent
fine-tuning phase, ultimately affecting the generalization capabilities of the
fine-tuned model on downstream tasks. We apply our theoretical framework to
analyze the generalization bounds of two distinct scenarios: Context Encoder
pre-training with deep neural networks and Masked Autoencoder pre-training with
deep transformers, followed by fine-tuning on a binary classification task.
Finally, inspired by our findings, we propose a novel regularization method
during pre-training to further enhance the generalization of the fine-tuned
model. Overall, our results contribute to a better understanding of the
unsupervised pre-training and fine-tuning paradigm, and can shed light on the
design of more effective pre-training algorithms.
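As a rough illustration of the paradigm the paper analyzes (masked reconstruction pre-training on unlabeled data, followed by fine-tuning on a binary classification task), the following NumPy sketch trains a tiny linear autoencoder on masked inputs and then fits a logistic head on the frozen representation. All data, dimensions, and learning rates here are invented for illustration; this is not the paper's method, regularizer, or models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic unlabeled data: 200 samples, 8 features, with 2-d latent structure.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))

# --- Unsupervised pre-training: linear autoencoder on masked inputs. ---
# Reconstruct the full input from a randomly masked copy (a masked-
# reconstruction objective in the spirit of Context Encoder / MAE).
W = rng.normal(scale=0.1, size=(8, 2))   # encoder
V = rng.normal(scale=0.1, size=(2, 8))   # decoder
lr = 0.01
for _ in range(500):
    mask = rng.random(X.shape) > 0.25    # hide ~25% of entries
    Xm = X * mask
    H = Xm @ W                           # learned representation
    E = H @ V - X                        # reconstruction error
    # Gradient descent on the mean squared reconstruction error.
    gV = H.T @ E / len(X)
    gW = Xm.T @ (E @ V.T) / len(X)
    W -= lr * gW
    V -= lr * gV

# --- Fine-tuning: logistic-regression head on the frozen representation. ---
y = (Z[:, 0] > 0).astype(float)          # downstream binary labels
H = X @ W                                # frozen pre-trained features
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(H @ w + b)))
    g = p - y                            # cross-entropy gradient signal
    w -= 0.1 * H.T @ g / len(X)
    b -= 0.1 * g.mean()

acc = ((1 / (1 + np.exp(-(H @ w + b))) > 0.5) == (y > 0.5)).mean()
print(f"downstream accuracy: {acc:.2f}")
```

The point of the sketch is the two-stage structure the paper studies: the representation `W` is learned purely from unlabeled data, and only a small head is fit on the labeled downstream task, so downstream generalization hinges on how transferable `W` is.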
Related papers
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
The "pre-train, prompt-tuning" paradigm has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs).
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks [77.89179552509887]
We propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks.
The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees.
We exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
arXiv Detail & Related papers (2023-07-15T09:24:33Z)
- A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models [0.0]
We propose a Bayesian recurrent neural network framework for quantifying uncertainty in traffic prediction with higher generalizability.
We show that normalization alters the training process of deep neural networks by controlling the model's complexity.
Our findings are especially relevant to traffic management applications, where predicting traffic conditions across multiple locations is the goal.
arXiv Detail & Related papers (2023-07-12T06:23:31Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Bi-tuning of Pre-trained Representations [79.58542780707441]
Bi-tuning is a general learning framework to fine-tune both supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
arXiv Detail & Related papers (2020-11-12T03:32:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.