Whitening and second order optimization both make information in the
dataset unusable during training, and can reduce or prevent generalization
- URL: http://arxiv.org/abs/2008.07545v4
- Date: Mon, 19 Jul 2021 07:00:41 GMT
- Authors: Neha S. Wadia, Daniel Duckworth, Samuel S. Schoenholz, Ethan Dyer and
Jascha Sohl-Dickstein
- Abstract summary: We show that both data whitening and second order optimization can harm or entirely prevent generalization.
For a general class of models, namely models with a fully connected first layer, we prove that the information contained in the sample-sample second moment matrix is the only information that can be used to generalize.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning is predicated on the concept of generalization: a model
achieving low error on a sufficiently large training set should also perform
well on novel samples from the same distribution. We show that both data
whitening and second order optimization can harm or entirely prevent
generalization. In general, model training harnesses information contained in
the sample-sample second moment matrix of a dataset. For a general class of
models, namely models with a fully connected first layer, we prove that the
information contained in this matrix is the only information which can be used
to generalize. Models trained using whitened data, or with certain second order
optimization schemes, have less access to this information, resulting in
reduced or nonexistent generalization ability. We experimentally verify these
predictions for several architectures, and further demonstrate that
generalization continues to be harmed even when theoretical requirements are
relaxed. However, we also show experimentally that regularized second order
optimization can provide a practical tradeoff, where training is accelerated
but less information is lost, and generalization can in some circumstances even
improve.
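The whitening claim is concrete enough to check in a few lines. The sketch below is ours, not the authors' code: the toy dataset, the shapes, and the linear least-squares model are assumptions made for illustration. It ZCA-whitens an overparameterized dataset (fewer samples than features) and verifies that the sample-sample second moment matrix collapses to a multiple of the identity; it then shows the second-order analogue, where a single fully preconditioned gradient step interpolates arbitrary training targets, leaving no per-sample structure for generalization to exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                   # n samples, d features, n < d
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # correlated toy features

# ZCA whitening: multiply by C^(-1/2), where C is the feature-feature second
# moment. C has rank <= n < d, so invert it on its range only.
C = X.T @ X / n
evals, evecs = np.linalg.eigh(C)
keep = evals > 1e-10 * evals.max()
W = evecs[:, keep] @ np.diag(evals[keep] ** -0.5) @ evecs[:, keep].T
X_white = X @ W

# The sample-sample second moment (Gram) matrix collapses to a multiple of
# the identity: whitening erases exactly the structure used to generalize.
print(np.allclose(X_white @ X_white.T, n * np.eye(n)))   # True

# Second-order connection: from w = 0, one gradient step on linear least
# squares preconditioned by pinv(C) (a Newton-style step, learning rate 1)
# fits the training targets exactly, whatever the targets are.
y = rng.normal(size=n)
Cpinv = evecs[:, keep] @ np.diag(1.0 / evals[keep]) @ evecs[:, keep].T
w = Cpinv @ (X.T @ y) / n                        # step along -pinv(C) @ grad
print(np.allclose(X @ w, y))                     # True: zero training loss
```

Regularizing the preconditioner (e.g., inverting C + lambda * I rather than C) interpolates between these extremes, which is the practical tradeoff the abstract describes.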
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
"pre-train, prompt-tuning" has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs)
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views [28.081794908107604]
Fine-tuning is used to leverage the power of pre-trained foundation models in new downstream tasks.
Recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions.
We propose a novel generalizable fine-tuning method LEVI, where the pre-trained model is adaptively ensembled layer-wise with a small task-specific model.
arXiv Detail & Related papers (2024-02-07T08:16:40Z)
- Toward Theoretical Guidance for Two Common Questions in Practical Cross-Validation based Hyperparameter Selection [72.76113104079678]
We present the first theoretical treatments of two common questions in cross-validation based hyperparameter selection.
We show that the proposed generalizations can always perform at least as well as the respective extremes of always retraining or never retraining; a code sketch of the retraining choice follows this list.
arXiv Detail & Related papers (2023-01-12T16:37:12Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch achieves final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Bi-tuning of Pre-trained Representations [79.58542780707441]
Bi-tuning is a general learning framework to fine-tune both supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
arXiv Detail & Related papers (2020-11-12T03:32:25Z)
- The Curious Case of Adversarially Robust Models: More Data Can Help, Double Descend, or Hurt Generalization [36.87923859576768]
Adversarial training has shown its ability to produce models that are robust to perturbations of the input data, but usually at the expense of a decrease in standard accuracy.
In this paper, we show that more training data can hurt the generalization of adversarially robust models in classification problems.
arXiv Detail & Related papers (2020-02-25T18:25:28Z)
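The cross-validation entry above formalizes, among other things, the choice between always retraining on the full dataset after hyperparameter selection and never doing so. A minimal, hedged sketch of that choice in scikit-learn follows; the dataset, estimator, and search grid are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of the "retrain or not after CV" choice; the data, estimator,
# and grid below are assumptions made for illustration.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
grid = {"C": [0.01, 0.1, 1.0, 10.0]}
base = LogisticRegression(max_iter=1000)

# "Always retrain": select hyperparameters by CV, then refit on all the data.
search = GridSearchCV(base, grid, cv=5, refit=True).fit(X, y)
model_retrained = search.best_estimator_         # trained on the full dataset

# "Never retrain": select by CV on a training split, keep a model fit on
# that split only, with no final fit on the full dataset.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
selection = GridSearchCV(base, grid, cv=5, refit=False).fit(X_tr, y_tr)
model_no_retrain = LogisticRegression(max_iter=1000, **selection.best_params_).fit(X_tr, y_tr)
```

Per the summary above, the paper's generalizations of these two extremes can perform at least as well as either one.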