VC Theoretical Explanation of Double Descent
- URL: http://arxiv.org/abs/2205.15549v1
- Date: Tue, 31 May 2022 05:50:02 GMT
- Title: VC Theoretical Explanation of Double Descent
- Authors: Eng Hock Lee and Vladimir Cherkassky
- Abstract summary: This paper presents VC-theoretical analysis of double descent and shows that it can be fully explained by classical VC generalization bounds.
We illustrate an application of analytic VC-bounds for modeling double descent for classification problems, using empirical results for several learning methods.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been growing interest in generalization performance of large
multilayer neural networks that can be trained to achieve zero training error,
while generalizing well on test data. This regime is known as 'second descent'
and it appears to contradict conventional view that optimal model complexity
should reflect optimal balance between underfitting and overfitting, aka the
bias-variance trade-off. This paper presents VC-theoretical analysis of double
descent and shows that it can be fully explained by classical VC generalization
bounds. We illustrate an application of analytic VC-bounds for modeling double
descent for classification problems, using empirical results for several
learning methods, such as SVM, Least Squares, and Multilayer Perceptron
classifiers. In addition, we discuss several possible reasons for
misinterpretation of VC-theoretical results in the machine learning community.
Related papers
- Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, section 2 introduces inductive biases that appear to have a key role in double descent by selecting.
section 3 explores the double descent with two linear models, and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z) - Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift [14.641747166801133]
multimodal contrastive learning approaches, such as CLIP, have achieved a remarkable success in learning representations that are robust against distribution shift.
We identify two mechanisms behind MMCL's robustness: emphintra-class contrasting and emphinter-class feature sharing.
We theoretically demonstrate the benefits of using rich captions on robustness and explore the effect of annotating different types of details in the captions.
arXiv Detail & Related papers (2023-10-08T02:25:52Z) - A Theoretical Understanding of Shallow Vision Transformers: Learning,
Generalization, and Sample Complexity [71.11795737362459]
ViTs with self-attention modules have recently achieved great empirical success in many tasks.
However, theoretical learning generalization analysis is mostly noisy and elusive.
This paper provides the first theoretical analysis of a shallow ViT for a classification task.
arXiv Detail & Related papers (2023-02-12T22:12:35Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC)
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - Synergies between Disentanglement and Sparsity: Generalization and
Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and thereof gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z) - Optimal Regularization Can Mitigate Double Descent [29.414119906479954]
We study whether the double-descent phenomenon can be avoided by using optimal regularization.
We demonstrate empirically that optimally-tuned $ell$ regularization can double descent for more general models, including neural networks.
arXiv Detail & Related papers (2020-03-04T05:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.