Generalization Through The Lens Of Leave-One-Out Error
- URL: http://arxiv.org/abs/2203.03443v1
- Date: Mon, 7 Mar 2022 14:56:00 GMT
- Title: Generalization Through The Lens Of Leave-One-Out Error
- Authors: Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi
- Abstract summary: We show that the leave-one-out error provides a tractable way to estimate the generalization ability of deep neural networks in the kernel regime.
- Score: 22.188535244056016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the tremendous empirical success of deep learning models to solve
various learning tasks, our theoretical understanding of their generalization
ability is very limited. Classical generalization bounds based on tools such as
the VC dimension or Rademacher complexity are so far unsuitable for deep
models and it is doubtful that these techniques can yield tight bounds even in
the most idealistic settings (Nagarajan & Kolter, 2019). In this work, we
instead revisit the concept of leave-one-out (LOO) error to measure the
generalization ability of deep models in the so-called kernel regime. While
popular in statistics, the LOO error has been largely overlooked in the context
of deep learning. By building upon the recently established connection between
neural networks and kernel learning, we leverage the closed-form expression for
the leave-one-out error, giving us access to an efficient proxy for the test
error. We show both theoretically and empirically that the leave-one-out error
is capable of capturing various phenomena in generalization theory, such as
double descent, random labels or transfer learning. Our work therefore
demonstrates that the leave-one-out error provides a tractable way to estimate
the generalization ability of deep neural networks in the kernel regime,
opening the door to potential new research directions in the field of
generalization.
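The closed-form expression the abstract refers to is, for kernel ridge regression, the standard identity that the leave-one-out residual of point $i$ equals $[(K + \lambda I)^{-1} y]_i / [(K + \lambda I)^{-1}]_{ii}$, so all $n$ held-out predictions come from a single matrix inverse rather than $n$ refits. The sketch below illustrates this identity (the RBF kernel, the regularization value, and all function names are illustrative choices, not taken from the paper):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def loo_mse(K, y, lam=1e-1):
    # Closed-form leave-one-out residuals for kernel ridge regression:
    #   r_i = [(K + lam*I)^{-1} y]_i / [(K + lam*I)^{-1}]_{ii}
    # One n x n inverse replaces n separate refits.
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    residuals = (G_inv @ y) / np.diag(G_inv)
    return np.mean(residuals**2)

def loo_mse_brute_force(K, y, lam=1e-1):
    # Reference implementation: refit with each point held out.
    n = len(y)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        alpha = np.linalg.solve(
            K[np.ix_(mask, mask)] + lam * np.eye(n - 1), y[mask]
        )
        pred = K[i, mask] @ alpha
        errs.append((y[i] - pred) ** 2)
    return np.mean(errs)
```

Both routines agree to numerical precision, which is what makes the LOO error a tractable proxy for test error in the kernel regime: its cost is that of a single fit.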
Related papers
- PAC-Bayes Compression Bounds So Tight That They Can Explain
Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor.
arXiv Detail & Related papers (2022-11-24T13:50:16Z) - Learning Non-Vacuous Generalization Bounds from Optimization [8.294831479902658]
We present a simple yet non-vacuous generalization bound from the optimization perspective.
We achieve this goal by leveraging that the hypothesis set accessed by gradient algorithms is essentially fractal-like.
Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks.
arXiv Detail & Related papers (2022-06-09T08:59:46Z) - Why Robust Generalization in Deep Learning is Difficult: Perspective of
Expressive Power [15.210336733607488]
We show that for binary classification problems with well-separated data, there exists a constant robust generalization gap unless the size of the neural network is exponential.
We establish an improved upper bound of $\exp(\mathcal{O}(k))$ for the network size to achieve low robust generalization error.
arXiv Detail & Related papers (2022-05-27T09:53:04Z) - Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z) - Generalization by design: Shortcuts to Generalization in Deep Learning [7.751691910877239]
We show that good generalization may be instigated by bounded spectral products over layers leading to a novel geometric regularizer.
Backed up by theory we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network.
arXiv Detail & Related papers (2021-07-05T20:01:23Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking limit of infinite width of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z) - Spectral Bias and Task-Model Alignment Explain Generalization in Kernel
Regression and Infinitely Wide Neural Networks [17.188280334580195]
Generalization beyond a training dataset is a main goal of machine learning.
Recent observations in deep neural networks contradict conventional wisdom from classical statistics.
We show that more data may impair generalization when noisy or not expressible by the kernel.
arXiv Detail & Related papers (2020-06-23T17:53:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.