Generalization Through The Lens Of Leave-One-Out Error
- URL: http://arxiv.org/abs/2203.03443v1
- Date: Mon, 7 Mar 2022 14:56:00 GMT
- Title: Generalization Through The Lens Of Leave-One-Out Error
- Authors: Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi
- Abstract summary: We show that the leave-one-out error provides a tractable way to estimate the generalization ability of deep neural networks in the kernel regime.
- Score: 22.188535244056016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the tremendous empirical success of deep learning models to solve
various learning tasks, our theoretical understanding of their generalization
ability is very limited. Classical generalization bounds based on tools such as
the VC dimension or Rademacher complexity are so far unsuitable for deep
models and it is doubtful that these techniques can yield tight bounds even in
the most idealistic settings (Nagarajan & Kolter, 2019). In this work, we
instead revisit the concept of leave-one-out (LOO) error to measure the
generalization ability of deep models in the so-called kernel regime. While
popular in statistics, the LOO error has been largely overlooked in the context
of deep learning. By building upon the recently established connection between
neural networks and kernel learning, we leverage the closed-form expression for
the leave-one-out error, giving us access to an efficient proxy for the test
error. We show both theoretically and empirically that the leave-one-out error
is capable of capturing various phenomena in generalization theory, such as
double descent, random labels or transfer learning. Our work therefore
demonstrates that the leave-one-out error provides a tractable way to estimate
the generalization ability of deep neural networks in the kernel regime,
opening the door to potential new research directions in the field of
generalization.
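The closed-form expression the abstract refers to is, for kernel ridge regression, the standard identity that the leave-one-out residual of point $i$ equals $[(K + \lambda I)^{-1} y]_i / [(K + \lambda I)^{-1}]_{ii}$, so all $n$ held-out predictions come from a single matrix inverse rather than $n$ refits. The sketch below illustrates this identity (the RBF kernel, the regularization value, and all function names are illustrative choices, not taken from the paper):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def loo_mse(K, y, lam=1e-1):
    # Closed-form leave-one-out residuals for kernel ridge regression:
    #   r_i = [(K + lam*I)^{-1} y]_i / [(K + lam*I)^{-1}]_{ii}
    # One n x n inverse replaces n separate refits.
    G_inv = np.linalg.inv(K + lam * np.eye(len(y)))
    residuals = (G_inv @ y) / np.diag(G_inv)
    return np.mean(residuals**2)

def loo_mse_brute_force(K, y, lam=1e-1):
    # Reference implementation: refit with each point held out.
    n = len(y)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        alpha = np.linalg.solve(
            K[np.ix_(mask, mask)] + lam * np.eye(n - 1), y[mask]
        )
        pred = K[i, mask] @ alpha
        errs.append((y[i] - pred) ** 2)
    return np.mean(errs)
```

Both routines agree to numerical precision, which is what makes the LOO error a tractable proxy for test error in the kernel regime: its cost is that of a single fit.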
Related papers
- PAC-Bayes Compression Bounds So Tight That They Can Explain
Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor.
arXiv Detail & Related papers (2022-11-24T13:50:16Z) - Learning Non-Vacuous Generalization Bounds from Optimization [8.294831479902658]
We present a simple yet non-vacuous generalization bound from the optimization perspective.
We achieve this goal by leveraging that the hypothesis set accessed by gradient algorithms is essentially fractal-like.
Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks.
arXiv Detail & Related papers (2022-06-09T08:59:46Z) - Why Robust Generalization in Deep Learning is Difficult: Perspective of
Expressive Power [15.210336733607488]
We show that for binary classification problems with well-separated data, there exists a constant robust generalization gap unless the size of the neural network is exponential.
We establish an improved upper bound of $\exp(\mathcal{O}(k))$ for the network size to achieve low robust generalization error.
arXiv Detail & Related papers (2022-05-27T09:53:04Z) - Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z) - Generalization by design: Shortcuts to Generalization in Deep Learning [7.751691910877239]
We show that good generalization may be instigated by bounded spectral products over layers leading to a novel geometric regularizer.
Backed up by theory we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network.
arXiv Detail & Related papers (2021-07-05T20:01:23Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking limit of infinite width of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z) - Spectral Bias and Task-Model Alignment Explain Generalization in Kernel
Regression and Infinitely Wide Neural Networks [17.188280334580195]
Generalization beyond a training dataset is a main goal of machine learning.
Recent observations in deep neural networks contradict conventional wisdom from classical statistics.
We show that more data may impair generalization when noisy or not expressible by the kernel.
arXiv Detail & Related papers (2020-06-23T17:53:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.