Fantastic Generalization Measures are Nowhere to be Found
- URL: http://arxiv.org/abs/2309.13658v3
- Date: Tue, 28 Nov 2023 16:47:56 GMT
- Title: Fantastic Generalization Measures are Nowhere to be Found
- Authors: Michael Gastpar, Ido Nachum, Jonathan Shafer, Thomas Weinberger
- Abstract summary: We study the notion of a generalization bound being uniformly tight, meaning that the difference between the bound and the population loss is small.
Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize.
- Score: 14.599761709255917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the notion of a generalization bound being uniformly tight, meaning
that the difference between the bound and the population loss is small for all
learning algorithms and all population distributions. Numerous generalization
bounds have been proposed in the literature as potential explanations for the
ability of neural networks to generalize in the overparameterized setting.
However, in their paper "Fantastic Generalization Measures and Where to Find
Them," Jiang et al. (2020) examine more than a dozen generalization bounds,
and show empirically that none of them are uniformly tight. This raises the
question of whether uniformly-tight generalization bounds are at all possible
in the overparameterized setting. We consider two types of generalization
bounds: (1) bounds that may depend on the training set and the learned
hypothesis (e.g., margin bounds). We prove mathematically that no such bound
can be uniformly tight in the overparameterized setting; (2) bounds that may in
addition also depend on the learning algorithm (e.g., stability bounds). For
these bounds, we show a trade-off between the algorithm's performance and the
bound's tightness. Namely, if the algorithm achieves good accuracy on certain
distributions, then no generalization bound can be uniformly tight for it in
the overparameterized setting. We explain how these formal results can, in our
view, inform research on generalization bounds for neural networks, while
stressing that other interpretations of these results are also possible.
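The central definition can be written out in standard notation (an illustrative formalization; the symbols below are not taken verbatim from the paper):

```latex
% Illustrative sketch: S is a training sample of size n drawn i.i.d. from D,
% A is a learning algorithm, L_D is the population loss, and B is the bound.
% B is (\epsilon, \delta)-uniformly tight if, for EVERY algorithm A and
% EVERY population distribution D,
\Pr_{S \sim D^n}\!\left[\, \bigl|\, B(S, A(S)) - L_D(A(S)) \,\bigr| \le \epsilon \,\right] \ge 1 - \delta .
```

The quantification over all algorithms and all distributions is what makes the requirement strong, and what the paper's negative results rule out in the overparameterized setting.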
Related papers
- Which Algorithms Have Tight Generalization Bounds? [13.364505088105508]
We study which machine learning algorithms have tight generalization bounds.
We show that algorithms with inductive biases that render them unstable do not admit tight generalization bounds.
We conclude with a simple characterization that relates the existence of tight generalization bounds to the conditional variance of the algorithm's loss.
arXiv Detail & Related papers (2024-10-02T19:24:57Z) - Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures [14.408127155170774]
We leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures.
Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap.
arXiv Detail & Related papers (2024-02-19T10:15:11Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
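As an illustrative sketch (not code from the paper), the local Lipschitz regularity such bounds depend on can be probed numerically: perturb the prediction function with small random displacements and record the largest observed rate of change. The function and radius below are hypothetical choices for demonstration.

```python
import numpy as np

def local_lipschitz_estimate(f, x, radius=0.1, n_samples=200, seed=0):
    """Estimate the local Lipschitz constant of f near x: sample random
    directions of fixed length `radius` and return the largest observed
    ratio |f(x + d) - f(x)| / ||d||."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    best = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=x.shape)
        d *= radius / np.linalg.norm(d)  # rescale direction to length `radius`
        best = max(best, abs(f(x + d) - fx) / radius)
    return best

# f(v) = ||v|| is 1-Lipschitz, so the estimate cannot exceed 1.
est = local_lipschitz_estimate(lambda v: np.linalg.norm(v), np.ones(5))
```

This is only a lower estimate of the true local Lipschitz constant, since finitely many random directions may miss the direction of steepest change.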
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z) - Explaining generalization in deep learning: progress and fundamental limits [8.299945169799795]
In the first part of the thesis, we will empirically study how training deep networks via gradient descent implicitly controls the networks' capacity.
We will then derive data-dependent uniform-convergence-based generalization bounds with improved dependencies on the parameter count.
In the last part of the thesis, we will introduce an empirical technique to estimate generalization using unlabeled data.
arXiv Detail & Related papers (2021-10-17T21:17:30Z) - An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z) - Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z) - Robustness, Privacy, and Generalization of Adversarial Training [84.38148845727446]
This paper establishes and quantifies the privacy-robustness trade-off and generalization-robustness trade-off in adversarial training.
We show that adversarial training is $(\varepsilon, \delta)$-differentially private, where the magnitude of the differential privacy has a positive correlation with the robustified intensity.
Our generalization bounds do not explicitly rely on the parameter size which would be large in deep learning.
arXiv Detail & Related papers (2020-12-25T13:35:02Z) - Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation represents a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z) - Tighter Generalization Bounds for Iterative Differentially Private Learning Algorithms [95.73230376153872]
This paper studies the relationship between generalization and privacy preservation in iterative learning algorithms via two sequential steps.
We prove that $(\varepsilon, \delta)$-differential privacy implies an on-average generalization bound for multi-database learning algorithms.
We then investigate how the iterative nature shared by most learning algorithms influences privacy preservation and further generalization.
arXiv Detail & Related papers (2020-07-18T09:12:03Z)
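A classical form of the privacy-generalization connection, stated here only as an illustrative sketch and not as this paper's exact theorem, is that pure differential privacy bounds the on-average generalization gap of a bounded loss:

```latex
% Illustrative sketch for a loss taking values in [0, 1]: if A is
% \varepsilon-differentially private, then its expected empirical loss L_S
% and expected population loss L_D are close, roughly
\bigl|\, \mathbb{E}_{S, A}\!\left[ L_D(A(S)) - L_S(A(S)) \right] \bigr| \;\lesssim\; e^{\varepsilon} - 1 .
```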
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.