A unified framework for information-theoretic generalization bounds
- URL: http://arxiv.org/abs/2305.11042v2
- Date: Wed, 6 Dec 2023 18:59:16 GMT
- Title: A unified framework for information-theoretic generalization bounds
- Authors: Yifeng Chu and Maxim Raginsky
- Abstract summary: This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_psi_p$ Orlicz spaces.
- Score: 8.04975023021212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a general methodology for deriving information-theoretic
generalization bounds for learning algorithms. The main technical tool is a
probabilistic decorrelation lemma based on a change of measure and a relaxation
of Young's inequality in $L_{\psi_p}$ Orlicz spaces. Using the decorrelation
lemma in combination with other techniques, such as symmetrization, couplings,
and chaining in the space of probability measures, we obtain new upper bounds
on the generalization error, both in expectation and in high probability, and
recover as special cases many of the existing generalization bounds, including
the ones based on mutual information, conditional mutual information,
stochastic chaining, and PAC-Bayes inequalities. In addition, the
Fernique-Talagrand upper bound on the expected supremum of a subgaussian
process emerges as a special case.
Related papers
- Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [61.580419063416734]
A recent stream of structured learning approaches has improved the practical state of the art for a range of optimization problems.
The key idea is to exploit the statistical distribution over instances instead of dealing with instances separately.
In this article, we investigate methods that smooth the risk by perturbing the policy, which eases optimization and improves the generalization error.
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization
Bounds with Complexity Measures [14.408127155170774]
We leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures.
Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap.
arXiv Detail & Related papers (2024-02-19T10:15:11Z) - On Regularization and Inference with Label Constraints [62.60903248392479]
We compare two strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference.
For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints.
For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage.
arXiv Detail & Related papers (2023-07-08T03:39:22Z) - Generalization Guarantees via Algorithm-dependent Rademacher Complexity [33.408679482592426]
We propose a measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class.
We obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term.
arXiv Detail & Related papers (2023-07-04T18:37:38Z) - Information Theoretic Lower Bounds for Information Theoretic Upper
Bounds [14.268363583731848]
We examine the relationship between the output model and the empirical generalization and the algorithm in the context of convex optimization.
Our study reveals that, for true risk minimization, mutual information is necessary.
Existing information-theoretic generalization bounds fall short in capturing the capabilities of algorithms like SGD and regularized, which have-independent dimension sample complexity.
arXiv Detail & Related papers (2023-02-09T20:42:36Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI)
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z) - An Online Learning Approach to Interpolation and Extrapolation in Domain
Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z) - Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z) - Robust Generalization via $\alpha$-Mutual Information [24.40306100502023]
bounds connecting two probability measures of the same event using R'enyi $alpha$-Divergences and Sibson's $alpha$-Mutual Information.
Results have broad applications, from bounding the generalization error of learning algorithms to the more general framework of adaptive data analysis.
arXiv Detail & Related papers (2020-01-14T11:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.