Chained Generalisation Bounds
- URL: http://arxiv.org/abs/2203.00977v1
- Date: Wed, 2 Mar 2022 09:34:36 GMT
- Title: Chained Generalisation Bounds
- Authors: Eugenio Clerico, Amitis Shidani, George Deligiannidis, Arnaud Doucet
- Abstract summary: We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
- Score: 26.043342234937747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work discusses how to derive upper bounds for the expected
generalisation error of supervised learning algorithms by means of the chaining
technique. By developing a general theoretical framework, we establish a
duality between generalisation bounds based on the regularity of the loss
function, and their chained counterparts, which can be obtained by lifting the
regularity assumption from the loss onto its gradient. This allows us to
re-derive the chaining mutual information bound from the literature, and to
obtain novel chained information-theoretic generalisation bounds, based on the
Wasserstein distance and other probability metrics. We show on some toy
examples that the chained generalisation bound can be significantly tighter
than its standard counterpart, particularly when the distribution of the
hypotheses selected by the algorithm is very concentrated.
Keywords: Generalisation bounds; Chaining; Information-theoretic bounds;
Mutual information; Wasserstein distance; PAC-Bayes.
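For context, here is a schematic sketch (not the paper's exact statements) of the standard mutual-information generalisation bound of Xu and Raginsky and of the general shape a chained counterpart takes, with hierarchical quantisations $W_k$ of the hypothesis $W$ at scales $2^{-k}$; constants, the sub-Gaussianity assumptions, and the precise quantisation construction are omitted here.

```latex
% Standard bound: if the loss \ell(w, Z) is \sigma-sub-Gaussian in Z for every w,
% then for an i.i.d. sample S of size n and hypothesis W drawn from P_{W|S},
\bigl|\mathbb{E}\bigl[R(W) - \widehat{R}_S(W)\bigr]\bigr|
  \;\le\; \sqrt{\tfrac{2\sigma^2}{n}\, I(W;S)} .

% Chained counterpart (schematic): the regularity assumption is lifted from the
% loss onto its increments, and the single mutual information is replaced by a
% sum over scales of quantised hypotheses W_k.
\bigl|\mathbb{E}\bigl[R(W) - \widehat{R}_S(W)\bigr]\bigr|
  \;\lesssim\; \sum_{k \ge k_0} 2^{-k} \sqrt{\tfrac{I(W_k;S)}{n}} .
```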
Related papers
- Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization
Bounds with Complexity Measures [14.408127155170774]
We leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures.
Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap.
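For reference only (this is the classical McAllester/Maurer PAC-Bayesian bound for losses in $[0,1]$, against which such complexity-measure bounds are usually compared, not the disintegrated bound of this paper, which holds jointly over the hypothesis and the sample):

```latex
% Classical PAC-Bayes bound: with probability at least 1-\delta over the sample S
% of size n, simultaneously for all posteriors Q (P a fixed data-free prior),
\mathbb{E}_{h \sim Q}\bigl[R(h)\bigr]
  \;\le\; \mathbb{E}_{h \sim Q}\bigl[\widehat{R}_S(h)\bigr]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}} .
```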
arXiv Detail & Related papers (2024-02-19T10:15:11Z)
- A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
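As a reminder of the notation (one standard convention; the paper may use a different normalisation), the Orlicz norm underlying the $L_{\psi_p}$ spaces is:

```latex
% Orlicz norm for \psi_p(x) = e^{x^p} - 1, p >= 1:
\|X\|_{\psi_p} \;=\; \inf\Bigl\{ c > 0 \;:\;
  \mathbb{E}\bigl[\exp\bigl(|X|^p / c^p\bigr)\bigr] \le 2 \Bigr\},
% so p = 2 corresponds to sub-Gaussian and p = 1 to sub-exponential random variables.
```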
arXiv Detail & Related papers (2023-05-18T15:36:20Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
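As a loose illustration of the quantity involved, here is a crude finite-difference probe of local Lipschitz regularity of a prediction function in data space. It is not the paper's bound or estimator; the toy network `f`, the radius `eps`, and the number of probes are made-up choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "learned prediction function": a random two-layer network (placeholder).
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

def f(x):
    """Map an input in R^8 to a prediction in R^3."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def local_lipschitz_estimate(f, x, eps=0.1, n_probes=200):
    """Crude local Lipschitz estimate around x: the largest ratio
    ||f(x + d) - f(x)|| / ||d|| over random perturbations d of norm <= eps."""
    ratios = []
    for _ in range(n_probes):
        d = rng.normal(size=x.shape)
        d *= eps * rng.uniform(1e-3, 1.0) / np.linalg.norm(d)  # random direction, radius <= eps
        ratios.append(np.linalg.norm(f(x + d) - f(x)) / np.linalg.norm(d))
    return max(ratios)

x0 = rng.normal(size=8)
print("local Lipschitz estimate at x0:", local_lipschitz_estimate(f, x0))
```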
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
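The summary does not specify exactly how the loss-gradient norm enters the bound, so the following is only a generic illustration of measuring such a term: the average squared norm of the per-example parameter gradient for a toy logistic-regression loss (all names and sizes are made up).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a linear model w for binary labels y in {-1, +1}.
n, d = 256, 10
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
w = rng.normal(size=d)

def mean_sq_grad_norm(w, X, y):
    """Average over examples of ||grad_w log(1 + exp(-y <w, x>))||^2.
    For the logistic loss the per-example gradient is -y * sigmoid(-y <w, x>) * x."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))   # scalar factor per example
    grads = coeff[:, None] * X             # (n, d) matrix of per-example gradients
    return float(np.mean(np.sum(grads ** 2, axis=1)))

print("mean squared loss-gradient norm:", mean_sq_grad_norm(w, X, y))
```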
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
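To make the cross-validation connection mentioned above concrete, here is plain classical leave-one-out cross-validation for a simple 1-nearest-neighbour classifier on synthetic data. It empirically estimates the generalization of the learning rule; it is not the loo-CMI quantity or the paper's bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: class means at -1 and +1 along the first coordinate.
n, d = 60, 5
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d)) + np.where(y[:, None] == 1, 1.0, -1.0) * np.eye(d)[0]

def loo_cv_error_1nn(X, y):
    """Classical leave-one-out cross-validation error of a 1-NN classifier:
    each point is predicted from its nearest neighbour among the remaining points."""
    errors = 0
    for i in range(len(y)):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                    # exclude the held-out point itself
        errors += int(y[np.argmin(dists)] != y[i])
    return errors / len(y)

print("LOO-CV error of 1-NN:", loo_cv_error_1nn(X, y))
```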
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Generalization Bounds via Convex Analysis [12.411844611718958]
We show that it is possible to replace the mutual information by any strongly convex function of the joint input-output distribution.
Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance.
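For reference, the Wasserstein-2 distance mentioned here is, in one standard formulation over the set $\Pi(\mu,\nu)$ of couplings with marginals $\mu$ and $\nu$:

```latex
W_2(\mu, \nu)^2 \;=\; \inf_{\pi \in \Pi(\mu, \nu)}
  \int \|x - y\|^2 \, \mathrm{d}\pi(x, y).
```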
arXiv Detail & Related papers (2022-02-10T12:30:45Z)
- Rényi divergence inequalities via interpolation, with applications to generalised entropic uncertainty relations [91.3755431537592]
We investigate quantum Rényi entropic quantities, specifically those derived from the 'sandwiched' divergence.
We present Rényi mutual information decomposition rules, a new approach to the Rényi conditional entropy tripartite chain rules, and a more general bipartite comparison.
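For reference (one standard definition; whether the paper uses exactly this normalisation is not stated in the summary), the sandwiched Rényi divergence of order $\alpha$ between density operators $\rho$ and $\sigma$ is:

```latex
\widetilde{D}_{\alpha}(\rho \,\|\, \sigma) \;=\;
  \frac{1}{\alpha - 1} \,\log \operatorname{Tr}\!\left[
    \left( \sigma^{\frac{1-\alpha}{2\alpha}} \,\rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \right)^{\alpha}
  \right],
  \qquad \alpha \in (0,1) \cup (1,\infty).
```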
arXiv Detail & Related papers (2021-06-19T04:06:23Z)
- Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization [120.31448970413298]
We extend the theory of Tikhonov regularization to generalized self-concordant (GSC) loss functions.
We show that fast and optimal rates can be achieved for GSC by using the iterated Tikhonov regularization scheme.
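As background on the scheme being extended (shown here only in the classical quadratic least-squares case, not for generalized self-concordant losses), iterated Tikhonov regularization repeatedly refits the Tikhonov-regularized problem to the current residual:

```latex
% Iterated Tikhonov for \min_w \|Aw - b\|^2 with regularization \lambda > 0:
w_0 = 0, \qquad
w_t = w_{t-1} + \bigl(A^{\top} A + \lambda I\bigr)^{-1} A^{\top}\bigl(b - A w_{t-1}\bigr),
\qquad t = 1, 2, \dots
```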
arXiv Detail & Related papers (2021-06-16T15:25:41Z)
- The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z)
- Generalization Bounds via Information Density and Conditional Information Density [14.147617330278662]
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios.
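For reference, the information density appearing in this and the following entry is (in one standard formulation) the log-likelihood ratio between the joint law of hypothesis and sample and the product of its marginals; its mean is the mutual information, and its $m$-th central moments are the quantities used in the next entry.

```latex
\imath(w, s) \;=\; \log \frac{\mathrm{d}P_{W S}}{\mathrm{d}\bigl(P_W \otimes P_S\bigr)}(w, s),
\qquad
I(W; S) \;=\; \mathbb{E}\bigl[\imath(W, S)\bigr],
\qquad
\mu_m \;=\; \mathbb{E}\bigl[\bigl(\imath(W, S) - I(W; S)\bigr)^m\bigr].
```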
arXiv Detail & Related papers (2020-05-16T17:04:24Z)
- Generalization Error Bounds via $m$th Central Moments of the Information Density [14.147617330278662]
We present a general approach to deriving bounds on the generalization error of randomized learning algorithms.
Our approach can be used to obtain bounds on the average generalization error as well as bounds on its tail probabilities.
arXiv Detail & Related papers (2020-04-20T09:23:49Z)