Information-Theoretic Bounds on the Moments of the Generalization Error
of Learning Algorithms
- URL: http://arxiv.org/abs/2102.02016v1
- Date: Wed, 3 Feb 2021 11:38:00 GMT
- Title: Information-Theoretic Bounds on the Moments of the Generalization Error
of Learning Algorithms
- Authors: Gholamali Aminian, Laura Toni, Miguel R. D. Rodrigues
- Abstract summary: Generalization error bounds are critical to understanding the performance of machine learning models.
We offer a more refined analysis of the generalization behaviour of machine learning models based on a characterization of (bounds on) their generalization error moments.
- Score: 19.186110989897738
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generalization error bounds are critical to understanding the performance of
machine learning models. In this work, building upon a new bound of the
expected value of an arbitrary function of the population and empirical risk of
a learning algorithm, we offer a more refined analysis of the generalization
behaviour of machine learning models based on a characterization of (bounds on)
their generalization error moments. We discuss how the proposed bounds --
which also encompass new bounds on the expected generalization error -- relate
to existing bounds in the literature. We also discuss how the proposed
generalization error moment bounds can be used to construct new generalization
error high-probability bounds.
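As a rough sketch of the quantities involved (standard definitions and a standard tail argument, not necessarily the paper's exact statements): for a hypothesis $W$ learned from a training sample $S = (Z_1, \dots, Z_n)$ drawn from a distribution $\mu$, the generalization error is $\mathrm{gen}(W,S) = L_\mu(W) - L_S(W)$, where $L_\mu(w) = \mathbb{E}_{Z \sim \mu}[\ell(w,Z)]$ is the population risk and $L_S(w) = \frac{1}{n}\sum_{i=1}^{n}\ell(w,Z_i)$ is the empirical risk. Its $k$-th moment is $\mathbb{E}[\mathrm{gen}(W,S)^{k}]$, and a bound on an absolute moment yields a high-probability bound, for example via Markov's inequality: $\Pr(|\mathrm{gen}(W,S)| \ge \epsilon) \le \mathbb{E}[|\mathrm{gen}(W,S)|^{k}]/\epsilon^{k}$ for any $\epsilon > 0$ and $k \ge 1$.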
Related papers
- Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z)
- Generalization error bounds for iterative learning algorithms with bounded updates [41.87646434714452]
We introduce a novel bound for the generalization of these algorithms with bounded updates.
We employ a variance decomposition technique to decompose information across iterations.
To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models.
arXiv Detail & Related papers (2023-09-10T16:55:59Z)
- A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
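(For reference on the notation, assuming the standard convention $\psi_p(x) = e^{x^p} - 1$ for $p \ge 1$: the Orlicz norm of a random variable $X$ is $\|X\|_{\psi_p} = \inf\{c > 0 : \mathbb{E}[\psi_p(|X|/c)] \le 1\}$, with $p = 1$ and $p = 2$ corresponding to sub-exponential and sub-Gaussian tails respectively.)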
arXiv Detail & Related papers (2023-05-18T15:36:20Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Characterizing the Generalization Error of Gibbs Algorithm with Symmetrized KL information [18.92529916180208]
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory.
Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm.
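(For reference, using standard definitions rather than the paper's exact statement: the Gibbs algorithm with inverse temperature $\alpha$ and prior $\pi$ samples the hypothesis from the Gibbs posterior $P_{W|S}(w) \propto \pi(w)\, e^{-\alpha L_S(w)}$, and the symmetrized KL information is $I_{\mathrm{SKL}}(W;S) = D_{\mathrm{KL}}(P_{W,S} \,\|\, P_W \otimes P_S) + D_{\mathrm{KL}}(P_W \otimes P_S \,\|\, P_{W,S})$.)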
arXiv Detail & Related papers (2021-07-28T22:20:34Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Jensen-Shannon Information Based Characterization of the Generalization Error of Learning Algorithms [17.682750327776777]
Generalization error bounds are critical to understanding the performance of machine learning models.
We propose a new information-theoretic upper bound on the generalization error, applicable to supervised learning scenarios.
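(For reference, assuming the standard Jensen-Shannon divergence $D_{\mathrm{JS}}(P \,\|\, Q) = \frac{1}{2} D_{\mathrm{KL}}(P \,\|\, \frac{P+Q}{2}) + \frac{1}{2} D_{\mathrm{KL}}(Q \,\|\, \frac{P+Q}{2})$: a Jensen-Shannon information between hypothesis and sample can then be defined by applying this divergence to the joint distribution $P_{W,S}$ and the product of marginals $P_W \otimes P_S$.)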
arXiv Detail & Related papers (2020-10-23T20:53:07Z)
- In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z)
- The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds on the generalization error, showing that there exists a regime where the generalization error is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z)
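For context on the information-theoretic bounds that several of the entries above refine, a commonly cited baseline (often attributed to Xu and Raginsky, 2017) states that if the loss $\ell(w, Z)$ is $\sigma$-sub-Gaussian under the data distribution for every $w$, then $|\mathbb{E}[\mathrm{gen}(W,S)]| \le \sqrt{2\sigma^{2} I(W;S)/n}$, where $I(W;S)$ is the mutual information between the learned hypothesis and the training sample. The works listed here replace or sharpen $I(W;S)$ with quantities such as (leave-one-out) conditional mutual information, Jensen-Shannon information, or symmetrized KL information.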