Characterizing the Generalization Error of Gibbs Algorithm with
Symmetrized KL information
- URL: http://arxiv.org/abs/2107.13656v1
- Date: Wed, 28 Jul 2021 22:20:34 GMT
- Title: Characterizing the Generalization Error of Gibbs Algorithm with
Symmetrized KL information
- Authors: Gholamali Aminian, Yuheng Bu, Laura Toni, Miguel R. D. Rodrigues and
Gregory Wornell
- Abstract summary: Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory.
Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm.
- Score: 18.92529916180208
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Bounding the generalization error of a supervised learning algorithm is one
of the most important problems in learning theory, and various approaches have
been developed. However, existing bounds are often loose and lack
guarantees. As a result, they may fail to characterize the exact generalization
ability of a learning algorithm. Our main contribution is an exact
characterization of the expected generalization error of the well-known Gibbs
algorithm in terms of symmetrized KL information between the input training
samples and the output hypothesis. Such a result can be applied to tighten
existing expected generalization error bounds. Our analysis provides more
insight into the fundamental role the symmetrized KL information plays in
controlling the generalization error of the Gibbs algorithm.
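To make the form of this characterization concrete (a sketch based on the abstract above; the precise conditions and definitions are given in the paper), the Gibbs algorithm with inverse temperature $\gamma>0$ and prior $\pi$ samples a hypothesis from
$$P^{\gamma}_{W|S}(w\mid s)=\frac{\pi(w)\,e^{-\gamma L_E(w,s)}}{\int \pi(w')\,e^{-\gamma L_E(w',s)}\,\mathrm{d}w'},\qquad L_E(w,s)=\frac{1}{n}\sum_{i=1}^{n}\ell(w,z_i),$$
and the stated characterization takes the form
$$\overline{\mathrm{gen}}\big(P^{\gamma}_{W|S},P_S\big)=\frac{I_{\mathrm{SKL}}(S;W)}{\gamma},\qquad I_{\mathrm{SKL}}(S;W)=I(S;W)+L(S;W),$$
where $I(S;W)$ is the mutual information and $L(S;W)$ the lautum information between the training sample $S=\{z_i\}_{i=1}^{n}$ and the output hypothesis $W$; their sum is the symmetrized KL information referred to in the abstract.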
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
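The summary above leaves the update rule implicit; below is a minimal sketch of the generic error-feedback-with-normalization template it alludes to (the compressor `top_k`, function names, and step sizes are hypothetical stand-ins, not the paper's exact method).

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates (stand-in for any contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def normalized_ef_step(w, grad, err, lr, k, eps=1e-12):
    """One step of compressed SGD with error feedback and a normalized update.
    err accumulates the compression residual; normalizing the applied update is
    the mechanism that allows larger step sizes (generic sketch only)."""
    msg = top_k(grad + err, k)                       # compressed message that is transmitted
    err = err + grad - msg                           # residual carried to the next round
    w = w - lr * msg / (np.linalg.norm(msg) + eps)   # normalized update direction
    return w, err
```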
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
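A plain empirical counterpart of the quantity these bounds target is the per-class gap between average test loss and average training loss; the helper below is a hypothetical illustration, not the paper's estimator or bound.

```python
import numpy as np

def class_wise_generalization_gap(train_loss, train_labels, test_loss, test_labels):
    """Per-class generalization gap: mean test loss minus mean training loss,
    restricted to the examples of each class (plain empirical estimate)."""
    gaps = {}
    for c in np.intersect1d(train_labels, test_labels):
        gap = test_loss[test_labels == c].mean() - train_loss[train_labels == c].mean()
        gaps[int(c)] = float(gap)
    return gaps
```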
arXiv Detail & Related papers (2024-01-05T17:05:14Z)
- Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure [1.773764539873123]
A worst-case probability measure over the data is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms.
Fundamental generalization metrics, such as the sensitivity of the expected loss, the sensitivity of the empirical risk, and the generalization gap, are shown to have closed-form expressions.
A novel parallel is established between the worst-case data-generating probability measure and the Gibbs algorithm.
arXiv Detail & Related papers (2023-12-19T15:20:27Z)
- Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topology factors on the generalization of D-SGDA.
We also balance the generalization bound against the optimization error to obtain the optimal population risk in the convex-concave setting.
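The update behind D-SGDA is not restated above; roughly, each node takes a local descent step in the primal variable and an ascent step in the dual variable, then averages with its neighbours through a mixing matrix that encodes the topology. Below is a minimal sketch of one such round of the generic template (names and gradient callables are hypothetical, not the paper's exact algorithm).

```python
import numpy as np

def dsgda_round(X, Y, W_mix, grad_x, grad_y, lr_x, lr_y):
    """One round of decentralized stochastic gradient descent ascent (generic template).
    X, Y: (n_nodes, dim) primal/dual iterates, one row per node.
    W_mix: doubly stochastic mixing matrix encoding the communication topology.
    grad_x, grad_y: callables returning stochastic gradients at a node's (x, y)."""
    Gx = np.stack([grad_x(x, y) for x, y in zip(X, Y)])
    Gy = np.stack([grad_y(x, y) for x, y in zip(X, Y)])
    X_half = X - lr_x * Gx                   # local descent on the primal (min) variable
    Y_half = Y + lr_y * Gy                   # local ascent on the dual (max) variable
    return W_mix @ X_half, W_mix @ Y_half    # gossip averaging with neighbours
```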
arXiv Detail & Related papers (2023-10-31T11:27:01Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
In contrast to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection with other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
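The leave-one-out connection can be made concrete with the classical cross-validation estimate (shown only for orientation; `fit` and `loss` are placeholder callables, and the loo-CMI quantity itself is defined in the paper).

```python
import numpy as np

def loo_cv_error(fit, loss, data):
    """Classical leave-one-out cross-validation: train on all but one example,
    evaluate on the held-out one, and average over all held-out choices."""
    errs = []
    for i in range(len(data)):
        model = fit([z for j, z in enumerate(data) if j != i])
        errs.append(loss(model, data[i]))
    return float(np.mean(errs))
```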
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Characterizing and Understanding the Generalization Error of Transfer Learning with Gibbs Algorithm [10.851348154870854]
We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms.
We focus on two popular transfer learning approaches, $\alpha$-weighted-ERM and two-stage-ERM.
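For orientation (the exact definitions and their Gibbs counterparts are in the paper), $\alpha$-weighted-ERM is usually understood as minimizing a convex combination of the target and source empirical risks,
$$\hat{w}\in\arg\min_{w}\;\Big[\alpha\,L_E(w,S_{\mathrm{T}})+(1-\alpha)\,L_E(w,S_{\mathrm{S}})\Big],\qquad \alpha\in[0,1],$$
with $S_{\mathrm{T}}$ the target samples and $S_{\mathrm{S}}$ the source samples; its Gibbs version samples $w$ with probability proportional to $\pi(w)\exp\!\big(-\gamma[\alpha L_E(w,S_{\mathrm{T}})+(1-\alpha)L_E(w,S_{\mathrm{S}})]\big)$, while two-stage-ERM roughly first trains on the source data and then adapts part of the hypothesis on the target data.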
arXiv Detail & Related papers (2021-11-02T14:49:48Z)
- Information-theoretic generalization bounds for black-box learning algorithms [46.44597430985965]
We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm.
We show experimentally that the proposed bounds closely follow the generalization gap in practical scenarios for deep learning.
arXiv Detail & Related papers (2021-10-04T17:28:41Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Information-Theoretic Bounds on the Moments of the Generalization Error of Learning Algorithms [19.186110989897738]
Generalization error bounds are critical to understanding the performance of machine learning models.
We offer a more refined analysis of the generalization behaviour of machine learning models based on a characterization of (bounds on) their generalization error moments.
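For illustration only (not the paper's characterization), the moments being bounded have a straightforward empirical counterpart obtained from repeated independent training runs:

```python
import numpy as np

def generalization_error_moments(gaps, max_moment=4):
    """Empirical moments E[gap^k] of the generalization error, estimated from
    the gap (test risk minus training risk) observed over independent runs."""
    gaps = np.asarray(gaps, dtype=float)
    return {k: float(np.mean(gaps ** k)) for k in range(1, max_moment + 1)}
```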
arXiv Detail & Related papers (2021-02-03T11:38:00Z)
- The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds on the generalization error, showing that there exists a regime where the generalization error is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z)
- Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in characterizing the generalization error.
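Spelled out, the divergence in question is the standard Kullback-Leibler divergence between the two data-generating distributions involved in the transfer (which of $\mu$, $\mu'$ is the source and which is the target is fixed in the paper):
$$D(\mu\|\mu')=\int\log\frac{\mathrm{d}\mu}{\mathrm{d}\mu'}\,\mathrm{d}\mu,$$
which is finite only when $\mu$ is absolutely continuous with respect to $\mu'$.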
arXiv Detail & Related papers (2020-05-18T13:23:20Z)