Generalization Error Bounds via $m$th Central Moments of the Information Density
- URL: http://arxiv.org/abs/2004.09148v2
- Date: Wed, 9 Sep 2020 09:11:09 GMT
- Title: Generalization Error Bounds via $m$th Central Moments of the Information Density
- Authors: Fredrik Hellström, Giuseppe Durisi
- Abstract summary: We present a general approach to deriving bounds on the generalization error of randomized learning algorithms.
Our approach can be used to obtain bounds on the average generalization error as well as bounds on its tail probabilities.
- Score: 14.147617330278662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general approach to deriving bounds on the generalization error
of randomized learning algorithms. Our approach can be used to obtain bounds on
the average generalization error as well as bounds on its tail probabilities,
both for the case in which a new hypothesis is randomly generated every time
the algorithm is used - as often assumed in the probably approximately correct
(PAC)-Bayesian literature - and in the single-draw case, where the hypothesis
is extracted only once. For this last scenario, we present a novel bound that
is explicit in the central moments of the information density. The bound
reveals that the higher the order of the information density moment that can be
controlled, the milder the dependence of the generalization bound on the
desired confidence level. Furthermore, we use tools from binary hypothesis
testing to derive a second bound, which is explicit in the tail of the
information density. This bound confirms that a fast decay of the tail of the
information density yields a more favorable dependence of the generalization
bound on the confidence level.
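To fix notation for the quantities discussed above, here is a minimal schematic sketch, not the paper's exact statements; the precise assumptions, constants, and conditions are those given in the paper. Assume training data $Z^n=(Z_1,\dots,Z_n)$ drawn i.i.d. from $P_Z$, a hypothesis $W$ output by the randomized algorithm, and a loss $\ell$ that is $\sigma^2$-sub-Gaussian. The generalization error and the information density are
$$ \mathrm{gen}(W,Z^n)=\mathbb{E}_{P_Z}[\ell(W,Z)]-\frac{1}{n}\sum_{i=1}^{n}\ell(W,Z_i), \qquad \imath(W,Z^n)=\log\frac{\mathrm{d}P_{WZ^n}}{\mathrm{d}(P_W\otimes P_{Z^n})}(W,Z^n). $$
Markov's inequality applied to the $m$th central moment $\mu_m=\mathbb{E}\big[\lvert\imath(W,Z^n)-\mathbb{E}[\imath(W,Z^n)]\rvert^m\big]$ gives
$$ \Pr\big\{\imath(W,Z^n)\ge\mathbb{E}[\imath(W,Z^n)]+(\mu_m/\delta)^{1/m}\big\}\le\delta, $$
so a single-draw bound that is explicit in $\mu_m$ inherits a $(1/\delta)^{1/m}$ dependence on the confidence parameter. Schematically, with probability at least $1-\delta$,
$$ \lvert\mathrm{gen}(W,Z^n)\rvert\lesssim\sqrt{\frac{2\sigma^2}{n}\Big(\mathbb{E}[\imath(W,Z^n)]+(\mu_m/\delta)^{1/m}+\log\tfrac{1}{\delta}\Big)}, $$
and the larger the moment order $m$ that can be controlled, the more slowly the $(1/\delta)^{1/m}$ term grows as $\delta\to 0$: this is the milder confidence dependence described in the abstract.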
Related papers
- Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent, and makes the inductive validation model clear and consistent.
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
- Bayesian Renormalization [68.8204255655161]
We present a fully information theoretic approach to renormalization inspired by Bayesian statistical inference.
The main insight of Bayesian Renormalization is that the Fisher metric defines a correlation length that plays the role of an emergent RG scale.
We provide insight into how the Bayesian Renormalization scheme relates to existing methods for data compression and data generation.
arXiv Detail & Related papers (2023-05-17T18:00:28Z)
- Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Renyi's Entropy Perspective [11.255943520955764]
We propose a novel information-theoretic measure: kernelized Renyi's entropy.
We establish generalization error bounds for stochastic gradient descent and stochastic gradient Langevin dynamics (SGD/SGLD) learning algorithms under kernelized Renyi's entropy.
We show that our information-theoretical bounds depend on the statistics of the gradients, and are rigorously tighter than the current state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-05-02T01:17:15Z)
- Fast Rate Information-theoretic Bounds on Generalization Errors [8.102199960821165]
The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data.
Various information-theoretic bounds on the generalization error have been derived in the literature.
This paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size.
arXiv Detail & Related papers (2023-03-26T08:59:05Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Fast Rate Generalization Error Bounds: Variations on a Theme [5.081241420920605]
We show that the convergence rate of the expected generalization error is of the form $O(\sqrt{\lambda/n})$.
We identify the key conditions needed for fast-rate generalization error bounds, which we call the $(\eta,c)$-central condition.
Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of $O(\lambda/n)$ for specific learning algorithms.
arXiv Detail & Related papers (2022-05-06T10:39:48Z)
- Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
- Super fast rates in structured prediction [88.99819200562784]
We show how to leverage the fact that discrete problems essentially amount to predicting a discrete output, whereas continuous problems predict a continuous value.
We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction.
We then consider kernel ridge regression, where we improve known rates of order $n^{-1/4}$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
arXiv Detail & Related papers (2021-02-01T10:50:04Z)
- Generalization Bounds via Information Density and Conditional Information Density [14.147617330278662]
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios; an illustrative average-case bound of this type is sketched after this list.
arXiv Detail & Related papers (2020-05-16T17:04:24Z)
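For context on the average-case bounds mentioned in the last entry, the classical mutual-information bound for $\sigma^2$-sub-Gaussian losses (Xu and Raginsky, 2017) reads
$$ \big\lvert\mathbb{E}[\mathrm{gen}(W,Z^n)]\big\rvert\le\sqrt{\frac{2\sigma^2}{n}\,I(W;Z^n)}, $$
where $I(W;Z^n)=\mathbb{E}[\imath(W,Z^n)]$ is the mutual information between the hypothesis and the training data. The exponential-inequality approach of that entry is known to recover average-error bounds of this form, while the moment-based and tail-based results of the present paper target the single-draw setting sketched after the abstract above.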
This list is automatically generated from the titles and abstracts of the papers on this site.