Fast Rate Generalization Error Bounds: Variations on a Theme
- URL: http://arxiv.org/abs/2205.03131v1
- Date: Fri, 6 May 2022 10:39:48 GMT
- Title: Fast Rate Generalization Error Bounds: Variations on a Theme
- Authors: Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu
- Abstract summary: In prior works, the convergence rate of the expected generalization error typically takes the form O(\sqrt{\lambda/n}); we show that this square-root form does not necessarily imply a slow rate.
We identify the key conditions needed for a fast-rate generalization error, which we call the (\eta,c)-central condition.
Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of O(\lambda/n) for specific learning algorithms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent line of works, initiated by \cite{russo2016controlling} and
\cite{xu2017information}, has shown that the generalization error of a learning
algorithm can be upper bounded by information measures. In most of the relevant
works, the convergence rate of the expected generalization error is in the form
of O(\sqrt{\lambda/n}), where \lambda is some information-theoretic quantity
such as the mutual information between the data sample and the learned
hypothesis. However, such a learning rate is typically considered to be "slow",
compared to a "fast rate" of O(1/n) in many learning scenarios. In this work,
we first show that the square root does not necessarily imply a slow rate, and
a fast rate (O(1/n)) result can still be obtained using this bound under
appropriate assumptions. Furthermore, we identify the key conditions needed for
the fast rate generalization error, which we call the (\eta,c)-central
condition. Under this condition, we give information-theoretic bounds on the
generalization error and excess risk, with a convergence rate of O(\lambda/{n})
for specific learning algorithms such as empirical risk minimization. Finally,
analytical examples are given to show the effectiveness of the bounds.
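For context, a sketch of the standard slow-rate bound from \cite{xu2017information} that the abstract refers to (assuming the loss is $\sigma$-sub-Gaussian under the data distribution; notation is ours): with training sample $S=(Z_1,\dots,Z_n)$ and learned hypothesis $W$,

$|\mathbb{E}[\mathrm{gen}(W,S)]| \le \sqrt{2\sigma^2 I(W;S)/n}$,

so taking $\lambda = I(W;S)$ gives the $O(\sqrt{\lambda/n})$ rate above, while a fast-rate bound of the kind derived under the $(\eta,c)$-central condition scales as $O(I(W;S)/n)$ up to constants.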
Related papers
- Fast Rate Information-theoretic Bounds on Generalization Errors [8.102199960821165]
The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data.
Various information-theoretic bounds on the generalization error have been derived in the literature.
This paper investigates the tightness of these bounds, in terms of the dependence of their convergence rates on the sample size.
arXiv Detail & Related papers (2023-03-26T08:59:05Z)
- Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z)
- Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization [48.12845778927164]
We consider the prospect of establishing minimax rates for gradient descent in the setting of convex optimization.
We consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm.
Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
arXiv Detail & Related papers (2022-12-27T17:16:48Z)
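As a concrete reading of the noisy "surrogate" construction described in the entry above, here is a minimal sketch assuming a least-squares objective and plain gradient descent (the function name and hyperparameters are illustrative, not from the paper):

```python
import numpy as np

def noisy_surrogate_gd(X, y, steps=100, lr=0.1, sigma=0.05, seed=0):
    """Run plain gradient descent on least squares, then corrupt the final
    iterate with Gaussian noise; the noisy output is the 'surrogate' whose
    mutual information with the data can be bounded."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # gradient of the mean squared error / 2
        w = w - lr * grad
    w_noisy = w + sigma * rng.standard_normal(d)  # Gaussian corruption of the final iterate
    return w_noisy
```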
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
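To make the "local Lipschitz regularity of the learned prediction function" in the entry above concrete, here is a hedged finite-difference sketch of estimating such a quantity around a single input of a scalar-valued predictor; the paper's precise optimal-transport-based definition may differ:

```python
import numpy as np

def local_lipschitz_estimate(predict, x, radius=0.1, n_probes=64, seed=0):
    """Probe random perturbations of x within the given radius and return the
    largest observed ratio |predict(x + d) - predict(x)| / ||d||."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_probes):
        delta = rng.standard_normal(x.shape)
        delta *= radius * rng.uniform(1e-3, 1.0) / np.linalg.norm(delta)
        ratio = abs(predict(x + delta) - predict(x)) / np.linalg.norm(delta)
        best = max(best, ratio)
    return best
```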
- Bregman divergence based em algorithm and its application to classical and quantum rate distortion theory [61.12008553173672]
We address the minimization problem of the Bregman divergence between an exponential subfamily and a mixture subfamily in a Bregman divergence system.
We apply this algorithm to rate distortion and its variants including the quantum setting.
arXiv Detail & Related papers (2022-01-07T13:33:28Z)
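For reference, the Bregman divergence used in the entry above is, for a strictly convex differentiable generator $F$,

$D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$,

which recovers the squared Euclidean distance for $F(x)=\|x\|^2$ and the Kullback-Leibler divergence for the negative Shannon entropy on the probability simplex.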
- Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
- Super fast rates in structured prediction [88.99819200562784]
We show how to leverage the fact that discrete problems are essentially predicting a discrete output, whereas continuous problems are predicting a continuous value.
We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction.
We then consider kernel ridge regression, where we improve known rates of $n^{-1/4}$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
arXiv Detail & Related papers (2021-02-01T10:50:04Z)
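Since the entry above states its fast rates for kernel ridge regression, a minimal textbook sketch of that estimator may help (the Gaussian kernel and hyperparameters are placeholder choices, not the paper's refined estimator):

```python
import numpy as np

def kernel_ridge_fit_predict(X_train, y_train, X_test, reg=1e-3, bandwidth=1.0):
    """Gaussian-kernel ridge regression: solve (K + n*reg*I) alpha = y_train,
    then predict with the cross-kernel between test and training points."""
    def gaussian_kernel(A, B):
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    n = X_train.shape[0]
    K = gaussian_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + n * reg * np.eye(n), y_train)
    return gaussian_kernel(X_test, X_train) @ alpha
```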
- Failures of model-dependent generalization bounds for least-norm interpolation [39.97534972432276]
We consider bounds on the generalization performance of the least-norm linear regressor.
For a variety of natural joint distributions on training examples, any valid generalization bound must sometimes be very loose.
arXiv Detail & Related papers (2020-10-16T16:30:05Z)
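The "least-norm linear regressor" in the entry above is the minimum-Euclidean-norm interpolant of the training data; a minimal sketch via the pseudoinverse (standard construction, variable names are ours):

```python
import numpy as np

def least_norm_regressor(X, y):
    """Among all w with X @ w = y (underdetermined case n < d), return the one
    of smallest Euclidean norm, i.e. the Moore-Penrose pseudoinverse solution."""
    return np.linalg.pinv(X) @ y
```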
- Generalization Error Bounds via $m$th Central Moments of the Information Density [14.147617330278662]
We present a general approach to deriving bounds on the generalization error of randomized learning algorithms.
Our approach can be used to obtain bounds on the average generalization error as well as bounds on its tail probabilities.
arXiv Detail & Related papers (2020-04-20T09:23:49Z)
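For reference, the information density underlying the entry above is (in the notation of this listing, with hypothesis $W$ and sample $S$)

$\imath(W; S) = \log \frac{\mathrm{d}P_{W S}}{\mathrm{d}(P_W \otimes P_S)}(W, S)$,

whose expectation is the mutual information $I(W; S)$; the entry's bounds are expressed through the $m$th central moments of this random variable.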