On the tightness of information-theoretic bounds on generalization error
of learning algorithms
- URL: http://arxiv.org/abs/2303.14658v1
- Date: Sun, 26 Mar 2023 08:59:05 GMT
- Title: On the tightness of information-theoretic bounds on generalization error
of learning algorithms
- Authors: Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu
- Abstract summary: We first show that the square root does not necessarily imply a slow rate, and a fast rate result can still be obtained using this bound.
We identify the critical conditions needed for the fast rate generalization error, which we call the $(eta,c)$-central condition.
- Score: 5.081241420920605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent line of works, initiated by Russo and Xu, has shown that the
generalization error of a learning algorithm can be upper bounded by
information measures. In most of the relevant works, the convergence rate of
the expected generalization error is in the form of $O(\sqrt{\lambda/n})$ where
$\lambda$ is some information-theoretic quantities such as the mutual
information or conditional mutual information between the data and the learned
hypothesis. However, such a learning rate is typically considered to be
``slow", compared to a ``fast rate" of $O(\lambda/n)$ in many learning
scenarios. In this work, we first show that the square root does not
necessarily imply a slow rate, and a fast rate result can still be obtained
using this bound under appropriate assumptions. Furthermore, we identify the
critical conditions needed for the fast rate generalization error, which we
call the $(\eta,c)$-central condition. Under this condition, we give
information-theoretic bounds on the generalization error and excess risk, with
a fast convergence rate for specific learning algorithms such as empirical risk
minimization and its regularized version. Finally, several analytical examples
are given to show the effectiveness of the bounds.
Related papers
- Tighter Information-Theoretic Generalization Bounds from Supersamples [27.14107452619853]
We present a variety of novel information-theoretic generalization bounds for learning algorithms.
The presented bounds include square-root bounds, fast-rate bounds, including those based on variance and sharpness.
We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.
arXiv Detail & Related papers (2023-02-05T17:06:27Z) - How Does Pseudo-Labeling Affect the Generalization Error of the
Semi-Supervised Gibbs Algorithm? [73.80001705134147]
We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm.
The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset.
arXiv Detail & Related papers (2022-10-15T04:11:56Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergenceD(mu|mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - Fast Rate Generalization Error Bounds: Variations on a Theme [5.081241420920605]
We show that the convergence rate of the expected generalization error is in the form of O(sqrtlambda/n)
We identify the key conditions needed for the fast rate generalization error, which we call the (eta,c)-central condition.
Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of O(lambda/n) for specific learning algorithms.
arXiv Detail & Related papers (2022-05-06T10:39:48Z) - RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z) - Super fast rates in structured prediction [88.99819200562784]
We show how we can leverage the fact that discrete problems are essentially predicting a discrete output when continuous problems are predicting a continuous value.
We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction.
We then consider kernel ridge regression where we improve known rates in $n-1/4$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
arXiv Detail & Related papers (2021-02-01T10:50:04Z) - Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
Minimax game-theoretic formulation represents a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z) - The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Generalization Error Bounds via $m$th Central Moments of the Information
Density [14.147617330278662]
We present a general approach to deriving bounds on the generalization error of randomized learning algorithms.
Our approach can be used to obtain bounds on the average generalization error as well as bounds on its tail probabilities.
arXiv Detail & Related papers (2020-04-20T09:23:49Z) - Robust Generalization via $\alpha$-Mutual Information [24.40306100502023]
bounds connecting two probability measures of the same event using R'enyi $alpha$-Divergences and Sibson's $alpha$-Mutual Information.
Results have broad applications, from bounding the generalization error of learning algorithms to the more general framework of adaptive data analysis.
arXiv Detail & Related papers (2020-01-14T11:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.