On the tightness of information-theoretic bounds on generalization error
of learning algorithms
- URL: http://arxiv.org/abs/2303.14658v1
- Date: Sun, 26 Mar 2023 08:59:05 GMT
- Title: On the tightness of information-theoretic bounds on generalization error
of learning algorithms
- Authors: Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu
- Abstract summary: We first show that the square root does not necessarily imply a slow rate, and a fast rate result can still be obtained using this bound.
We identify the critical conditions needed for the fast rate generalization error, which we call the $(\eta,c)$-central condition.
- Score: 5.081241420920605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent line of work, initiated by Russo and Xu, has shown that the
generalization error of a learning algorithm can be upper bounded by
information measures. In most of the relevant works, the convergence rate of
the expected generalization error is in the form of $O(\sqrt{\lambda/n})$ where
$\lambda$ is some information-theoretic quantity such as the mutual
information or conditional mutual information between the data and the learned
hypothesis. However, such a learning rate is typically considered to be
"slow", compared to a "fast rate" of $O(\lambda/n)$ in many learning
scenarios. In this work, we first show that the square root does not
necessarily imply a slow rate, and a fast rate result can still be obtained
using this bound under appropriate assumptions. Furthermore, we identify the
critical conditions needed for the fast rate generalization error, which we
call the $(\eta,c)$-central condition. Under this condition, we give
information-theoretic bounds on the generalization error and excess risk, with
a fast convergence rate for specific learning algorithms such as empirical risk
minimization and its regularized version. Finally, several analytical examples
are given to show the effectiveness of the bounds.
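For context, the canonical bound behind this line of work is the mutual-information bound of Xu and Raginsky; the statement below assumes a $\sigma$-sub-Gaussian loss, which is the standard setting rather than anything specific to this paper:

$$\left|\,\mathbb{E}\big[L_\mu(W)-L_S(W)\big]\,\right| \;\le\; \sqrt{\frac{2\sigma^2\, I(W;S)}{n}},$$

where $S=(Z_1,\dots,Z_n)$ is the training sample, $W$ is the learned hypothesis, and $L_\mu$, $L_S$ denote the population and empirical risks. This makes the paper's first claim concrete: the square root alone does not force a slow rate, because if $I(W;S)$ itself decays as $O(c/n)$, the right-hand side is already $O(\sqrt{c}/n)$, a fast rate.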
Related papers
- Towards Understanding the Generalizability of Delayed Stochastic
Gradient Descent [63.43247232708004]
Gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization.
Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm.
arXiv Detail & Related papers (2023-08-18T10:00:27Z)
- An Information-Theoretic Analysis for Transfer Learning: Error Bounds
and Applications [5.081241420920605]
We give an information-theoretic analysis on the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
Inspired by the derived bounds, we propose the InfoBoost algorithm in which the importance weights for source and target data are adjusted adaptively.
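To make the idea of importance weighting concrete, here is a minimal Python sketch of a fixed-weight source/target empirical risk; the function name, signature, and the fixed weight w_src are illustrative assumptions, and the actual InfoBoost rule adapts its weights during training:

```python
import numpy as np

def weighted_empirical_risk(loss_fn, source, target, w_src):
    """Importance-weighted empirical risk over source and target samples.

    loss_fn(x, y) -> per-example loss (float). Source examples receive a
    common weight w_src while target examples keep unit weight; InfoBoost
    instead adapts the weights during training, so this is only a sketch.
    """
    src = np.array([loss_fn(x, y) for x, y in source])
    tgt = np.array([loss_fn(x, y) for x, y in target])
    return (w_src * src.sum() + tgt.sum()) / (w_src * len(src) + len(tgt))
```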
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Unlike other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection with other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in deep-learning scenarios.
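Since the bound is interpreted against classical leave-one-out cross-validation, a generic LOO estimate of the generalization gap can be sketched as follows; this is plain LOO-CV under an assumed fit/loss interface, not the paper's loo-CMI computation:

```python
import numpy as np

def loo_gap_estimate(fit, loss, X, y):
    """Classical leave-one-out estimate of the generalization gap.

    fit(X, y) -> fitted model exposing .predict(X);
    loss(y_true, y_pred) -> mean loss (float). Generic LOO-CV, shown only
    for its connection to loo-CMI; it is not the loo-CMI quantity itself.
    """
    n = len(y)
    heldout, train = [], []
    for i in range(n):
        mask = np.arange(n) != i          # leave sample i out
        model = fit(X[mask], y[mask])
        heldout.append(loss(y[i:i + 1], model.predict(X[i:i + 1])))
        train.append(loss(y[mask], model.predict(X[mask])))
    # Held-out risk minus training risk approximates the generalization gap.
    return float(np.mean(heldout) - np.mean(train))
```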
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Fast Rate Generalization Error Bounds: Variations on a Theme [5.081241420920605]
We show that the convergence rate of the expected generalization error is in the form of $O(\sqrt{\lambda/n})$.
We identify the key conditions needed for the fast rate generalization error, which we call the $(\eta,c)$-central condition.
Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of $O(\lambda/n)$ for specific learning algorithms.
arXiv Detail & Related papers (2022-05-06T10:39:48Z)
- Information-Theoretic Generalization Bounds for Iterative
Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
- Super fast rates in structured prediction [88.99819200562784]
We show how to leverage the fact that discrete problems amount to predicting a discrete output, whereas continuous problems predict a continuous value.
We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction.
We then consider kernel ridge regression, where we improve known rates of $n^{-1/4}$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
arXiv Detail & Related papers (2021-02-01T10:50:04Z)
- Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in characterizing the generalization error.
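For completeness, the divergence in question is the standard Kullback-Leibler divergence between the two data distributions (which of $\mu$, $\mu'$ plays the source or target role follows the paper's convention; only the textbook definition is shown here):

$$D(\mu\|\mu') = \int \log\frac{\mathrm{d}\mu}{\mathrm{d}\mu'}\,\mathrm{d}\mu.$$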
arXiv Detail & Related papers (2020-05-18T13:23:20Z)
- Generalization Error Bounds via $m$th Central Moments of the Information
Density [14.147617330278662]
We present a general approach to deriving bounds on the generalization error of randomized learning algorithms.
Our approach can be used to obtain bounds on the average generalization error as well as bounds on its tail probabilities.
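For reference, the information density these moment bounds are built on is the standard quantity (the definition is textbook material, not specific to the paper):

$$\imath(w,s) = \log\frac{\mathrm{d}P_{WS}}{\mathrm{d}(P_W \otimes P_S)}(w,s),$$

whose expectation is the mutual information $I(W;S)$; controlling its $m$th central moments then yields bounds on both the average generalization error and its tail probabilities.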
arXiv Detail & Related papers (2020-04-20T09:23:49Z)