Tighter Expected Generalization Error Bounds via Convexity of
Information Measures
- URL: http://arxiv.org/abs/2202.12150v1
- Date: Thu, 24 Feb 2022 15:36:09 GMT
- Title: Tighter Expected Generalization Error Bounds via Convexity of
Information Measures
- Authors: Gholamali Aminian, Yuheng Bu, Gregory Wornell, Miguel Rodrigues
- Abstract summary: Generalization error bounds are essential to understanding machine learning algorithms.
This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample.
- Score: 8.359770027722275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization error bounds are essential to understanding machine learning
algorithms. This paper presents novel expected generalization error upper
bounds based on the average joint distribution between the output hypothesis
and each input training sample. Multiple generalization error upper bounds
based on different information measures are provided, including Wasserstein
distance, total variation distance, KL divergence, and Jensen-Shannon
divergence. Due to the convexity of the information measures, the proposed
bounds in terms of Wasserstein distance and total variation distance are shown
to be tighter than their counterparts based on individual samples in the
literature. An example is provided to demonstrate the tightness of the proposed
generalization error bounds.
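A minimal sketch of the convexity step behind the claimed tightening, written in generic notation: the hypothesis $W$, i.i.d. training samples $Z_1,\dots,Z_n \sim \mu$, and an information measure $D(\cdot\,\|\,\cdot)$ that is convex in its first argument (total variation and Wasserstein distance, per the abstract) are standing assumptions here; the exact bounds and constants are in the paper itself, not reproduced below.
% Generic Jensen/convexity step: the information measure evaluated at the
% average joint distribution \bar{P}_{W,Z} = \frac{1}{n}\sum_{i=1}^{n} P_{W,Z_i}
% is no larger than the average of the per-sample measures, because the
% second argument P_W \otimes \mu is fixed and D(\cdot \,\|\, P_W \otimes \mu)
% is convex in its first argument.
\begin{equation}
  D\!\left( \frac{1}{n}\sum_{i=1}^{n} P_{W,Z_i} \,\Big\|\, P_W \otimes \mu \right)
  \;\le\;
  \frac{1}{n}\sum_{i=1}^{n} D\!\left( P_{W,Z_i} \,\Big\|\, P_W \otimes \mu \right).
\end{equation}
Since the per-sample quantities on the right-hand side are what individual-sample bounds in the literature are built from, a bound driven by the left-hand side can only be tighter. This is only the generic shape of the argument, not the paper's precise statement.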
Related papers
- Data-dependent Generalization Bounds via Variable-Size Compressibility [16.2444595840653]
We establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework.
In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data.
The new generalization bounds we establish include tail bounds, tail bounds on the expectation, and in-expectation bounds.
arXiv Detail & Related papers (2023-03-09T16:17:45Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z) - Generalization Bounds via Convex Analysis [12.411844611718958]
We show that it is possible to replace the mutual information by any strongly convex function of the joint input-output distribution.
Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance.
arXiv Detail & Related papers (2022-02-10T12:30:45Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these results to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization (a generic sketch of the Bregman divergence appears after this list).
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds on the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z) - Generalization Bounds via Information Density and Conditional
Information Density [14.147617330278662]
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios.
arXiv Detail & Related papers (2020-05-16T17:04:24Z) - Robust Generalization via $\alpha$-Mutual Information [24.40306100502023]
We derive bounds connecting two probability measures of the same event using Rényi $\alpha$-divergences and Sibson's $\alpha$-mutual information.
Results have broad applications, from bounding the generalization error of learning algorithms to the more general framework of adaptive data analysis.
arXiv Detail & Related papers (2020-01-14T11:28:30Z)
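As referenced in the multi-distribution density ratio estimation entry above, the following is a minimal sketch of the separable Bregman divergence that such frameworks minimize; the generator $\phi$ and the notation are standard textbook choices rather than anything taken from that paper, and the KL special case is shown only as a familiar example.
% Separable Bregman divergence between two nonnegative functions f and g
% (e.g., densities or density-ratio models), induced by a strictly convex,
% differentiable generator \phi : (0, \infty) \to \mathbb{R}.
\begin{equation}
  B_\phi(f \,\|\, g)
    = \int \Big[ \phi\big(f(x)\big) - \phi\big(g(x)\big)
        - \phi'\big(g(x)\big)\,\big(f(x) - g(x)\big) \Big]\, \mathrm{d}x
    \;\ge\; 0 .
\end{equation}
% Familiar special case: \phi(t) = t \log t applied to normalized densities p, q gives
%   B_\phi(p \,\|\, q) = \int p(x) \log\!\big( p(x)/q(x) \big)\, \mathrm{d}x = \mathrm{KL}(p \,\|\, q).
Density ratio estimation frameworks in this spirit typically fit a model of the ratio $p/q$ by minimizing such a divergence between the true and modeled ratios (often weighted by one of the densities); the multi-distribution framework summarized above generalizes the binary case, per its abstract.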
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.