A New Family of Generalization Bounds Using Samplewise Evaluated CMI
- URL: http://arxiv.org/abs/2210.06422v2
- Date: Mon, 27 Mar 2023 13:53:56 GMT
- Title: A New Family of Generalization Bounds Using Samplewise Evaluated CMI
- Authors: Fredrik Hellström and Giuseppe Durisi
- Abstract summary: We present a new family of information-theoretic generalization bounds, in which the training loss and the population loss are compared through a jointly convex function.
We demonstrate the generality of this framework by recovering and extending previously known information-theoretic bounds.
In some scenarios, this novel bound results in a tighter characterization of the population loss of deep neural networks than previous bounds.
- Score: 14.147617330278662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new family of information-theoretic generalization bounds, in
which the training loss and the population loss are compared through a jointly
convex function. This function is upper-bounded in terms of the disintegrated,
samplewise, evaluated conditional mutual information (CMI), an information
measure that depends on the losses incurred by the selected hypothesis, rather
than on the hypothesis itself, as is common in probably approximately correct
(PAC)-Bayesian results. We demonstrate the generality of this framework by
recovering and extending previously known information-theoretic bounds.
Furthermore, using the evaluated CMI, we derive a samplewise, average version
of Seeger's PAC-Bayesian bound, where the convex function is the binary KL
divergence. In some scenarios, this novel bound results in a tighter
characterization of the population loss of deep neural networks than previous
bounds. Finally, we derive high-probability versions of some of these average
bounds. We demonstrate the unifying nature of the evaluated CMI bounds by using
them to recover average and high-probability generalization bounds for
multiclass classification with finite Natarajan dimension.
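To make the binary-KL mechanism concrete, the sketch below shows how a Seeger-type bound is typically turned into a numerical estimate of the population loss: in bounds of this type, the binary KL divergence between the training loss and the population loss is controlled by an information term, so a bound on the population loss follows by inverting that divergence. The budget value here is a hypothetical placeholder; in the paper it would be assembled from the samplewise evaluated-CMI terms, whose exact form is not reproduced here.

```python
import math

def binary_kl(p: float, q: float) -> float:
    """Binary KL divergence d(p || q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_inverse_upper(train_loss: float, budget: float, tol: float = 1e-9) -> float:
    """Largest q >= train_loss with d(train_loss || q) <= budget, found by bisection.

    In a Seeger-type bound, this q is the implied upper bound on the population loss.
    """
    lo, hi = train_loss, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(train_loss, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical numbers: training loss 0.05 and an information budget of 0.1 nats.
print(kl_inverse_upper(0.05, 0.1))  # approx. 0.21: implied bound on the population loss
```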
Related papers
- In-Context Parametric Inference: Point or Distribution Estimators? [66.22308335324239]
Our experiments indicate that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems.
arXiv Detail & Related papers (2025-02-17T10:00:24Z)
- Tightening the Evaluation of PAC Bounds Using Formal Verification Results [16.671097256655777]
We take the novel approach of using the formal verification of neural systems to inform the evaluation of PAC bounds.
We show that conditioning existing bounds on verification results leads to a tightening proportional to the underlying probability mass of the verified region.
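One hedged way to read the "proportional to the probability mass" claim (a schematic decomposition, not necessarily the paper's exact statement): if a hypothesis h is formally verified to make no errors on a region \mathcal{R} of the input space, the law of total probability gives

\[
  R(h) \;=\; \Pr[\mathcal{R}]\,\Pr[\mathrm{err} \mid \mathcal{R}]
        \;+\; \Pr[\mathcal{R}^{c}]\,\Pr[\mathrm{err} \mid \mathcal{R}^{c}]
  \;\le\; \bigl(1 - \Pr[\mathcal{R}]\bigr)\,\varepsilon,
\]

where the first term vanishes on the verified region and \varepsilon denotes a PAC-style bound on the error conditioned on the unverified region \mathcal{R}^{c}.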
arXiv Detail & Related papers (2024-07-29T15:53:14Z)
- Generalized Laplace Approximation [23.185126261153236]
We introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors.
We propose the generalized Laplace approximation, which involves a simple adjustment to the Hessian matrix of the regularized loss function.
We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.
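For context, the standard Laplace approximation (not the paper's generalized variant) fits a Gaussian centered at the MAP estimate \hat{\theta}, with covariance given by the inverse Hessian of the negative log-posterior, i.e., of the regularized loss:

\[
  p(\theta \mid \mathcal{D}) \;\approx\; \mathcal{N}\bigl(\theta;\, \hat{\theta},\, H^{-1}\bigr),
  \qquad
  H \;=\; -\nabla_{\theta}^{2} \log p(\theta \mid \mathcal{D}) \Big|_{\theta = \hat{\theta}}.
\]

The summary above indicates that the generalized version modifies this Hessian; the exact adjustment is not reproduced here.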
arXiv Detail & Related papers (2024-05-22T11:11:42Z)
- Intrinsic Bayesian Cramér-Rao Bound with an Application to Covariance Matrix Estimation [49.67011673289242]
This paper presents a new performance bound for estimation problems where the parameter to estimate lies in a smooth manifold.
It induces a geometry for the parameter manifold, as well as an intrinsic notion of the estimation error measure.
arXiv Detail & Related papers (2023-11-08T15:17:13Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
- Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z)
- Adversarial Estimators [0.0]
We develop a theory of adversarial estimators ("A-estimators").
We present results characterizing the convergence rates of A-estimators under both point-wise and partial identification.
Our theory also yields the asymptotic normality of general functionals of neural network M-estimators.
arXiv Detail & Related papers (2022-04-22T04:39:44Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
The mean embedding gives rise to a divergence measure referred to as the maximum mean discrepancy (MMD).
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically.
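For background on the MMD quantity itself, MMD^2(P, Q) = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)] for a kernel k, and it admits a simple plug-in estimator from two samples. The sketch below is a plain two-sample estimator with a Gaussian kernel; the paper's setting, where one mean embedding is available analytically and replaces one of the empirical averages, is not shown here.

```python
import numpy as np

def gaussian_gram(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gram matrix of the Gaussian (RBF) kernel between the rows of x and y."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2_unbiased(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between samples x ~ P and y ~ Q."""
    m, n = len(x), len(y)
    kxx, kyy = gaussian_gram(x, x, sigma), gaussian_gram(y, y, sigma)
    kxy = gaussian_gram(x, y, sigma)
    # Drop diagonal terms so the within-sample averages are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))  # sample from P
y = rng.normal(0.5, 1.0, size=(200, 2))  # sample from Q, shifted mean
print(mmd2_unbiased(x, y))  # larger values indicate a larger discrepancy between P and Q
```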
arXiv Detail & Related papers (2021-10-15T21:29:27Z)
- Exact Recovery in the General Hypergraph Stochastic Block Model [92.28929858529679]
This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph stochastic block model (d-HSBM).
We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below it.
arXiv Detail & Related papers (2021-05-11T03:39:08Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Generalization Bounds via Information Density and Conditional Information Density [14.147617330278662]
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms.
We provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios.
arXiv Detail & Related papers (2020-05-16T17:04:24Z)