Exactly Tight Information-Theoretic Generalization Error Bound for the Quadratic Gaussian Problem
- URL: http://arxiv.org/abs/2305.00876v2
- Date: Mon, 13 Nov 2023 04:40:26 GMT
- Title: Exactly Tight Information-Theoretic Generalization Error Bound for the Quadratic Gaussian Problem
- Authors: Ruida Zhou, Chao Tian, Tie Liu
- Abstract summary: We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian (location) problem.
Most existing bounds are order-wise loose in this setting.
A refined bound is then proposed for decomposable loss functions, leading to a tight bound for the vector setting.
- Score: 16.04977600068383
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We provide a new information-theoretic generalization error bound that is
exactly tight (i.e., matching even the constant) for the canonical quadratic
Gaussian (location) problem. Most existing bounds are order-wise loose in this
setting, which has raised concerns about the fundamental capability of
information-theoretic bounds in reasoning about the generalization behavior of
machine learning. The proposed new bound adopts the individual-sample-based
approach proposed by Bu et al., but also has several key new ingredients.
Firstly, instead of applying the change of measure inequality on the loss
function, we apply it to the generalization error function itself; secondly,
the bound is derived in a conditional manner; lastly, a reference distribution
is introduced. The combination of these components produces a
KL-divergence-based generalization error bound. We show that although the
latter two new ingredients can help make the bound exactly tight, removing them
does not significantly degrade the bound, leading to an asymptotically tight
mutual-information-based bound. We further consider the vector Gaussian
setting, where a direct application of the proposed bound again does not lead
to tight bounds except in special cases. A refined bound is then proposed for
decomposable loss functions, leading to a tight bound for the vector setting.
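For concreteness, the quadratic Gaussian (location) problem can be simulated directly. The sketch below is not from the paper; it assumes the standard setup of $n$ samples $Z_1, \dots, Z_n \sim \mathcal{N}(\mu, \sigma^2 I_d)$, the empirical-mean learner $W = \frac{1}{n}\sum_i Z_i$, and squared-error loss $\ell(w, z) = \|w - z\|^2$, in which case the expected generalization gap is exactly $2\sigma^2 d / n$. Matching this quantity, including the constant, is what "exactly tight" means above.

```python
import numpy as np

# Monte Carlo sketch of the quadratic Gaussian (location) problem (illustrative, not from the paper).
# Assumed setup: Z_1, ..., Z_n ~ N(mu, sigma^2 I_d), learner W = sample mean, loss l(w, z) = ||w - z||^2.
# In this setting the expected generalization gap equals 2 * sigma^2 * d / n exactly.
rng = np.random.default_rng(0)
d, n, sigma = 5, 20, 1.0
mu = np.zeros(d)
trials = 50_000

gaps = np.empty(trials)
for t in range(trials):
    Z = mu + sigma * rng.standard_normal((n, d))        # training set
    W = Z.mean(axis=0)                                   # empirical-mean learner
    emp_risk = np.mean(np.sum((Z - W) ** 2, axis=1))     # empirical risk on the training set
    pop_risk = np.sum((W - mu) ** 2) + sigma ** 2 * d    # closed form of E_Z ||W - Z||^2
    gaps[t] = pop_risk - emp_risk

print("Monte Carlo generalization gap:", gaps.mean())    # close to 0.5 for these parameters
print("Exact value 2*sigma^2*d/n:    ", 2 * sigma ** 2 * d / n)
```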
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
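As a rough, generic illustration of the mechanism named here (a sketch under assumptions, not the algorithms analyzed in that paper): error feedback keeps a residual of whatever the compressor discarded and adds it back before the next compression, and the normalized variant rescales the compressed update to unit norm before taking a step. The `top_k` compressor, step size, and toy objective below are illustrative choices.

```python
import numpy as np

def top_k(v, k):
    """Illustrative compressor: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def normalized_ef_step(x, grad, err, lr, k):
    """One generic error-feedback step with a normalized update (sketch, not the paper's exact method)."""
    m = grad + err                  # add back the residual the compressor dropped previously
    c = top_k(m, k)                 # compressed message actually applied
    err = m - c                     # carry the discarded part forward
    norm = np.linalg.norm(c)
    if norm > 0:
        x = x - lr * c / norm       # normalization makes the step length independent of gradient scale
    return x, err

# Toy run on f(x) = 0.5 * ||x||^2, whose gradient is x.
x, err = np.ones(100), np.zeros(100)
for _ in range(500):
    x, err = normalized_ef_step(x, grad=x, err=err, lr=0.05, k=10)
print("final ||x||:", np.linalg.norm(x))
```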
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - On Regularization and Inference with Label Constraints [62.60903248392479]
We compare two strategies for encoding label constraints in a machine learning pipeline: regularization with constraints and constrained inference.
For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints.
For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage.
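A toy contrast between the two strategies (a hedged sketch with assumed names, not the paper's formulation): take a constraint saying that whenever label A is predicted, label B must be predicted as well. Regularization penalizes constraint violations during training, while constrained inference repairs the decoded output at prediction time.

```python
import numpy as np

# Toy constraint: "A implies B" over two binary labels; all names and thresholds here are illustrative.

def violation(p):
    """Soft violation score: probability mass on 'A and not B', given probabilities p = (pA, pB)."""
    pA, pB = p
    return pA * (1.0 - pB)

def regularized_loss(p, y, lam=1.0, eps=1e-12):
    """Strategy 1 (regularization): cross-entropy plus a penalty for violating the constraint."""
    ce = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return ce + lam * violation(p)

def constrained_inference(p, threshold=0.5):
    """Strategy 2 (constrained inference): threshold, then correct any violation in the output."""
    yhat = (np.asarray(p) >= threshold).astype(int)
    if yhat[0] == 1 and yhat[1] == 0:
        yhat[1] = 1                  # enforce "A implies B" at prediction time
    return yhat

p = np.array([0.9, 0.4])             # the model is confident about A but not about B
print(regularized_loss(p, y=np.array([1, 1])))   # training-time loss includes the violation penalty
print(constrained_inference(p))                  # -> [1 1]: the violation is corrected at inference
```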
arXiv Detail & Related papers (2023-07-08T03:39:22Z) - A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
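For readers unfamiliar with the notation, the $L_{\psi_p}$ Orlicz norm is the standard one (given here for reference, not quoted from that paper):
$$ \psi_p(x) = e^{x^p} - 1, \qquad \|X\|_{\psi_p} = \inf\big\{ c > 0 : \mathbb{E}\big[\psi_p(|X|/c)\big] \le 1 \big\}, $$
so $p = 2$ recovers the sub-Gaussian norm and $p = 1$ the sub-exponential norm.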
arXiv Detail & Related papers (2023-05-18T15:36:20Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized
Linear Models [33.36787620121057]
We prove a new generalization bound that holds for any class of linear predictors in Gaussian space.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021).
We show that applying our bound to the generalization error via localized Gaussian width will generally be sharp for empirical risk minimizers.
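For context, one standard definition of the Moreau envelope of a loss $\ell$ at scale $\lambda > 0$ (given here for reference, not quoted from that paper) is
$$ \ell_\lambda(w) = \inf_{v} \Big[ \ell(v) + \tfrac{1}{2\lambda} \|v - w\|^2 \Big], $$
a smoothed approximation of $\ell$ that is the basic object behind the "Moreau envelope theory" in the title.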
arXiv Detail & Related papers (2022-10-21T16:16:55Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z) - Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds, and
Benign Overfitting [35.78863301525758]
We prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class.
Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators.
We show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.
arXiv Detail & Related papers (2021-06-17T06:58:10Z) - Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z) - Tighter expected generalization error bounds via Wasserstein distance [23.52237892358981]
We introduce several expected generalization error bounds based on the Wasserstein distance.
We present full-dataset, single-letter, and random-subset bounds on both the standard setting and the randomized-subsample setting.
We show how various new bounds based on different information measures can be derived from the presented bounds.
arXiv Detail & Related papers (2021-01-22T20:13:59Z) - Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation captures the fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z)