Tighter expected generalization error bounds via Wasserstein distance
- URL: http://arxiv.org/abs/2101.09315v1
- Date: Fri, 22 Jan 2021 20:13:59 GMT
- Title: Tighter expected generalization error bounds via Wasserstein distance
- Authors: Borja Rodríguez-Gálvez, Germán Bassi, Ragnar Thobaben, and
Mikael Skoglund
- Abstract summary: We introduce several expected generalization error bounds based on the Wasserstein distance.
We present full-dataset, single-letter, and random-subset bounds on both the standard setting and the randomized-subsample setting.
We show how various new bounds based on different information measures can be derived from the presented bounds.
- Score: 23.52237892358981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we introduce several expected generalization error bounds based
on the Wasserstein distance. More precisely, we present full-dataset,
single-letter, and random-subset bounds on both the standard setting and the
randomized-subsample setting from Steinke and Zakynthinou [2020]. Moreover, we
show that, when the loss function is bounded, these bounds recover from below
(and thus are tighter than) current bounds based on the relative entropy and,
for the standard setting, generate new, non-vacuous bounds also based on the
relative entropy. Then, we show how similar bounds featuring the backward
channel can be derived with the proposed proof techniques. Finally, we show how
various new bounds based on different information measures (e.g., the lautum
information or several $f$-divergences) can be derived from the presented
bounds.
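For intuition, the single-letter bounds in this family have roughly the following shape; this is a hedged sketch of the flavor of the result, not a verbatim statement of the paper's theorems, and L denotes an assumed Lipschitz constant of the loss in its hypothesis argument with respect to a metric on the hypothesis space:

```latex
% Illustrative single-letter Wasserstein bound (sketch, not the paper's exact statement):
% the expected generalization error is controlled by per-sample Wasserstein distances
% between the conditional law P_{W|Z_i} and the marginal law P_W of the hypothesis.
\bigl| \mathbb{E}[\mathrm{gen}(W, S)] \bigr|
  \;\le\; \frac{L}{n} \sum_{i=1}^{n} \mathbb{E}_{Z_i}\!\bigl[ \mathbb{W}\bigl( P_{W \mid Z_i},\, P_W \bigr) \bigr].
```

The comparison with relative-entropy bounds rests on a standard mechanism: a loss with values in [0, 1] is 1-Lipschitz with respect to the discrete metric rho(w, w') = 1{w != w'}, under which the Wasserstein distance coincides with total variation, and Pinsker's inequality then gives

```latex
% Discrete metric: Wasserstein equals total variation, bounded above via Pinsker's inequality.
\mathbb{W}(P, Q) \;=\; \mathrm{TV}(P, Q) \;\le\; \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\Vert\, Q)},
```

so a bound of the above form is never larger than the corresponding relative-entropy (mutual-information) bound, consistent with the abstract's claim that the new bounds recover the latter from below. A toy numerical sketch of the per-sample Wasserstein term appears after the related-papers list below.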
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space in which to model functions represented by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting.
Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
arXiv Detail & Related papers (2024-01-02T10:58:54Z) - More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity [27.87324770020133]
We present new high-probability PAC-Bayes bounds for different types of losses.
For losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values.
For losses with more general tail behaviors, we introduce two new parameter-free bounds.
arXiv Detail & Related papers (2023-06-21T12:13:46Z) - Exactly Tight Information-Theoretic Generalization Error Bound for the
Quadratic Gaussian Problem [16.04977600068383]
We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant).
Most existing bounds are order-wise loose in this setting.
A refined bound is then proposed for decomposable loss functions, leading to a tight bound for the vector setting.
arXiv Detail & Related papers (2023-05-01T15:22:58Z) - Tighter Information-Theoretic Generalization Bounds from Supersamples [27.14107452619853]
We present a variety of novel information-theoretic generalization bounds for learning algorithms.
The presented bounds include square-root bounds and fast-rate bounds, including those based on variance and sharpness.
We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date in the same supersample setting.
arXiv Detail & Related papers (2023-02-05T17:06:27Z) - Limitations of Information-Theoretic Generalization Bounds for Gradient
Descent Methods in Stochastic Convex Optimization [48.12845778927164]
We consider the prospect of establishing minimax rates for gradient descent in the setting of stochastic convex optimization.
We consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm.
Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
arXiv Detail & Related papers (2022-12-27T17:16:48Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z) - Tight Exponential Analysis for Smoothing the Max-Relative Entropy and
for Quantum Privacy Amplification [56.61325554836984]
The max-relative entropy together with its smoothed version is a basic tool in quantum information theory.
We derive the exact exponent for the decay of the small modification made to the quantum state when smoothing the max-relative entropy based on the purified distance.
arXiv Detail & Related papers (2021-11-01T16:35:41Z) - Rényi divergence inequalities via interpolation, with applications to
generalised entropic uncertainty relations [91.3755431537592]
We investigate quantum Rényi entropic quantities, specifically those derived from the 'sandwiched' divergence.
We present Rényi mutual information decomposition rules, a new approach to the Rényi conditional entropy tripartite chain rules, and a more general bipartite comparison.
arXiv Detail & Related papers (2021-06-19T04:06:23Z) - On Lower Bounds for Standard and Robust Gaussian Process Bandit
Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
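Returning to the main paper's single-letter bound sketched above: the following is a minimal numerical sketch, a toy construction for this summary rather than code from the paper, of how the per-sample term E_{Z_1}[W(P_{W|Z_1}, P_W)] could be estimated for a simple one-dimensional learner. The learner (the sample mean of n Gaussian points), the constants, and the use of SciPy's one-dimensional Wasserstein distance are all illustrative assumptions.

```python
# Toy Monte Carlo sketch (illustrative assumptions throughout): estimate the
# per-sample Wasserstein term E_{Z_1}[ W1(P_{W|Z_1}, P_W) ] for a learner that
# outputs the sample mean of n Gaussian data points (1-D hypothesis space).
import numpy as np
from scipy.stats import wasserstein_distance  # one-dimensional Wasserstein-1 distance

rng = np.random.default_rng(0)
n, sigma = 20, 1.0            # dataset size and data standard deviation (toy values)
n_outer, n_inner = 200, 2000  # Monte Carlo repetitions (toy values)

def sample_hypotheses(z1=None, size=1):
    """Draw W = mean(Z_1, ..., Z_n); optionally condition on Z_1 = z1."""
    z = sigma * rng.standard_normal((size, n))
    if z1 is not None:
        z[:, 0] = z1
    return z.mean(axis=1)

w_marginal = sample_hypotheses(size=n_inner)                  # samples from P_W
terms = []
for _ in range(n_outer):
    z1 = sigma * rng.standard_normal()                        # a fresh draw of Z_1
    w_conditional = sample_hypotheses(z1=z1, size=n_inner)    # samples from P_{W|Z_1=z1}
    terms.append(wasserstein_distance(w_conditional, w_marginal))

print("estimated E_{Z_1}[ W1(P_{W|Z_1}, P_W) ] ~", float(np.mean(terms)))
```

By symmetry of this toy learner, the same estimate applies to every index i; scaling by a Lipschitz constant of the chosen loss would turn it into a numerical evaluation of a bound of the sketched form.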