Comparing Comparators in Generalization Bounds
- URL: http://arxiv.org/abs/2310.10534v2
- Date: Wed, 21 Feb 2024 13:20:36 GMT
- Title: Comparing Comparators in Generalization Bounds
- Authors: Fredrik Hellström, Benjamin Guedj
- Abstract summary: We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function.
We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution.
This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
- Score: 17.808906879967353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive generic information-theoretic and PAC-Bayesian generalization
bounds involving an arbitrary convex comparator function, which measures the
discrepancy between the training and population loss. The bounds hold under the
assumption that the cumulant-generating function (CGF) of the comparator is
upper-bounded by the corresponding CGF within a family of bounding
distributions. We show that the tightest possible bound is obtained with the
comparator being the convex conjugate of the CGF of the bounding distribution,
also known as the Cramér function. This conclusion applies more broadly to
generalization bounds with a similar structure. This confirms the
near-optimality of known bounds for bounded and sub-Gaussian losses and leads
to novel bounds under other bounding distributions.
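The optimality result above can be checked numerically in the simplest case: for a sub-Gaussian bounding distribution the CGF is quadratic, and its convex conjugate (the Cramér function) is the familiar quadratic comparator. A minimal sketch, assuming an illustrative variance proxy (the constants below are for illustration only and are not taken from the paper):

```python
# Illustrative sketch (not code from the paper): for a sub-Gaussian bounding
# distribution with variance proxy SIGMA^2, the CGF is
#   psi(lam) = lam^2 * SIGMA^2 / 2,
# and the optimal comparator is its convex conjugate (the Cramér function),
#   psi_star(x) = sup_lam [lam * x - psi(lam)] = x^2 / (2 * SIGMA^2).

SIGMA = 1.0

def cgf(lam):
    # CGF of the sub-Gaussian bounding distribution.
    return 0.5 * lam * lam * SIGMA * SIGMA

def cramer_numeric(x, lo=-50.0, hi=50.0, steps=200000):
    """Legendre transform of the CGF via a dense grid search over lam."""
    step = (hi - lo) / steps
    return max(lam * x - cgf(lam)
               for lam in (lo + i * step for i in range(steps + 1)))

def cramer_closed_form(x):
    # Known closed form of the conjugate in the sub-Gaussian case.
    return x * x / (2 * SIGMA * SIGMA)

# The numerical conjugate matches the quadratic comparator.
for x in (0.0, 0.5, 1.0, 2.0):
    assert abs(cramer_numeric(x) - cramer_closed_form(x)) < 1e-6
```

The same grid-search conjugation applies to any other CGF (e.g. sub-gamma or bounded losses), which is how one would explore the novel comparators the paper derives for other bounding distributions.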
Related papers
- Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization
Bounds with Complexity Measures [14.408127155170774]
We leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures.
Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap.
arXiv Detail & Related papers (2024-02-19T10:15:11Z) - Tighter Generalisation Bounds via Interpolation [16.74864438507713]
We present a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, \Gamma)$-divergence.
We also present PAC-Bayes generalisation bounds where we interpolate between a series of probability divergences.
arXiv Detail & Related papers (2024-02-07T18:55:22Z) - Exactly Tight Information-Theoretic Generalization Error Bound for the
Quadratic Gaussian Problem [16.04977600068383]
We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant).
Most existing bounds are order-wise loose in this setting.
A refined bound is then proposed for decomposable loss functions, leading to a tight bound for the vector setting.
arXiv Detail & Related papers (2023-05-01T15:22:58Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Integral Probability Metrics PAC-Bayes Bounds [22.534592918803813]
We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPMs).
A notable feature of the obtained bounds is that they recover classical uniform convergence bounds in the worst case.
arXiv Detail & Related papers (2022-07-01T18:22:44Z) - Chained Generalisation Bounds [26.043342234937747]
We derive upper bounds for the expected generalisation error of supervised learning algorithms by means of the chaining technique.
We establish a duality between generalisation bounds based on the regularity of the loss function, and their chained counterparts.
arXiv Detail & Related papers (2022-03-02T09:34:36Z) - Sharp Bounds for Federated Averaging (Local SGD) and Continuous
Perspective [49.17352150219212]
Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL).
We show how to analyze this quantity from the Stochastic Differential Equation (SDE) perspective.
arXiv Detail & Related papers (2021-11-05T22:16:11Z) - Conformal field theory from lattice fermions [77.34726150561087]
We provide a rigorous lattice approximation of conformal field theories given in terms of lattice fermions in 1+1 dimensions.
We show how these results lead to explicit error estimates pertaining to the quantum simulation of conformal field theories.
arXiv Detail & Related papers (2021-07-29T08:54:07Z) - Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z) - Metrizing Weak Convergence with Maximum Mean Discrepancies [88.54422104669078]
This paper characterizes the maximum mean discrepancies (MMD) that metrize the weak convergence of probability measures for a wide class of kernels.
We prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, metrizes the weak convergence of probability measures if and only if k is continuous.
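A biased empirical estimate of the squared MMD with a Gaussian kernel (a bounded continuous kernel, hence of the kind the result above concerns) can be sketched in a few lines. This is an assumed illustration, not code from the paper:

```python
import math
import random

# Assumed example: biased empirical estimator of MMD^2 between two 1-D
# samples, using a Gaussian kernel (continuous and bounded).

def gaussian_kernel(a, b, bandwidth=1.0):
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimator of MMD^2: squared RKHS distance between the
    empirical mean embeddings of the two samples (always >= 0)."""
    def mean_k(u, v):
        return sum(gaussian_kernel(a, b, bandwidth)
                   for a in u for b in v) / (len(u) * len(v))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

rng = random.Random(0)
same = mmd2_biased([rng.gauss(0, 1) for _ in range(200)],
                   [rng.gauss(0, 1) for _ in range(200)])
diff = mmd2_biased([rng.gauss(0, 1) for _ in range(200)],
                   [rng.gauss(3, 1) for _ in range(200)])
# Well-separated distributions yield a visibly larger discrepancy.
assert diff > same
```

Because the kernel is bounded and continuous, small MMD between two distributions corresponds (by the metrization result) to the two being close in the weak topology.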
arXiv Detail & Related papers (2020-06-16T15:49:33Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.