Integral Probability Metrics PAC-Bayes Bounds
- URL: http://arxiv.org/abs/2207.00614v1
- Date: Fri, 1 Jul 2022 18:22:44 GMT
- Title: Integral Probability Metrics PAC-Bayes Bounds
- Authors: Ron Amit, Baruch Epstein, Shay Moran, Ron Meir
- Abstract summary: We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPM).
A notable feature of the obtained bounds is that they naturally interpolate between classical uniform convergence bounds in the worst case and preferable bounds when the prior and posterior are close.
- Score: 22.534592918803813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a PAC-Bayes-style generalization bound which enables the
replacement of the KL-divergence with a variety of Integral Probability Metrics
(IPM). We provide instances of this bound with the IPM being the total
variation metric and the Wasserstein distance. A notable feature of the
obtained bounds is that they naturally interpolate between classical uniform
convergence bounds in the worst case (when the prior and posterior are far away
from each other), and preferable bounds in better cases (when the posterior and
prior are close). This illustrates the possibility of reinforcing classical
generalization bounds with algorithm- and data-dependent components, thus
making them more suitable to analyze algorithms that use a large hypothesis
space.
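For context, a standard KL-based PAC-Bayes bound (a rough sketch in the spirit of McAllester's bound for losses in $[0,1]$; the constants may differ from those in the paper) has the form
$$
\mathbb{E}_{h \sim Q}\big[L(h)\big] \;\le\; \mathbb{E}_{h \sim Q}\big[\hat{L}_n(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\tfrac{2\sqrt{n}}{\delta}}{2n}},
$$
holding with probability at least $1-\delta$ over an i.i.d. sample of size $n$. The bounds presented here replace the $\mathrm{KL}(Q \,\|\, P)$ complexity term by an IPM such as the total variation metric or the Wasserstein distance, which is what produces the interpolation described above.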
Related papers
- Tighter Generalisation Bounds via Interpolation [16.74864438507713]
We present a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, \Gamma)$-divergence.
We also present PAC-Bayes generalisation bounds where we interpolate between a series of probability divergences.
arXiv Detail & Related papers (2024-02-07T18:55:22Z)
- PAC-Bayes-Chernoff bounds for unbounded losses [9.987130158432755]
We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting.
Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds.
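As a reminder of the classical ingredient this entry builds on (not the paper's PAC-Bayesian extension), the Cramér-Chernoff method controls the upper tail of a real random variable $X$ through the Legendre transform of its log moment generating function:
$$
\Pr\big(X - \mathbb{E}[X] \ge t\big) \;\le\; \exp\!\Big(-\sup_{\lambda \ge 0}\big(\lambda t - \log \mathbb{E}\, e^{\lambda (X - \mathbb{E}[X])}\big)\Big), \qquad t \ge 0.
$$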
arXiv Detail & Related papers (2024-01-02T10:58:54Z)
- A unified framework for information-theoretic generalization bounds [8.04975023021212]
This paper presents a general methodology for deriving information-theoretic generalization bounds for learning algorithms.
The main technical tool is a probabilistic decorrelation lemma based on a change of measure and a relaxation of Young's inequality in $L_{\psi_p}$ Orlicz spaces.
arXiv Detail & Related papers (2023-05-18T15:36:20Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- High Probability Convergence for Accelerated Stochastic Mirror Descent [29.189409618561964]
We show high probability convergence with bounds depending on the initial distance to the optimal solution as opposed to the domain diameter.
The algorithms use step sizes analogous to the standard settings and are universal to Lipschitz functions, smooth functions, and their linear combinations.
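For orientation only, here is a minimal sketch of plain (non-accelerated, deterministic) mirror descent with the negative-entropy mirror map on the probability simplex; the paper analyzes an accelerated stochastic variant, and the step size below is purely illustrative:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, step_size, n_steps):
    """Plain mirror descent with the negative-entropy mirror map
    (exponentiated-gradient updates) on the probability simplex."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad(x)                       # (sub)gradient at the current point
        x = x * np.exp(-step_size * g)    # multiplicative (mirror) update
        x /= x.sum()                      # normalize back onto the simplex
    return x

# Toy usage: minimize the linear objective <x, c> over the simplex.
c = np.array([3.0, 1.0, 2.0])
x_hat = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, step_size=0.5, n_steps=200)
# x_hat concentrates its mass on the coordinate with the smallest cost.
```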
arXiv Detail & Related papers (2022-10-03T01:50:53Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
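A minimal sketch of the algorithmic setting discussed here, i.e. constant-stepsize SGD with iterate averaging for least squares (sampling with replacement stands in for the streaming model, and the step size is a placeholder rather than a value taken from the paper):

```python
import numpy as np

def constant_stepsize_sgd(X, y, step_size, n_steps, seed=0):
    """Constant-stepsize single-sample SGD for least squares,
    returning the running (Polyak) average of the iterates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(n_steps):
        i = rng.integers(n)                    # pick one example at random
        residual = X[i] @ w - y[i]
        w = w - step_size * residual * X[i]    # stochastic gradient step
        w_avg = w_avg + (w - w_avg) / (t + 1)  # running average of iterates
    return w_avg
```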
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
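To make the policy-evaluation setting mentioned above concrete, here is a minimal sketch of the standard TD(0) update with linear function approximation (the transition format and step size are illustrative; this is not the paper's estimator or its oracle inequality):

```python
import numpy as np

def td0_linear(transitions, features, gamma=0.99, step_size=0.05):
    """Standard TD(0) policy evaluation with linear function approximation.

    transitions: iterable of (state, reward, next_state) index triples.
    features:    array of shape (n_states, d); value estimates are features @ theta.
    """
    theta = np.zeros(features.shape[1])
    for s, r, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        td_error = r + gamma * (phi_next @ theta) - phi @ theta
        theta = theta + step_size * td_error * phi    # semi-gradient update
    return theta
```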
arXiv Detail & Related papers (2020-12-09T20:19:32Z)
- Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure [0.0]
We investigate optimal posteriors for chi-squared divergence based PAC-Bayesian bounds in terms of their distribution, scalability of computations, and test set performance.
Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors.
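For reference, the chi-squared divergence appearing in these bounds is defined (for $Q \ll P$) by
$$
\chi^2(Q \,\|\, P) \;=\; \mathbb{E}_{P}\!\left[\left(\frac{dQ}{dP}\right)^{2}\right] - 1
\;=\; \mathbb{E}_{P}\!\left[\left(\frac{dQ}{dP} - 1\right)^{2}\right],
$$
and it dominates the KL-divergence via $\mathrm{KL}(Q \,\|\, P) \le \log\big(1 + \chi^2(Q \,\|\, P)\big)$, which gives one way to compare the two families of bounds.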
arXiv Detail & Related papers (2020-08-14T03:15:23Z)
- The Convergence Indicator: Improved and completely characterized parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
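For context, the standard inertia-weight PSO update whose parameters ($w$, $c_1$, $c_2$) such convergence conditions constrain looks as follows; this is the generic update rule, not the convergence indicator introduced in the paper:

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One inertia-weight PSO step for particle positions x and velocities v.

    pbest: each particle's personal best position; gbest: the swarm's best.
    The triple (w, c1, c2) is what convergence analyses constrain.
    """
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```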
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
- Generalized Sliced Distances for Probability Distributions [47.543990188697734]
We introduce a broad family of probability metrics, coined as Generalized Sliced Probability Metrics (GSPMs).
GSPMs are rooted in the generalized Radon transform and come with a unique geometric interpretation.
We consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum.
arXiv Detail & Related papers (2020-02-28T04:18:00Z)
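As a rough illustration of the "sliced" construction behind such metrics, here is a plain sliced 1-Wasserstein estimate via random linear projections; this is a generic sketch, not the generalized Radon-transform-based GSPMs of the paper:

```python
import numpy as np

def sliced_wasserstein_1(samples_p, samples_q, n_projections=100, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance between two
    empirical distributions (assumed to have the same number of samples),
    using random one-dimensional linear projections."""
    rng = np.random.default_rng(seed)
    d = samples_p.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)            # random direction on the sphere
        proj_p = np.sort(samples_p @ theta)       # 1D projections, sorted
        proj_q = np.sort(samples_q @ theta)
        total += np.mean(np.abs(proj_p - proj_q)) # 1D W1 for equal-size samples
    return total / n_projections
```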