Tighter PAC-Bayes Bounds Through Coin-Betting
- URL: http://arxiv.org/abs/2302.05829v1
- Date: Sun, 12 Feb 2023 01:16:27 GMT
- Title: Tighter PAC-Bayes Bounds Through Coin-Betting
- Authors: Kyoungseok Jang, Kwang-Sung Jun, Ilja Kuzborskij, Francesco Orabona
- Abstract summary: We show how to refine the proof strategy of the PAC-Bayes bounds and achieve \emph{even tighter} guarantees.
We derive the first PAC-Bayes concentration inequality based on the coin-betting approach that holds simultaneously for all sample sizes.
- Score: 31.148069991567215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of estimating the mean of a sequence of random
elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed
scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and
$\theta$ is a possibly $S$-dependent parameter. An example of such a problem
would be to estimate the generalization error of a neural network trained on
$n$ examples where $f$ is a loss function. Classically, this problem is
approached through concentration inequalities holding uniformly over compact
parameter sets of functions $f$, for example as in Rademacher or VC type
analysis. However, in many problems, such inequalities often yield numerically
vacuous estimates. Recently, the \emph{PAC-Bayes} framework has been proposed
as a better alternative for this class of problems for its ability to often
give numerically non-vacuous bounds. In this paper, we show that we can do even
better: we show how to refine the proof strategy of the PAC-Bayes bounds and
achieve \emph{even tighter} guarantees. Our approach is based on the
\emph{coin-betting} framework that derives the numerically tightest known
time-uniform concentration inequalities from the regret guarantees of online
gambling algorithms. In particular, we derive the first PAC-Bayes concentration
inequality based on the coin-betting approach that holds simultaneously for all
sample sizes. We demonstrate its tightness showing that by \emph{relaxing} it
we obtain a number of previous results in a closed form including Bernoulli-KL
and empirical Bernstein inequalities. Finally, we propose an efficient
algorithm to numerically calculate confidence sequences from our bound, which
often generates nonvacuous confidence bounds even with one sample, unlike the
state-of-the-art PAC-Bayes bounds.
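The paper's confidence-sequence algorithm itself is not reproduced here, but the underlying coin-betting mechanism can be illustrated with a minimal sketch for the simplest special case: a time-uniform confidence sequence for the mean of a single bounded sequence, with no posterior over $\theta$ and hence no PAC-Bayes component. For each candidate mean $m$ one runs a betting game whose wealth is a nonnegative martingale when $m$ is the true mean; by Ville's inequality the wealth exceeds $1/\delta$ with probability at most $\delta$ uniformly over time, so candidates whose wealth ever crosses that threshold can be excluded. The function name, candidate grid, and betting rule below are illustrative assumptions, not the paper's regret-based strategy.

```python
import numpy as np

def betting_confidence_sequence(xs, delta=0.05, grid=None):
    """Time-uniform confidence sequence for the mean of observations in [0, 1],
    built from a coin-betting wealth process and Ville's inequality.

    Generic illustration only: the betting fractions below are a crude plug-in
    rule, not the regret-optimal gambling strategy used in the paper.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)    # candidate means m
    wealth = np.ones_like(grid)              # one betting game per candidate
    alive = np.ones_like(grid, dtype=bool)   # candidates not yet rejected
    mean_est, t = 0.0, 0
    intervals = []
    for x in xs:
        t += 1
        # Predictable bet: move toward the running mean estimate, clipped to
        # [-0.5, 0.5] so that 1 + lam*(x - m) stays positive for x, m in [0, 1].
        lam = np.clip(mean_est - grid, -0.5, 0.5) * min(1.0, 2.0 / np.sqrt(t))
        wealth *= 1.0 + lam * (x - grid)     # wealth(m) is a martingale at the true mean
        mean_est += (x - mean_est) / t
        alive &= wealth < 1.0 / delta        # Ville: P(sup wealth >= 1/delta) <= delta
        kept = grid[alive]
        # Report the convex hull of surviving candidates (a valid, conservative interval).
        intervals.append((kept.min(), kept.max()) if kept.size else (np.nan, np.nan))
    return intervals

# Example: intervals for the mean of Bernoulli(0.7) samples after 1 and 50 draws.
rng = np.random.default_rng(0)
cs = betting_confidence_sequence(rng.binomial(1, 0.7, size=50).astype(float))
print(cs[0], cs[-1])
```

In the paper, this wealth-based argument is extended to the PAC-Bayes setting, where the guarantee must hold for data-dependent distributions over $\theta$ and for all sample sizes simultaneously; the sketch above only covers a single fixed parameter and a naive betting rule.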
Related papers
- How to Shrink Confidence Sets for Many Equivalent Discrete Distributions? [17.52590726972374]
We exploit permutation-equivalence in machine learning problems.
We show that the size of confidence sets shrinks at rates of $O(1/\sqrt{n_k})$ and $O(1/\max_{k \in K} n_k)$.
arXiv Detail & Related papers (2024-07-22T14:19:19Z) - Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Restless Linear Bandits [5.00389879175348]
It is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,\, t \in \mathbb{N})$ which gives rise to pay-offs.
An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate.
arXiv Detail & Related papers (2024-05-17T14:37:39Z) - Better-than-KL PAC-Bayes Bounds [23.87003743389573]
We show that it is possible to achieve a strictly tighter bound with a novel and better-than-KL divergence.
Our result is first-of-its-kind in that existing PAC-Bayes bounds with non-KL divergences are not known to be strictly better than KL.
arXiv Detail & Related papers (2024-02-14T14:33:39Z) - Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature
and Bayesian Optimization [51.533164528799084]
We show that to estimate the normalizing constant within a small relative error, the level of difficulty depends on the value of $\lambda$.
We find that this pattern holds true even when the function evaluations are noisy.
arXiv Detail & Related papers (2024-01-11T07:45:09Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - Mind the gap: Achieving a super-Grover quantum speedup by jumping to the
end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)