Tighter PAC-Bayes Bounds Through Coin-Betting
- URL: http://arxiv.org/abs/2302.05829v1
- Date: Sun, 12 Feb 2023 01:16:27 GMT
- Title: Tighter PAC-Bayes Bounds Through Coin-Betting
- Authors: Kyoungseok Jang, Kwang-Sung Jun, Ilja Kuzborskij, Francesco Orabona
- Abstract summary: We show how to refine the proof strategy of the PAC-Bayes bounds and achieve \emph{even tighter} guarantees.
We derive the first PAC-Bayes concentration inequality based on the coin-betting approach that holds simultaneously for all sample sizes.
- Score: 31.148069991567215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of estimating the mean of a sequence of random
elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed
scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and
$\theta$ is a possibly $S$-dependent parameter. An example of such a problem
would be to estimate the generalization error of a neural network trained on
$n$ examples where $f$ is a loss function. Classically, this problem is
approached through concentration inequalities holding uniformly over compact
parameter sets of functions $f$, for example as in Rademacher or VC type
analysis. However, in many problems, such inequalities often yield numerically
vacuous estimates. Recently, the \emph{PAC-Bayes} framework has been proposed
as a better alternative for this class of problems for its ability to often
give numerically non-vacuous bounds. In this paper, we show that we can do even
better: we show how to refine the proof strategy of the PAC-Bayes bounds and
achieve \emph{even tighter} guarantees. Our approach is based on the
\emph{coin-betting} framework that derives the numerically tightest known
time-uniform concentration inequalities from the regret guarantees of online
gambling algorithms. In particular, we derive the first PAC-Bayes concentration
inequality based on the coin-betting approach that holds simultaneously for all
sample sizes. We demonstrate its tightness showing that by \emph{relaxing} it
we obtain a number of previous results in a closed form including Bernoulli-KL
and empirical Bernstein inequalities. Finally, we propose an efficient
algorithm to numerically calculate confidence sequences from our bound, which
often generates nonvacuous confidence bounds even with one sample, unlike the
state-of-the-art PAC-Bayes bounds.
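The paper's confidence-sequence algorithm itself is not reproduced here, but the underlying coin-betting mechanism can be illustrated with a minimal sketch for the simplest special case: a time-uniform confidence sequence for the mean of a single bounded sequence, with no posterior over $\theta$ and hence no PAC-Bayes component. For each candidate mean $m$ one runs a betting game whose wealth is a nonnegative martingale when $m$ is the true mean; by Ville's inequality the wealth exceeds $1/\delta$ with probability at most $\delta$ uniformly over time, so candidates whose wealth ever crosses that threshold can be excluded. The function name, candidate grid, and betting rule below are illustrative assumptions, not the paper's regret-based strategy.

```python
import numpy as np

def betting_confidence_sequence(xs, delta=0.05, grid=None):
    """Time-uniform confidence sequence for the mean of observations in [0, 1],
    built from a coin-betting wealth process and Ville's inequality.

    Generic illustration only: the betting fractions below are a crude plug-in
    rule, not the regret-optimal gambling strategy used in the paper.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)    # candidate means m
    wealth = np.ones_like(grid)              # one betting game per candidate
    alive = np.ones_like(grid, dtype=bool)   # candidates not yet rejected
    mean_est, t = 0.0, 0
    intervals = []
    for x in xs:
        t += 1
        # Predictable bet: move toward the running mean estimate, clipped to
        # [-0.5, 0.5] so that 1 + lam*(x - m) stays positive for x, m in [0, 1].
        lam = np.clip(mean_est - grid, -0.5, 0.5) * min(1.0, 2.0 / np.sqrt(t))
        wealth *= 1.0 + lam * (x - grid)     # wealth(m) is a martingale at the true mean
        mean_est += (x - mean_est) / t
        alive &= wealth < 1.0 / delta        # Ville: P(sup wealth >= 1/delta) <= delta
        kept = grid[alive]
        # Report the convex hull of surviving candidates (a valid, conservative interval).
        intervals.append((kept.min(), kept.max()) if kept.size else (np.nan, np.nan))
    return intervals

# Example: intervals for the mean of Bernoulli(0.7) samples after 1 and 50 draws.
rng = np.random.default_rng(0)
cs = betting_confidence_sequence(rng.binomial(1, 0.7, size=50).astype(float))
print(cs[0], cs[-1])
```

In the paper, this wealth-based argument is extended to the PAC-Bayes setting, where the guarantee must hold for data-dependent distributions over $\theta$ and for all sample sizes simultaneously; the sketch above only covers a single fixed parameter and a naive betting rule.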
Related papers
- How to Shrink Confidence Sets for Many Equivalent Discrete Distributions? [17.52590726972374]
We exploit permutation-equivalence in machine learning problems.
We show that the size of confidence sets shrinks at rates of $O(1/\sqrt{n_k})$ and $O(1/\max_{k \in K} n_k)$.
arXiv Detail & Related papers (2024-07-22T14:19:19Z) - Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Restless Linear Bandits [5.00389879175348]
It is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,\, t \in \mathbb{N})$ which gives rise to pay-offs.
An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate.
arXiv Detail & Related papers (2024-05-17T14:37:39Z) - Better-than-KL PAC-Bayes Bounds [23.87003743389573]
We show that it is possible to achieve a strictly tighter bound with a novel and better-than-KL divergence.
Our result is first-of-its-kind in that existing PAC-Bayes bounds with non-KL divergences are not known to be strictly better than KL.
arXiv Detail & Related papers (2024-02-14T14:33:39Z) - Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature
and Bayesian Optimization [51.533164528799084]
We show that to estimate the normalizing constant within a small relative error, the level of difficulty depends on the value of $\lambda$.
We find that this pattern holds true even when the function evaluations are noisy.
arXiv Detail & Related papers (2024-01-11T07:45:09Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - Mind the gap: Achieving a super-Grover quantum speedup by jumping to the
end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)