A Non-asymptotic Approach to Best-Arm Identification for Gaussian
Bandits
- URL: http://arxiv.org/abs/2105.12978v1
- Date: Thu, 27 May 2021 07:42:49 GMT
- Title: A Non-asymptotic Approach to Best-Arm Identification for Gaussian
Bandits
- Authors: Antoine Barrier (UMPA-ENSL, LMO), Aur\'elien Garivier (UMPA-ENSL),
Tom\'a\v{s} Koc\'ak (UMPA-ENSL)
- Abstract summary: We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance.
This strategy called Exploration-Biased Sampling is not onlyally optimal: we also prove non-asymptotic bounds occurring with high probability.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new strategy for best-arm identification with fixed confidence
of Gaussian variables with bounded means and unit variance. This strategy
called Exploration-Biased Sampling is not only asymptotically optimal: we also
prove non-asymptotic bounds occurring with high probability. To the best of our
knowledge, this is the first strategy with such guarantees. But the main
advantage over other algorithms like Track-and-Stop is an improved behavior
regarding exploration: Exploration-Biased Sampling is slightly biased in favor
of exploration in a subtle but natural way that makes it more stable and
interpretable. These improvements are allowed by a new analysis of the sample
complexity optimization problem, which yields a faster numerical resolution
scheme and several quantitative regularity results that we believe of high
independent interest.
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Convergence of uncertainty estimates in Ensemble and Bayesian sparse
model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
arXiv Detail & Related papers (2023-01-30T04:07:59Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse
Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is agnostically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
arXiv Detail & Related papers (2022-01-12T13:38:33Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.