Best Arm Identification with Fixed Budget: A Large Deviation Perspective
- URL: http://arxiv.org/abs/2312.12137v2
- Date: Mon, 19 Feb 2024 20:17:22 GMT
- Title: Best Arm Identification with Fixed Budget: A Large Deviation Perspective
- Authors: Po-An Wang, Ruo-Chun Tzeng and Alexandre Proutiere
- Abstract summary: We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
- Score: 54.305323903582845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of identifying the best arm in stochastic Multi-Armed
Bandits (MABs) using a fixed sampling budget. Characterizing the minimal
instance-specific error probability for this problem constitutes one of the
important remaining open problems in MABs. When arms are selected using a
static sampling strategy, the error probability decays exponentially with the
number of samples at a rate that can be explicitly derived via Large Deviation
techniques. Analyzing the performance of algorithms with adaptive sampling
strategies is however much more challenging. In this paper, we establish a
connection between the Large Deviation Principle (LDP) satisfied by the
empirical proportions of arm draws and that satisfied by the empirical arm
rewards. This connection holds for any adaptive algorithm, and is leveraged (i)
to improve error probability upper bounds of some existing algorithms, such as
the celebrated \sr (Successive Rejects) algorithm \citep{audibert2010best}, and
(ii) to devise and analyze new algorithms. In particular, we present \sred
(Continuous Rejects), a truly adaptive algorithm that can reject arms in {\it
any} round based on the observed empirical gaps between the rewards of various
arms. Applying our Large Deviation results, we prove that \sred enjoys better
performance guarantees than existing algorithms, including \sr. Extensive
numerical experiments confirm this observation.
Related papers
- Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the mis-identification (of the the support of the best mixed arm) probability and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - An Empirical Process Approach to the Union Bound: Practical Algorithms
for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z) - Best Arm Identification in Spectral Bandits [0.0]
Best Arm Identification (BAI) is an important challenge in many applications ranging from parameter tuning to clinical trials.
We study best-arm identification with fixed confidence in bandit models with graph smoothness constraint.
arXiv Detail & Related papers (2020-05-20T04:12:04Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.