Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed
Bandit
- URL: http://arxiv.org/abs/2106.06848v1
- Date: Sat, 12 Jun 2021 20:05:29 GMT
- Title: Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed
Bandit
- Authors: MohammadJavad Azizi, Sheldon M Ross, Zhengyu Zhang
- Abstract summary: We consider the problem of finding, through adaptive sampling, which of n populations (arms) has the largest mean.
Our objective is to determine a rule which identifies the best population with a fixed minimum confidence using as few observations as possible.
- Score: 1.5469452301122177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of finding, through adaptive sampling, which of n
populations (arms) has the largest mean. Our objective is to determine a rule
which identifies the best population with a fixed minimum confidence using as
few observations as possible, i.e. fixed-confidence (FC) best arm
identification (BAI) in multi-armed bandits. We study such problems under the
Bayesian setting with both Bernoulli and Gaussian populations. We propose to
use the classical vector at a time (VT) rule, which samples each alive
population once in each round. We show how VT can be implemented and analyzed
in our Bayesian setting and be improved by early elimination. We also propose
and analyze a variant of the classical play the winner (PW) algorithm.
Numerical results show that these rules compare favorably with state-of-the-art
algorithms.
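The vector-at-a-time rule described in the abstract can be sketched as follows, assuming Bernoulli arms with a Beta(1,1) prior. This is a minimal illustration, not the paper's exact procedure: the function name, the Monte-Carlo posterior computation, and the elimination threshold `elim` are all hypothetical, and the paper's actual early-elimination criterion may differ.

```python
import random

def vt_with_elimination(arms, delta=0.05, elim=0.01, mc=2000,
                        max_rounds=5000, seed=0):
    """Vector at a time: sample every alive Bernoulli arm once per round,
    keep Beta posteriors, and eliminate arms whose (Monte-Carlo estimated)
    posterior probability of being best drops below `elim`."""
    rng = random.Random(seed)
    alive = list(range(len(arms)))
    s = [1] * len(arms)  # Beta alpha parameter (prior 1 + successes)
    f = [1] * len(arms)  # Beta beta parameter  (prior 1 + failures)
    for _ in range(max_rounds):
        for i in alive:  # one observation per alive arm, each round
            if rng.random() < arms[i]:
                s[i] += 1
            else:
                f[i] += 1
        # Monte-Carlo estimate of P(arm i is best | data) over alive arms
        wins = {i: 0 for i in alive}
        for _ in range(mc):
            draws = {i: rng.betavariate(s[i], f[i]) for i in alive}
            wins[max(draws, key=draws.get)] += 1
        probs = {i: wins[i] / mc for i in alive}
        best = max(probs, key=probs.get)
        if probs[best] >= 1 - delta:  # fixed-confidence stopping rule
            return best
        alive = [i for i in alive if probs[i] >= elim or i == best]
        if len(alive) == 1:
            return alive[0]
    return best
```

With well-separated arms such as `[0.1, 0.9, 0.3]`, the rule typically stops after a handful of rounds once the posterior probability that one arm is best exceeds `1 - delta`.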
Related papers
- Fixed Confidence Best Arm Identification in the Bayesian Setting [6.083234045523298]
We consider the fixed-confidence best arm identification (FC-BAI) problem in the Bayesian setting.
This problem aims to find the arm of the largest mean with a fixed confidence level when the bandit model has been sampled from the known prior.
arXiv Detail & Related papers (2024-02-16T03:36:03Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
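A rough sketch of one step of a leader/challenger sampling rule of this flavor, under assumptions not stated in the summary: Beta posteriors over Bernoulli arms, the leader taken as the arm with the highest posterior mean, and the challenger as the non-leader with the highest Thompson draw. The function name and the mixing parameter `beta` are hypothetical; the paper's best-challenger rule is defined differently in detail.

```python
import random

def ts_best_challenger_step(s, f, beta=0.5, rng=random):
    """One sampling decision: leader = arm with highest posterior mean,
    challenger = non-leader with the highest Thompson (posterior Beta) draw;
    pull the leader with probability `beta`, else the challenger.
    s[i], f[i] are the Beta(alpha, beta) parameters of arm i."""
    means = [a / (a + b) for a, b in zip(s, f)]
    leader = max(range(len(s)), key=means.__getitem__)
    draws = [rng.betavariate(a, b) for a, b in zip(s, f)]
    challenger = max((i for i in range(len(s)) if i != leader),
                     key=draws.__getitem__)
    return leader if rng.random() < beta else challenger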
arXiv Detail & Related papers (2023-10-01T01:37:02Z)
- Bayesian Fixed-Budget Best-Arm Identification [24.31655036648236]
Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations.
We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm.
arXiv Detail & Related papers (2022-11-15T23:29:51Z)
- Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) following the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is asymptotically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
arXiv Detail & Related papers (2022-01-12T13:38:33Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Robust Outlier Arm Identification [16.21284542559277]
We study the problem of Robust Outlier Arm Identification (ROAI)
The goal is to identify arms whose expected rewards deviate substantially from the majority.
We compute the outlier threshold using the median and median absolute deviation of the expected rewards.
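The median/MAD threshold mentioned here can be sketched directly; the function name and the multiplier `k` are hypothetical, and the paper applies this to estimated rather than known means:

```python
import statistics

def roai_outlier_threshold(means, k=3.0):
    """Flag arms whose (estimated) mean exceeds median + k * MAD,
    where MAD is the median absolute deviation of the means.
    Both statistics are robust to a minority of outlier arms."""
    med = statistics.median(means)
    mad = statistics.median(abs(m - med) for m in means)
    thresh = med + k * mad
    return [i for i, m in enumerate(means) if m > thresh]
```

For example, with means `[0.1, 0.2, 0.15, 0.9]` the median is 0.175 and the MAD is 0.05, so only the arm with mean 0.9 crosses the threshold of 0.325.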
arXiv Detail & Related papers (2020-09-21T16:13:01Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
- Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS)
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB)
arXiv Detail & Related papers (2020-06-11T17:12:11Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.