Optimal Batched Best Arm Identification
- URL: http://arxiv.org/abs/2310.14129v1
- Date: Sat, 21 Oct 2023 22:55:50 GMT
- Title: Optimal Batched Best Arm Identification
- Authors: Tianyuan Jin, Yu Yang, Jing Tang, Xiaokui Xiao, Pan Xu
- Abstract summary: We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as little as possible.
In particular, we aim to find the best arm with probability $1-\delta$ for some small constant $\delta>0$.
We propose the three-batch best arm identification (Tri-BBAI) and the almost optimal batched best arm identification (Opt-BBAI) algorithms.
- Score: 31.794242669480106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the batched best arm identification (BBAI) problem, where the
learner's goal is to identify the best arm while switching the policy as little
as possible. In particular, we aim to find the best arm with probability
$1-\delta$ for some small constant $\delta>0$ while minimizing both the sample
complexity (total number of arm pulls) and the batch complexity (total number
of batches). We propose the three-batch best arm identification (Tri-BBAI)
algorithm, which is the first batched algorithm that achieves the optimal
sample complexity in the asymptotic setting (i.e., $\delta\rightarrow 0$) and
runs only in at most $3$ batches. Based on Tri-BBAI, we further propose the
almost optimal batched best arm identification (Opt-BBAI) algorithm, which is
the first algorithm that achieves the near-optimal sample and batch complexity
in the non-asymptotic setting (i.e., $\delta>0$ is arbitrarily fixed), while
enjoying the same batch and sample complexity as Tri-BBAI when $\delta$ tends
to zero. Moreover, in the non-asymptotic setting, the complexity of previous
batch algorithms is usually conditioned on the event that the best arm is
returned (with a probability of at least $1-\delta$), which is potentially
unbounded in cases where a sub-optimal arm is returned. In contrast, the
complexity of Opt-BBAI does not rely on such an event. This is achieved through
a novel procedure that we design for checking whether the best arm is
eliminated, which is of independent interest.
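The trade-off described in the abstract can be made concrete with a simple baseline: successive elimination run in rounds, where all pulls scheduled in a round are issued together as one batch and arms are eliminated only between batches. The sketch below is purely illustrative and is not the paper's Tri-BBAI or Opt-BBAI procedure; the doubling pull schedule, the Hoeffding-style confidence radius, and the `pull(arm, n)` environment interface are assumptions made for the sketch.

```python
import numpy as np

def batched_successive_elimination(pull, n_arms, delta, max_batches=50):
    """Illustrative batched best-arm identification via successive elimination.

    `pull(arm, n)` must return an array of n i.i.d. rewards in [0, 1] for `arm`
    (a hypothetical environment interface, not part of the paper). All pulls
    scheduled in a round are issued together as one batch, and arms are only
    eliminated between batches.
    """
    active = list(range(n_arms))
    means = np.zeros(n_arms)
    counts = np.zeros(n_arms, dtype=int)

    for batch in range(1, max_batches + 1):
        # One batch: pull every surviving arm the same number of times.
        pulls_per_arm = 2 ** batch  # doubling schedule; batch count grows with the gaps
        for a in active:
            rewards = pull(a, pulls_per_arm)
            means[a] = (means[a] * counts[a] + rewards.sum()) / (counts[a] + pulls_per_arm)
            counts[a] += pulls_per_arm

        # Hoeffding-style radius with a union bound over arms and batches.
        radius = np.sqrt(np.log(4 * n_arms * batch ** 2 / delta) / (2 * counts[active]))
        best = max(active, key=lambda a: means[a])
        lcb_best = means[best] - radius[active.index(best)]
        active = [a for a, r in zip(active, radius) if means[a] + r >= lcb_best]

        if len(active) == 1:
            return active[0], batch  # identified arm and number of batches used
    return max(active, key=lambda a: means[a]), max_batches
```

With this baseline the number of batches grows with the instance's gaps and with $\delta$; the paper's point is that the optimal sample complexity can be matched with at most three batches.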
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Optimal Top-Two Method for Best Arm Identification and Fluid Analysis [15.353009236788262]
We propose an optimal top-2 type algorithm for the best arm identification problem.
We show that the proposed algorithm is optimal as $\delta \rightarrow 0$.
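For context, the top-two family of algorithms shares a common sampling template: play the empirically best arm (the leader) with some probability $\beta$, and otherwise play a challenger that is hardest to distinguish from it. The sketch below shows only that generic template with a Gaussian-style challenger score; the paper's precise leader/challenger definitions, choice of $\beta$, and stopping rule are not reproduced here.

```python
import numpy as np

def top_two_step(means, counts, beta=0.5, rng=None):
    """One sampling decision of a generic top-two rule (illustrative template only).

    `means` and `counts` are arrays of empirical means and pull counts (counts >= 1).
    The leader is the empirically best arm; the challenger is the arm that is
    hardest to separate from it under a Gaussian-style transportation cost.
    """
    rng = rng or np.random.default_rng()
    leader = int(np.argmax(means))
    gaps = means[leader] - means
    # Smaller score = harder to distinguish from the leader.
    scores = gaps ** 2 / (1.0 / counts[leader] + 1.0 / counts)
    scores[leader] = np.inf
    challenger = int(np.argmin(scores))
    return leader if rng.random() < beta else challenger
```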
arXiv Detail & Related papers (2024-03-14T06:14:07Z) - lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap [4.666048091337632]
Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm.
This paper focuses on the GAI problem under a small threshold gap, which refers to the distance between the expected rewards of arms and the given threshold.
We propose a new algorithm called lil'HDoC to significantly improve the total sample complexity of the HDoC algorithm.
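The GAI decision rule itself is simple: an arm can be output as soon as an anytime lower confidence bound on its mean clears the threshold. The sketch below uses a generic law-of-the-iterated-logarithm-flavoured radius purely for illustration; lil'HDoC's actual confidence bound and constants, which drive its improvement over HDoC in the small-gap regime, differ.

```python
import math

def declare_good(mean_hat, n_pulls, threshold, delta):
    """Output an arm as 'good' once its lower confidence bound clears the threshold.

    Illustrative anytime confidence radius for rewards in [0, 1]; not the bound
    used by lil'HDoC.
    """
    t = max(n_pulls, 2)
    radius = math.sqrt(2.0 * math.log(math.log(t) / delta) / t)
    return mean_hat - radius >= threshold
```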
arXiv Detail & Related papers (2024-01-29T04:21:47Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t\in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
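The "large set of base arms" aspect is typically handled by a stochastic-greedy commit step: instead of scanning every base arm at each greedy iteration, only a random subsample is scored. The sketch below shows that commit step only, assuming a hypothetical `est_value(S, a)` marginal-gain estimate produced by an exploration phase; it is not the paper's SGB algorithm.

```python
import random

def stochastic_greedy_commit(est_value, base_arms, n, sample_size, rng=None):
    """Commit step of a stochastic-greedy set selection (illustrative only).

    `est_value(S, a)` is a hypothetical estimate of the marginal gain of adding
    arm `a` to the current set `S`. Only a random subsample of the remaining
    base arms is scored at each greedy step, which keeps the commit step cheap
    when the base-arm set is large.
    """
    rng = rng or random.Random(0)
    selected = []
    remaining = set(base_arms)
    for _ in range(min(n, len(base_arms))):
        candidates = rng.sample(sorted(remaining), min(sample_size, len(remaining)))
        best = max(candidates, key=lambda a: est_value(selected, a))
        selected.append(best)
        remaining.remove(best)
    return selected
```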
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor).
We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(G\log(1/\delta_G)+X\log(1/\delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations, and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T\log(|\mathcal{F}|T)})$.
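Inverse Gap Weighting converts the current reward estimates into a sampling distribution: every non-best arm is played with probability inversely proportional to its estimated gap to the best arm, and the empirically best arm receives the remaining mass. The sketch below is the standard single-arm IGW rule; the paper's top-$k$ and extreme-arm machinery is not reproduced.

```python
import numpy as np

def inverse_gap_weighting(predicted_rewards, gamma):
    """Standard Inverse Gap Weighting distribution over arms (illustrative).

    Arms with a larger estimated gap to the best arm get proportionally less
    probability; `gamma` controls how aggressively the best arm is exploited.
    """
    A = len(predicted_rewards)
    best = int(np.argmax(predicted_rewards))
    gaps = predicted_rewards[best] - predicted_rewards
    probs = 1.0 / (A + gamma * gaps)
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()  # remaining mass goes to the empirical best arm
    return probs
```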
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - An Optimal Elimination Algorithm for Learning a Best Arm [37.18327505953403]
We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm.
We propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm.
arXiv Detail & Related papers (2020-06-20T20:21:33Z) - Optimal $\delta$-Correct Best-Arm Selection for Heavy-Tailed
Distributions [2.2940141855172036]
We consider the problem of identifying the arm with the maximum mean using a $\delta$-correct algorithm.
Lower bounds for $\delta$-correct algorithms are well known.
We propose a $\delta$-correct algorithm that matches the lower bound as $\delta$ reduces to zero.
arXiv Detail & Related papers (2019-08-24T05:31:49Z)