SPRT-based Efficient Best Arm Identification in Stochastic Bandits
- URL: http://arxiv.org/abs/2207.11158v3
- Date: Fri, 23 Jun 2023 00:49:46 GMT
- Title: SPRT-based Efficient Best Arm Identification in Stochastic Bandits
- Authors: Arpan Mukherjee and Ali Tajer
- Abstract summary: This paper investigates the best arm identification problem in multi-armed bandits in the fixed confidence setting.
Existing algorithms for the exponential family of bandits face computational challenges.
A framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing.
- Score: 31.359578768463752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the best arm identification (BAI) problem in
stochastic multi-armed bandits in the fixed confidence setting. The general
class of the exponential family of bandits is considered. The existing
algorithms for the exponential family of bandits face computational challenges.
To mitigate these challenges, the BAI problem is viewed and analyzed as a
sequential composite hypothesis testing task, and a framework is proposed that
adopts the likelihood ratio-based tests known to be effective for sequential
testing. Based on this test statistic, a BAI algorithm is designed that
leverages the canonical sequential probability ratio tests for arm selection
and is amenable to tractable analysis for the exponential family of bandits.
This algorithm has two key features: (1) its sample complexity is
asymptotically optimal, and (2) it is guaranteed to be $\delta$-PAC. Existing
efficient approaches focus on the Gaussian setting and require Thompson
sampling for the arm deemed the best and the challenger arm. Additionally, this
paper analytically quantifies the computational expense of identifying the
challenger in an existing approach. Finally, numerical experiments are provided
to support the analysis.
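The SPRT-based procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' exact algorithm: it assumes unit-variance Gaussian arms, uses a generalized likelihood ratio (GLR) statistic between the empirical best arm and its closest challenger, and stops once that statistic clears a heuristic $\delta$-dependent threshold. The function names and the threshold choice below are illustrative assumptions.

```python
import math
import random

def glr_statistic(mu_i, n_i, mu_j, n_j):
    # GLR between two arms under unit-variance Gaussian rewards:
    # (n_i * n_j / (n_i + n_j)) * (mu_i - mu_j)^2 / 2
    return n_i * n_j * (mu_i - mu_j) ** 2 / (2.0 * (n_i + n_j))

def sprt_bai(true_means, delta=0.05, max_pulls=200_000, seed=0):
    """Fixed-confidence BAI sketch: sample the empirical best arm and its
    closest challenger, stop when the minimum pairwise GLR exceeds a
    delta-dependent threshold. Returns (best_arm_index, total_pulls)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    # Initialization: one pull per arm.
    for i in range(k):
        counts[i] = 1
        means[i] = rng.gauss(true_means[i], 1.0)
    t = k
    while t < max_pulls:
        best = max(range(k), key=lambda i: means[i])
        # Challenger: the arm hardest to distinguish from the best.
        challenger = min(
            (j for j in range(k) if j != best),
            key=lambda j: glr_statistic(means[best], counts[best],
                                        means[j], counts[j]),
        )
        stat = glr_statistic(means[best], counts[best],
                             means[challenger], counts[challenger])
        # Heuristic stopping threshold (an assumption, not the paper's).
        threshold = math.log((1.0 + math.log(t)) / delta)
        if stat > threshold:
            return best, t
        # Balance samples between the best arm and the challenger.
        arm = best if counts[best] <= counts[challenger] else challenger
        r = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        t += 1
    return max(range(k), key=lambda i: means[i]), t
```

For a well-separated instance such as `sprt_bai([0.0, 2.0])`, the statistic grows roughly linearly in the number of pulls, so the rule stops after a handful of rounds and returns the better arm with probability at least $1 - \delta$.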
Related papers
- Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the probability of mis-identifying the support of the best mixed arm and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
In particular, we present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z) - Best Arm Identification in Stochastic Bandits: Beyond $\beta-$optimality [31.359578768463752]
This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in multi-armed bandits in the fixed-confidence setting.
Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality.
This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules.
arXiv Detail & Related papers (2023-01-10T05:02:49Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
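The re-sampling index behind DS can be sketched as follows. This is an illustrative bootstrap-style stand-in, not the paper's exact procedure: each arm's index is a weighted average of its observed rewards under random Dirichlet(1, ..., 1) weights (drawn via normalized standard exponentials), and the arm winning the pairwise comparison of indices is the one pulled. The function names `dirichlet_index` and `duel` are assumptions for the sketch.

```python
import random

def dirichlet_index(observations, rng):
    # Dirichlet(1, ..., 1) weights via normalized standard exponentials;
    # the index is the corresponding weighted mean of past rewards.
    w = [rng.expovariate(1.0) for _ in observations]
    total = sum(w)
    return sum(wi * x for wi, x in zip(w, observations)) / total

def duel(obs_a, obs_b, rng):
    # Pairwise comparison: pull the arm whose re-sampled index is larger.
    return "a" if dirichlet_index(obs_a, rng) >= dirichlet_index(obs_b, rng) else "b"
```

Because the weights are nonnegative and sum to one, each index always lies within the range of that arm's observed rewards, and the randomness of the weights supplies the exploration that an optimistic bonus would otherwise provide.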
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.