Related papers: lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap

lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap

URL: http://arxiv.org/abs/2401.15879v3
Date: Tue, 12 Mar 2024 07:53:36 GMT
Title: lil'HDoC: An Algorithm for Good Arm Identification under Small Threshold Gap
Authors: Tzu-Hsien Tsai, Yun-Da Tsai, Shou-De Lin
Abstract summary: Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm. This paper focuses on the GAI problem under a small threshold gap, which refers to the distance between the expected rewards of arms and the given threshold. We propose a new algorithm called lil'HDoC to significantly improve the total sample complexity of the HDoC algorithm.
Score: 4.666048091337632
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Good arm identification (GAI) is a pure-exploration bandit problem in which a single learner outputs an arm as soon as it is identified as a good arm. A good arm is defined as an arm with an expected reward greater than or equal to a given threshold. This paper focuses on the GAI problem under a small threshold gap, which refers to the distance between the expected rewards of arms and the given threshold. We propose a new algorithm called lil'HDoC to significantly improve the total sample complexity of the HDoC algorithm. We demonstrate that the sample complexity of the first $\lambda$ output arm in lil'HDoC is bounded by the original HDoC algorithm, except for one negligible term, when the distance between the expected reward and threshold is small. Extensive experiments confirm that our algorithm outperforms the state-of-the-art algorithms in both synthetic and real-world datasets.

Related papers

Online Clustering with Bandit Information [5.024813922014978]
We study the problem of online clustering within the multi-armed bandit framework under the fixed confidence setting. We introduce a novel algorithm, Average Tracking Bandit Online Clustering (ATBOC), and prove that it is order optimal. We propose a more efficient algorithm, Lower and Upper Confidence Bound-based Bandit Online Clustering (LUCBBOC), inspired by the LUCB algorithm for best arm identification.
arXiv Detail & Related papers (2025-01-20T11:39:09Z)
Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z)
Optimal Top-Two Method for Best Arm Identification and Fluid Analysis [15.353009236788262]
We propose an optimal top-2 type algorithm for the best arm identification problem. We show that the proposed algorithm is optimal as $delta rightarrow 0$.
arXiv Detail & Related papers (2024-03-14T06:14:07Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We first introduce the Combinatorial Successive Asign algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large. We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z)
Optimal Batched Best Arm Identification [29.635422073014638]
We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as less as possible. We propose the three-batch best arm identification (Tri-BBAI), which is the first batched algorithm that achieves the optimal sample complexity. We further propose the almost optimal batched best arm identification (Opt-BBAI) algorithm, which is the first achieves that the near-optimal sample and batch complexity in the non-asymptotic setting.
arXiv Detail & Related papers (2023-10-21T22:55:50Z)
On Universally Optimal Algorithms for A/B Testing [49.429419538826444]
We study the problem of best-arm identification with fixed budget in multi-armed bandits with Bernoulli rewards. For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that performs as well as the algorithm sampling each arm equally.
arXiv Detail & Related papers (2023-08-23T08:38:53Z)
Best arm identification in rare events [0.43012765978447565]
A key application of this framework is in online advertising where click rates of advertisements could be a fraction of a single percent and final conversion to sales, while highly profitable, may again be a small fraction of the click rates. Lately, algorithms for BAI problems have been developed that minimise sample complexity while providing statistical guarantees on the correct arm selection. We exploit the fact that the reward process for each arm is well approximated by a Compound Poisson process to arrive at algorithms that are faster, with a small increase in sample complexity.
arXiv Detail & Related papers (2023-03-14T04:51:24Z)
Differential Good Arm Identification [4.666048091337632]
This paper targets a variant of the multi-armed bandit problem called good arm identification (GAI) GAI is a pure-exploration bandit problem with the goal to output as many good arms using as few samples as possible. We propose DGAI - a differentiable good arm identification algorithm.
arXiv Detail & Related papers (2023-03-13T14:28:21Z)
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits. Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds. Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.