Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination
- URL: http://arxiv.org/abs/2111.07458v1
- Date: Sun, 14 Nov 2021 21:49:58 GMT
- Title: Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination
- Authors: Arpan Mukherjee, Ali Tajer, Pin-Yu Chen and Payel Das
- Abstract summary: This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is asymptotically optimal up to constant factors, while that of the successive elimination-based algorithm is optimal up to logarithmic factors.
- Score: 80.53485617514707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the problem of best arm identification in
$\textit{contaminated}$ stochastic multi-arm bandits. In this setting, the
rewards obtained from any arm are replaced by samples from an adversarial model
with probability $\varepsilon$. A fixed confidence (infinite-horizon) setting
is considered, where the goal of the learner is to identify the arm with the
largest mean. Owing to the adversarial contamination of the rewards, each arm's
mean is only partially identifiable. This paper proposes two algorithms, a
gap-based algorithm and one based on the successive elimination, for best arm
identification in sub-Gaussian bandits. These algorithms involve mean estimates
that achieve the optimal error guarantee on the deviation of the true mean from
the estimate asymptotically. Furthermore, these algorithms asymptotically
achieve the optimal sample complexity. Specifically, for the gap-based
algorithm, the sample complexity is asymptotically optimal up to constant
factors, while for the successive elimination-based algorithm, it is optimal up
to logarithmic factors. Finally, numerical experiments are provided to
illustrate the gains of the algorithms compared to the existing baselines.
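
The contamination model and the elimination idea described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's algorithm: a trimmed mean stands in for the paper's mean estimator, the adversary always returns a fixed outlier, and the confidence radius (a standard sub-Gaussian term plus an error floor of order $\varepsilon\sqrt{\log(1/\varepsilon)}$) is an illustrative choice. All function names and default parameters are hypothetical.

```python
import math
import random


def pull(mean, eps, rng, sigma=1.0, outlier=10.0):
    """One reward under contamination: with probability eps the Gaussian
    reward is replaced by an adversarial value (a fixed outlier here,
    which is a hypothetical choice of adversary)."""
    if rng.random() < eps:
        return outlier
    return rng.gauss(mean, sigma)


def trimmed_mean(samples, trim_frac):
    """Average after dropping the trim_frac smallest and largest samples."""
    xs = sorted(samples)
    k = math.floor(trim_frac * len(xs))
    if 2 * k >= len(xs):  # too few samples to trim; fall back to plain mean
        k = 0
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def successive_elimination(means, eps, delta, rng, sigma=1.0,
                           warmup=20, max_rounds=5000):
    """Return (index of the surviving arm, rounds used)."""
    n_arms = len(means)
    active = list(range(n_arms))
    history = {a: [] for a in active}
    # Assumed error floor: contamination leaves a bias that more samples
    # cannot remove, so the radius never shrinks below this term.
    bias = sigma * eps * math.sqrt(math.log(1.0 / eps)) if eps > 0 else 0.0
    est = {a: 0.0 for a in active}
    for t in range(1, max_rounds + 1):
        # Sample every arm that has not been eliminated yet.
        for a in active:
            history[a].append(pull(means[a], eps, rng, sigma))
        if t < warmup:
            continue
        est = {a: trimmed_mean(history[a], 2 * eps) for a in active}
        radius = sigma * math.sqrt(
            2 * math.log(4 * n_arms * t * t / delta) / t) + bias
        leader = max(est.values())
        # Keep an arm only if its upper bound reaches the leader's lower bound.
        active = [a for a in active if est[a] + radius >= leader - radius]
        if len(active) == 1:
            return active[0], t
    return max(active, key=lambda a: est[a]), max_rounds


if __name__ == "__main__":
    best, rounds = successive_elimination(
        [0.0, 0.5, 2.0], eps=0.05, delta=0.05, rng=random.Random(0))
    print("selected arm:", best, "after", rounds, "rounds per active arm")
```

Because of the error floor, arms whose mean gap is smaller than roughly twice the floor cannot be separated by this sketch no matter how many samples are drawn, which mirrors the abstract's remark that each arm's mean is only partially identifiable.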
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- On Universally Optimal Algorithms for A/B Testing [49.429419538826444]
We study the problem of best-arm identification with fixed budget in multi-armed bandits with Bernoulli rewards.
For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that both performs as well as the algorithm sampling each arm equally on all instances and strictly outperforms it on at least one instance.
arXiv Detail & Related papers (2023-08-23T08:38:53Z)
- SPRT-based Efficient Best Arm Identification in Stochastic Bandits [31.359578768463752]
This paper investigates the best arm identification problem in multi-armed bandits in the fixed confidence setting.
Existing algorithms for the exponential family of bandits face computational challenges.
A framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing.
arXiv Detail & Related papers (2022-07-22T15:54:53Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
- An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
- Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z)