Non-Asymptotic Analysis of a UCB-based Top Two Algorithm
- URL: http://arxiv.org/abs/2210.05431v3
- Date: Mon, 6 Nov 2023 22:31:46 GMT
- Title: Non-Asymptotic Analysis of a UCB-based Top Two Algorithm
- Authors: Marc Jourdan, Rémy Degenne
- Abstract summary: A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms.
Our proposed UCB-based Top Two simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
- Score: 4.18804572788063
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A Top Two sampling rule for bandit identification is a method which selects
the next arm to sample from among two candidate arms, a leader and a
challenger. Due to their simplicity and good empirical performance, they have
received increased attention in recent years. However, for fixed-confidence
best arm identification, theoretical guarantees for Top Two methods have only
been obtained in the asymptotic regime, when the error level vanishes. In this
paper, we derive the first non-asymptotic upper bound on the expected sample
complexity of a Top Two algorithm, which holds for any error level. Our
analysis highlights sufficient properties for a regret minimization algorithm
to be used as leader. These properties are satisfied by the UCB algorithm, and
our proposed UCB-based Top Two algorithm simultaneously enjoys non-asymptotic
guarantees and competitive empirical performance.
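The leader/challenger mechanism described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration assuming unit-variance Gaussian rewards: the function name, the sqrt(2 log t / n) bonus, and the fixed sampling probability beta = 1/2 are generic illustrative choices, not the paper's exact TTUCB specification.

```python
import math
import random

def ucb_top_two_step(counts, sums, t, beta=0.5):
    """One sampling decision of a (simplified) UCB-based Top Two rule.

    counts[i]: number of pulls of arm i; sums[i]: cumulative reward of arm i.
    Returns the index of the arm to pull next. Assumes unit-variance
    Gaussian rewards; the sqrt(2 log t / n) bonus is a generic UCB choice,
    not the paper's exact exploration function.
    """
    k = len(counts)
    # Initialization: pull each arm once before doing anything adaptive.
    for i in range(k):
        if counts[i] == 0:
            return i
    means = [sums[i] / counts[i] for i in range(k)]
    # Leader: arm with the highest UCB index.
    ucb = [means[i] + math.sqrt(2 * math.log(t) / counts[i]) for i in range(k)]
    leader = max(range(k), key=lambda i: ucb[i])
    # Challenger: arm (other than the leader) whose empirical
    # "transportation cost" to overtake the leader is smallest.
    def cost(i):
        gap = means[leader] - means[i]
        return gap * gap / (2 * (1 / counts[leader] + 1 / counts[i]))
    challenger = min((i for i in range(k) if i != leader), key=cost)
    # Sample the leader with probability beta, else the challenger.
    return leader if random.random() < beta else challenger
```

With beta = 1.0 the rule degenerates to plain UCB; beta = 0.5 splits samples between leader and challenger, which is the usual Top Two default.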
Related papers
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed
Bandit [65.268245109828]
We first introduce the Combinatorial Successive Assign (CSA) algorithm, the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z) - On Universally Optimal Algorithms for A/B Testing [49.429419538826444]
We study the problem of best-arm identification with fixed budget in multi-armed bandits with Bernoulli rewards.
For the problem with two arms, also known as the A/B testing problem, we prove that no algorithm can perform as well as the uniform sampling algorithm (which samples each arm equally) on every instance while outperforming it on some instance.
arXiv Detail & Related papers (2023-08-23T08:38:53Z) - SPRT-based Efficient Best Arm Identification in Stochastic Bandits [31.359578768463752]
This paper investigates the best arm identification problem in multi-armed bandits in the fixed confidence setting.
Existing algorithms for the exponential family of bandits face computational challenges.
A framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing.
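As a rough illustration of the likelihood-ratio-based sequential tests mentioned above, here is a textbook SPRT for two Bernoulli hypotheses. This is a generic sketch, not the paper's best-arm identification procedure; the function name `sprt_bernoulli` and the thresholds follow Wald's classical approximations.

```python
import math

def sprt_bernoulli(samples, p0, p1, alpha=0.05, beta=0.05):
    """Generic sequential probability ratio test: H0: p = p0 vs H1: p = p1.

    Processes binary samples one by one; stops as soon as the cumulative
    log-likelihood ratio crosses either of Wald's thresholds.
    Returns 'H0', 'H1', or 'continue' if the data ran out undecided.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for x in samples:
        # Log-likelihood ratio contribution of one Bernoulli observation.
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return 'H1'
        if llr <= lower:
            return 'H0'
    return 'continue'
```

The appeal for bandit identification is computational: each observation updates the statistic in constant time, avoiding the optimization subproblems that some exponential-family algorithms must solve at every step.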
arXiv Detail & Related papers (2022-07-22T15:54:53Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
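For reference, a plain (non-robust) successive elimination routine can be sketched as below. The confidence radius and the `pull` interface are illustrative assumptions; the contamination-robust estimators of the paper above are not reproduced here.

```python
import math

def successive_elimination(pull, k, delta=0.05, max_rounds=10_000):
    """Generic successive elimination for best-arm identification.

    pull(i) draws one reward from arm i (assumed bounded in [0, 1]).
    Keeps a set of active arms, pulls each once per round, and drops
    any arm whose upper confidence bound falls below the best arm's
    lower confidence bound. Returns the surviving arm's index.
    """
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    for r in range(1, max_rounds + 1):
        for i in active:
            sums[i] += pull(i)
            counts[i] += 1
        # Anytime confidence radius (union bound over rounds and arms).
        rad = math.sqrt(math.log(4 * k * r * r / delta) / (2 * r))
        means = {i: sums[i] / counts[i] for i in active}
        best_lcb = max(means[i] - rad for i in active)
        active = [i for i in active if means[i] + rad >= best_lcb]
        if len(active) == 1:
            return active[0]
    # Budget exhausted: fall back to the empirically best active arm.
    return max(active, key=lambda i: sums[i] / counts[i])
```

The logarithmic-factor gap mentioned in the abstract comes precisely from this kind of anytime union bound, which inflates the radius by a log term at every round.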
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Double Explore-then-Commit: Asymptotic Optimality and Beyond [101.77632893858063]
We study the multi-armed bandit problem with sub-Gaussian rewards.
We show that a variant of the explore-then-commit (ETC) algorithm achieves asymptotic optimality for multi-armed bandit problems.
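The basic ETC template behind that variant can be sketched as follows. This is the generic textbook version with a fixed exploration budget `m` per arm; the paper's "double" variant adds a second, adaptive exploration stage that is not reproduced here.

```python
def explore_then_commit(pull, k, m, horizon):
    """Vanilla explore-then-commit for a k-armed bandit.

    pull(i) draws one reward from arm i. Pulls every arm m times,
    then commits to the empirically best arm for the remaining
    horizon - k * m steps. Returns (chosen_arm, total_reward).
    """
    sums = [0.0] * k
    total = 0.0
    # Exploration phase: m pulls of each arm, round-robin.
    for i in range(k):
        for _ in range(m):
            r = pull(i)
            sums[i] += r
            total += r
    # Commit phase: play the empirical best arm for the rest.
    best = max(range(k), key=lambda i: sums[i])
    for _ in range(horizon - k * m):
        total += pull(best)
    return best, total
```

The known weakness of this fixed-budget version is that m must be tuned to the unknown gaps; the double-ETC idea is precisely to remove that tuning while keeping the commit structure.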
arXiv Detail & Related papers (2020-02-21T08:07:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.