Non-Asymptotic Analysis of a UCB-based Top Two Algorithm
- URL: http://arxiv.org/abs/2210.05431v3
- Date: Mon, 6 Nov 2023 22:31:46 GMT
- Title: Non-Asymptotic Analysis of a UCB-based Top Two Algorithm
- Authors: Marc Jourdan, Rémy Degenne
- Abstract summary: A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms.
Our proposed UCB-based Top Two simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
- Score: 4.18804572788063
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A Top Two sampling rule for bandit identification is a method which selects
the next arm to sample from among two candidate arms, a leader and a
challenger. Due to their simplicity and good empirical performance, they have
received increased attention in recent years. However, for fixed-confidence
best arm identification, theoretical guarantees for Top Two methods have only
been obtained in the asymptotic regime, when the error level vanishes. In this
paper, we derive the first non-asymptotic upper bound on the expected sample
complexity of a Top Two algorithm, which holds for any error level. Our
analysis highlights sufficient properties for a regret minimization algorithm
to be used as leader. These properties are satisfied by the UCB algorithm, and
our proposed UCB-based Top Two algorithm simultaneously enjoys non-asymptotic
guarantees and competitive empirical performance.
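The leader/challenger mechanism described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration assuming unit-variance Gaussian rewards: the function name, the sqrt(2 log t / n) bonus, and the fixed sampling probability beta = 1/2 are generic illustrative choices, not the paper's exact TTUCB specification.

```python
import math
import random

def ucb_top_two_step(counts, sums, t, beta=0.5):
    """One sampling decision of a (simplified) UCB-based Top Two rule.

    counts[i]: number of pulls of arm i; sums[i]: cumulative reward of arm i.
    Returns the index of the arm to pull next. Assumes unit-variance
    Gaussian rewards; the sqrt(2 log t / n) bonus is a generic UCB choice,
    not the paper's exact exploration function.
    """
    k = len(counts)
    # Initialization: pull each arm once before doing anything adaptive.
    for i in range(k):
        if counts[i] == 0:
            return i
    means = [sums[i] / counts[i] for i in range(k)]
    # Leader: arm with the highest UCB index.
    ucb = [means[i] + math.sqrt(2 * math.log(t) / counts[i]) for i in range(k)]
    leader = max(range(k), key=lambda i: ucb[i])
    # Challenger: arm (other than the leader) whose empirical
    # "transportation cost" to overtake the leader is smallest.
    def cost(i):
        gap = means[leader] - means[i]
        return gap * gap / (2 * (1 / counts[leader] + 1 / counts[i]))
    challenger = min((i for i in range(k) if i != leader), key=cost)
    # Sample the leader with probability beta, else the challenger.
    return leader if random.random() < beta else challenger
```

With beta = 1.0 the rule degenerates to plain UCB; beta = 0.5 splits samples between leader and challenger, which is the usual Top Two default.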
Related papers
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed
Bandit [65.268245109828]
We first introduce the Combinatorial Successive Assign (CSA) algorithm, the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z) - On Universally Optimal Algorithms for A/B Testing [49.429419538826444]
We study the problem of best-arm identification with fixed budget in multi-armed bandits with Bernoulli rewards.
For the problem with two arms, also known as the A/B testing problem, we prove that no algorithm can perform as well as the uniform sampling algorithm (which samples each arm equally) on every instance while outperforming it on some instance.
arXiv Detail & Related papers (2023-08-23T08:38:53Z) - SPRT-based Efficient Best Arm Identification in Stochastic Bandits [31.359578768463752]
This paper investigates the best arm identification problem in multi-armed bandits in the fixed confidence setting.
Existing algorithms for the exponential family of bandits face computational challenges.
A framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing.
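As a rough illustration of the likelihood-ratio-based sequential tests mentioned above, here is a textbook SPRT for two Bernoulli hypotheses. This is a generic sketch, not the paper's best-arm identification procedure; the function name `sprt_bernoulli` and the thresholds follow Wald's classical approximations.

```python
import math

def sprt_bernoulli(samples, p0, p1, alpha=0.05, beta=0.05):
    """Generic sequential probability ratio test: H0: p = p0 vs H1: p = p1.

    Processes binary samples one by one; stops as soon as the cumulative
    log-likelihood ratio crosses either of Wald's thresholds.
    Returns 'H0', 'H1', or 'continue' if the data ran out undecided.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for x in samples:
        # Log-likelihood ratio contribution of one Bernoulli observation.
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return 'H1'
        if llr <= lower:
            return 'H0'
    return 'continue'
```

The appeal for bandit identification is computational: each observation updates the statistic in constant time, avoiding the optimization subproblems that some exponential-family algorithms must solve at every step.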
arXiv Detail & Related papers (2022-07-22T15:54:53Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
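For reference, a plain (non-robust) successive elimination routine can be sketched as below. The confidence radius and the `pull` interface are illustrative assumptions; the contamination-robust estimators of the paper above are not reproduced here.

```python
import math

def successive_elimination(pull, k, delta=0.05, max_rounds=10_000):
    """Generic successive elimination for best-arm identification.

    pull(i) draws one reward from arm i (assumed bounded in [0, 1]).
    Keeps a set of active arms, pulls each once per round, and drops
    any arm whose upper confidence bound falls below the best arm's
    lower confidence bound. Returns the surviving arm's index.
    """
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    for r in range(1, max_rounds + 1):
        for i in active:
            sums[i] += pull(i)
            counts[i] += 1
        # Anytime confidence radius (union bound over rounds and arms).
        rad = math.sqrt(math.log(4 * k * r * r / delta) / (2 * r))
        means = {i: sums[i] / counts[i] for i in active}
        best_lcb = max(means[i] - rad for i in active)
        active = [i for i in active if means[i] + rad >= best_lcb]
        if len(active) == 1:
            return active[0]
    # Budget exhausted: fall back to the empirically best active arm.
    return max(active, key=lambda i: sums[i] / counts[i])
```

The logarithmic-factor gap mentioned in the abstract comes precisely from this kind of anytime union bound, which inflates the radius by a log term at every round.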
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Double Explore-then-Commit: Asymptotic Optimality and Beyond [101.77632893858063]
We study the multi-armed bandit problem with sub-Gaussian rewards.
We show that a variant of the explore-then-commit (ETC) algorithm achieves asymptotic optimality for multi-armed bandit problems.
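The basic ETC template behind that variant can be sketched as follows. This is the generic textbook version with a fixed exploration budget `m` per arm; the paper's "double" variant adds a second, adaptive exploration stage that is not reproduced here.

```python
def explore_then_commit(pull, k, m, horizon):
    """Vanilla explore-then-commit for a k-armed bandit.

    pull(i) draws one reward from arm i. Pulls every arm m times,
    then commits to the empirically best arm for the remaining
    horizon - k * m steps. Returns (chosen_arm, total_reward).
    """
    sums = [0.0] * k
    total = 0.0
    # Exploration phase: m pulls of each arm, round-robin.
    for i in range(k):
        for _ in range(m):
            r = pull(i)
            sums[i] += r
            total += r
    # Commit phase: play the empirical best arm for the rest.
    best = max(range(k), key=lambda i: sums[i])
    for _ in range(horizon - k * m):
        total += pull(best)
    return best, total
```

The known weakness of this fixed-budget version is that m must be tuned to the unknown gaps; the double-ETC idea is precisely to remove that tuning while keeping the commit structure.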
arXiv Detail & Related papers (2020-02-21T08:07:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.