Double Explore-then-Commit: Asymptotic Optimality and Beyond
- URL: http://arxiv.org/abs/2002.09174v2
- Date: Thu, 19 Nov 2020 07:40:47 GMT
- Title: Double Explore-then-Commit: Asymptotic Optimality and Beyond
- Authors: Tianyuan Jin and Pan Xu and Xiaokui Xiao and Quanquan Gu
- Abstract summary: We study the multi-armed bandit problem with subgaussian rewards.
We show that a variant of the explore-then-commit (ETC) algorithm can achieve the optimality for multi-armed bandit problems.
- Score: 101.77632893858063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the multi-armed bandit problem with subgaussian rewards. The
explore-then-commit (ETC) strategy, which consists of an exploration phase
followed by an exploitation phase, is one of the most widely used algorithms in
a variety of online decision applications. Nevertheless, it has been shown in
Garivier et al. (2016) that ETC is suboptimal in the asymptotic sense as the
horizon grows, and thus, is worse than fully sequential strategies such as
Upper Confidence Bound (UCB). In this paper, we show that a variant of ETC
algorithm can actually achieve the asymptotic optimality for multi-armed bandit
problems as UCB-type algorithms do and extend it to the batched bandit setting.
Specifically, we propose a double explore-then-commit (DETC) algorithm that has
two exploration and exploitation phases and prove that DETC achieves the
asymptotically optimal regret bound. To our knowledge, DETC is the first
non-fully-sequential algorithm that achieves such asymptotic optimality. In
addition, we extend DETC to batched bandit problems, where (i) the exploration
process is split into a small number of batches and (ii) the round complexity
is of central interest. We prove that a batched version of DETC can achieve the
asymptotic optimality with only a constant round complexity. This is the first
batched bandit algorithm that can attain the optimal asymptotic regret bound
and optimal round complexity simultaneously.
Related papers
- Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed
Bandit [65.268245109828]
We first introduce the Combinatorial Successive Asign algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z) - Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and
Optimal Algorithms [64.10576998630981]
We show the first tight characterization of the optimal Hessian-dependent sample complexity.
A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances.
The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
arXiv Detail & Related papers (2023-06-21T17:03:22Z) - Non-Asymptotic Analysis of a UCB-based Top Two Algorithm [4.18804572788063]
A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms.
Our proposed UCB-based Top Two simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
arXiv Detail & Related papers (2022-10-11T13:21:59Z) - Finite-Time Analysis of Fully Decentralized Single-Timescale
Actor-Critic [4.94128206910124]
We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner.
We show that our algorithm has sample complexity of $tildemathcalO(epsilon-2)$ under Markovian sampling.
We also provide a local action privacy-preserving version of our algorithm and its analysis.
arXiv Detail & Related papers (2022-06-12T13:14:14Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
Whileally optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive.
We design the first insightally optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z) - Optimal and Practical Algorithms for Smooth and Strongly Convex
Decentralized Optimization [21.555331273873175]
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network.
We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
arXiv Detail & Related papers (2020-06-21T11:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.