Related papers: Double Explore-then-Commit: Asymptotic Optimality and Beyond

Double Explore-then-Commit: Asymptotic Optimality and Beyond

URL: http://arxiv.org/abs/2002.09174v2
Date: Thu, 19 Nov 2020 07:40:47 GMT
Title: Double Explore-then-Commit: Asymptotic Optimality and Beyond
Authors: Tianyuan Jin and Pan Xu and Xiaokui Xiao and Quanquan Gu
Abstract summary: We study the multi-armed bandit problem with subgaussian rewards. We show that a variant of the explore-then-commit (ETC) algorithm can achieve the optimality for multi-armed bandit problems.
Score: 101.77632893858063
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the multi-armed bandit problem with subgaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision applications. Nevertheless, it has been shown in Garivier et al. (2016) that ETC is suboptimal in the asymptotic sense as the horizon grows, and thus, is worse than fully sequential strategies such as Upper Confidence Bound (UCB). In this paper, we show that a variant of ETC algorithm can actually achieve the asymptotic optimality for multi-armed bandit problems as UCB-type algorithms do and extend it to the batched bandit setting. Specifically, we propose a double explore-then-commit (DETC) algorithm that has two exploration and exploitation phases and prove that DETC achieves the asymptotically optimal regret bound. To our knowledge, DETC is the first non-fully-sequential algorithm that achieves such asymptotic optimality. In addition, we extend DETC to batched bandit problems, where (i) the exploration process is split into a small number of batches and (ii) the round complexity is of central interest. We prove that a batched version of DETC can achieve the asymptotic optimality with only a constant round complexity. This is the first batched bandit algorithm that can attain the optimal asymptotic regret bound and optimal round complexity simultaneously.

Related papers

Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We first introduce the Combinatorial Successive Asign algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large. We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z)
Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms [64.10576998630981]
We show the first tight characterization of the optimal Hessian-dependent sample complexity. A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
arXiv Detail & Related papers (2023-06-21T17:03:22Z)
Non-Asymptotic Analysis of a UCB-based Top Two Algorithm [4.18804572788063]
A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms. Our proposed UCB-based Top Two simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
arXiv Detail & Related papers (2022-10-11T13:21:59Z)
Finite-Time Analysis of Fully Decentralized Single-Timescale Actor-Critic [4.94128206910124]
We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner. We show that our algorithm has sample complexity of $tildemathcalO(epsilon-2)$ under Markovian sampling. We also provide a local action privacy-preserving version of our algorithm and its analysis.
arXiv Detail & Related papers (2022-06-12T13:14:14Z)
Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network. We design two optimal algorithms that attain these lower bounds. We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms. We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits. Whileally optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive. We design the first insightally optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z)
Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization [21.555331273873175]
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network. We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
arXiv Detail & Related papers (2020-06-21T11:23:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.