Gamification of Pure Exploration for Linear Bandits
- URL: http://arxiv.org/abs/2007.00953v1
- Date: Thu, 2 Jul 2020 08:20:35 GMT
- Title: Gamification of Pure Exploration for Linear Bandits
- Authors: Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko
- Abstract summary: We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
While asymptotically optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive.
We design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits.
- Score: 34.16123941778227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate an active pure-exploration setting, that includes best-arm
identification, in the context of linear stochastic bandits. While
asymptotically optimal algorithms exist for standard multi-arm bandits, the
existence of such algorithms for the best-arm identification in linear bandits
has been elusive despite several attempts to address it. First, we provide a
thorough comparison and new insight over different notions of optimality in the
linear case, including G-optimality, transductive optimality from optimal
experimental design and asymptotic optimality. Second, we design the first
asymptotically optimal algorithm for fixed-confidence pure exploration in
linear bandits. As a consequence, our algorithm naturally bypasses the pitfall
caused by a simple but difficult instance, that most prior algorithms had to be
engineered to deal with explicitly. Finally, we avoid the need to fully solve
an optimal design problem by providing an approach that entails an efficient
implementation.
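As a concrete illustration of the optimal-design notions the abstract compares, the sketch below computes a G-optimal design over a finite set of arm features using the classical Frank-Wolfe (Fedorov-Wynn) iteration. This is a minimal reference sketch, not the paper's algorithm; the function name, regularizer, and tolerance are illustrative choices.

```python
import numpy as np

def g_optimal_design(X, n_iters=1000, reg=1e-6, tol=1e-3):
    """Frank-Wolfe (Fedorov-Wynn) iteration for a G-optimal design.

    X : (K, d) array of arm feature vectors.
    Returns a probability vector lam over arms driving the worst-case
    leverage max_a x_a^T A(lam)^{-1} x_a toward d, the optimal value
    guaranteed by the Kiefer-Wolfowitz equivalence theorem.
    """
    K, d = X.shape
    lam = np.full(K, 1.0 / K)              # start from the uniform design
    for _ in range(n_iters):
        A = X.T @ (lam[:, None] * X) + reg * np.eye(d)
        A_inv = np.linalg.inv(A)
        g = np.einsum('ij,jk,ik->i', X, A_inv, X)   # leverage scores
        a_star, g_max = int(np.argmax(g)), float(np.max(g))
        if g_max <= d * (1 + tol):          # near-optimal design found
            break
        # exact line-search step for the equivalent D-optimal objective
        gamma = (g_max - d) / (d * (g_max - 1))
        lam = (1 - gamma) * lam
        lam[a_star] += gamma
    return lam

# Toy example: two basis arms plus one nearly collinear arm in R^2.
X = np.vstack([np.eye(2), [np.cos(0.1), np.sin(0.1)]])
print(np.round(g_optimal_design(X), 3))
```

The stopping test exploits the Kiefer-Wolfowitz equivalence theorem: the maximal leverage score of an optimal design equals the dimension d, so the loop halts once the worst-case leverage is within the tolerance of d.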
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (a less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z)
- Dual-Directed Algorithm Design for Efficient Pure Exploration [11.492736493413103]
We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems.
arXiv Detail & Related papers (2023-10-30T07:29:17Z)
- Information-Directed Selection for Top-Two Algorithms [13.339829037245963]
We consider the best-k-arm identification problem for multi-armed bandits.
The objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort.
We propose information-directed selection (IDS), which selects one of the top-two candidate arms based on a measure of information gain (a generic top-two loop is sketched after this list).
arXiv Detail & Related papers (2022-05-24T14:07:13Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms for best-arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination.
For the gap-based algorithm, the sample complexity is optimal up to constant factors, while for successive elimination it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- An Index-based Deterministic Asymptotically Optimal Algorithm for Constrained Multi-armed Bandit Problems [0.0]
For the model of constrained multi-armed bandits, we show that there exists an index-based deterministic asymptotically optimal algorithm.
We provide a finite-time bound on the probability of optimality, given as $1 - O(|A| T e^{-T})$, where $T$ is the horizon size and $A$ is the set of arms in the bandit.
arXiv Detail & Related papers (2020-07-29T01:54:22Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
- Double Explore-then-Commit: Asymptotic Optimality and Beyond [101.77632893858063]
We study the multi-armed bandit problem with subgaussian rewards.
We show that a variant of the explore-then-commit (ETC) algorithm can achieve asymptotic optimality for multi-armed bandit problems (the basic ETC template is sketched after this list).
arXiv Detail & Related papers (2020-02-21T08:07:56Z)
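The information-directed selection entry above names a top-two mechanism. For reference, here is a minimal generic top-two loop for fixed-confidence best-arm identification, assuming unit-variance Gaussian rewards; the fixed beta = 0.5 split between leader and challenger and the stopping threshold are simple illustrative stand-ins, not the IDS rule of the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_two_sample(mu, delta=0.05, beta=0.5, max_pulls=100_000):
    """Generic top-two sampling for Gaussian best-arm identification.

    Leader = empirical best arm; challenger = the arm hardest to
    separate from the leader (smallest pairwise GLR statistic).
    The beta split and threshold are illustrative choices.
    """
    K = len(mu)
    n, s = np.zeros(K), np.zeros(K)        # pull counts, reward sums
    for a in range(K):                     # pull every arm once
        s[a] += rng.normal(mu[a]); n[a] += 1
    for t in range(K, max_pulls):
        means = s / n
        leader = int(np.argmax(means))
        # pairwise GLR statistic of each arm against the leader
        gaps = means[leader] - means
        stat = gaps**2 / (2 * (1 / n[leader] + 1 / n))
        stat[leader] = np.inf
        challenger = int(np.argmin(stat))
        # stop once even the hardest arm is confidently separated
        if stat[challenger] > np.log((1 + np.log(t)) * K / delta):
            return leader, t
        a = leader if rng.random() < beta else challenger
        s[a] += rng.normal(mu[a]); n[a] += 1
    return int(np.argmax(s / n)), max_pulls

print(top_two_sample(np.array([0.5, 0.4, 0.1])))
```

Similarly, the Double Explore-then-Commit entry refines the classic single-phase ETC template sketched below; the two-phase "double" variant and its analysis are in the paper itself, and the parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def explore_then_commit(mu, m, horizon):
    """Classic explore-then-commit for Gaussian rewards: pull each arm
    m times, then commit to the empirical best for the remaining rounds.
    (The cited paper analyzes a two-phase refinement of this template.)
    """
    K = len(mu)
    rewards = [rng.normal(mu[a], size=m) for a in range(K)]   # explore
    best = int(np.argmax([r.mean() for r in rewards]))
    total = sum(r.sum() for r in rewards)
    total += rng.normal(mu[best], size=horizon - K * m).sum() # commit
    return best, total

print(explore_then_commit(np.array([0.6, 0.4]), m=200, horizon=10_000))
```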
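The choice of the exploration budget m drives ETC's regret: too small and the commit step picks the wrong arm too often, too large and exploration itself is wasteful, which is the trade-off the double (two-phase) variant is designed to resolve adaptively.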
This list is automatically generated from the titles and abstracts of the papers on this site.