Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear
Bandits
- URL: http://arxiv.org/abs/2206.04456v1
- Date: Thu, 9 Jun 2022 12:27:51 GMT
- Title: Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear
Bandits
- Authors: Marc Jourdan and R\'emy Degenne
- Abstract summary: In a pure-exploration problem, information is gathered sequentially to answer a question on the environment.
We show that picking the answer with highest mean does not allow an algorithm to reach optimality in terms of expected sample complexity.
We develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear bandits.
- Score: 0.8122270502556374
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In pure-exploration problems, information is gathered sequentially to answer
a question on the stochastic environment. While best-arm identification for
linear bandits has been extensively studied in recent years, few works have
been dedicated to identifying one arm that is $\varepsilon$-close to the best
one (and not exactly the best one). In this problem with several correct
answers, an identification algorithm should focus on one candidate among those
answers and verify that it is correct. We demonstrate that picking the answer
with highest mean does not allow an algorithm to reach asymptotic optimality in
terms of expected sample complexity. Instead, a \textit{furthest answer} should
be identified. Using that insight to choose the candidate answer carefully, we
develop a simple procedure to adapt best-arm identification algorithms to
tackle $\varepsilon$-best-answer identification in transductive linear
stochastic bandits. Finally, we propose an asymptotically optimal algorithm for
this setting, which is shown to achieve competitive empirical performance
against existing modified best-arm identification algorithms.
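To make the "furthest answer" idea concrete, here is a minimal sketch (not taken from the paper) for the special case of an unstructured Gaussian bandit with unit variance: among the $\varepsilon$-optimal candidates, it returns the one whose allocation-weighted transportation cost to the nearest alternative model is largest, i.e. the answer that the current sample allocation makes hardest to invalidate. The function name, the specialised cost formula, and the numbers in the example call are illustrative assumptions; the paper itself treats general transductive linear bandits.

```python
import numpy as np


def furthest_answer(mu, w, eps):
    """Pick a candidate answer among the eps-optimal arms.

    Illustrative sketch for an unstructured Gaussian bandit with unit
    variance (the paper handles general transductive linear bandits).
    mu  : empirical mean of each arm
    w   : fraction of samples allocated to each arm (positive, sums to 1)
    eps : slack defining which arms count as eps-optimal answers
    """
    mu = np.asarray(mu, dtype=float)
    w = np.asarray(w, dtype=float)
    candidates = np.flatnonzero(mu >= mu.max() - eps)  # eps-optimal answers

    def cost_to_invalidate(i):
        # Smallest allocation-weighted Gaussian transportation cost needed to
        # move the means so that some arm j beats arm i by more than eps.
        return min(
            max(mu[i] + eps - mu[j], 0.0) ** 2 / (2.0 * (1.0 / w[i] + 1.0 / w[j]))
            for j in range(len(mu))
            if j != i
        )

    # The "furthest" answer is the candidate that is hardest to invalidate.
    return max(candidates, key=cost_to_invalidate)


# Toy example: arm 0 has the highest empirical mean, but under this skewed
# allocation the procedure selects arm 1 as the candidate answer, echoing the
# abstract's point that the highest-mean answer is not always the right choice.
mu = [1.000, 0.995, 0.895]
w = [0.20, 0.77, 0.03]
print(furthest_answer(mu, w, eps=0.1))  # -> 1
```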
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Dual-Directed Algorithm Design for Efficient Pure Exploration [11.492736493413103]
We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems.
arXiv Detail & Related papers (2023-10-30T07:29:17Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of the successive-elimination algorithm is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
While asymptotically optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive.
We design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.