On the complexity of All $\varepsilon$-Best Arms Identification
- URL: http://arxiv.org/abs/2202.06280v1
- Date: Sun, 13 Feb 2022 10:54:52 GMT
- Title: On the complexity of All $\varepsilon$-Best Arms Identification
- Authors: Aymen Al Marjani, Tom\'a\v{s} Koc\'ak, Aur\'elien Garivier
- Abstract summary: We consider the problem of identifying all the $varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards.
We propose a Track-and-Stop algorithm that identifies the set of $varepsilon$-good arms w.h.p and enjoys optimality (when $delta$ goes to zero) in terms of the expected sample complexity.
- Score: 2.1485350418225244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem introduced by \cite{Mason2020} of identifying all the
$\varepsilon$-optimal arms in a finite stochastic multi-armed bandit with
Gaussian rewards. In the fixed confidence setting, we give a lower bound on the
number of samples required by any algorithm that returns the set of
$\varepsilon$-good arms with a failure probability less than some risk level
$\delta$. This bound writes as $T_{\varepsilon}^*(\mu)\log(1/\delta)$, where
$T_{\varepsilon}^*(\mu)$ is a characteristic time that depends on the vector of
mean rewards $\mu$ and the accuracy parameter $\varepsilon$. We also provide an
efficient numerical method to solve the convex max-min program that defines the
characteristic time. Our method is based on a complete characterization of the
alternative bandit instances that the optimal sampling strategy needs to rule
out, thus making our bound tighter than the one provided by \cite{Mason2020}.
Using this method, we propose a Track-and-Stop algorithm that identifies the
set of $\varepsilon$-good arms w.h.p and enjoys asymptotic optimality (when
$\delta$ goes to zero) in terms of the expected sample complexity. Finally,
using numerical simulations, we demonstrate our algorithm's advantage over
state-of-the-art methods, even for moderate values of the risk parameter.
Related papers
- Robust Sparse Regression with Non-Isotropic Designs [4.964650614497048]
We develop a technique to design efficiently computable estimators for sparse linear regression in the simultaneous presence of two adversaries.
We provide a novel analysis of weighted penalized Huber loss that is suitable for heavy-tailed designs in the presence of two adversaries.
arXiv Detail & Related papers (2024-10-31T13:51:59Z) - Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits [55.957560311008926]
We propose a piecewise stationary linear bandit (PSLB) model where the quality of an arm is measured by its return averaged over all contexts.
PS$varepsilon$BAI$+$ is guaranteed to identify an $varepsilon$-optimal arm with probability $ge 1-delta$ and with a minimal number of samples.
arXiv Detail & Related papers (2024-10-10T06:15:42Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-delta$, within $varepsilon$ of being among the top $eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP)
We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Bandits with many optimal arms [68.17472536610859]
We write $p*$ for the proportion of optimal arms and $Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Explicit Best Arm Identification in Linear Bandits Using No-Regret
Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Optimal $\delta$-Correct Best-Arm Selection for Heavy-Tailed
Distributions [2.2940141855172036]
We consider the problem of identifying the one with the maximum mean using a $delta$-correct algorithm.
Lower bounds for $delta$-correct algorithms are well known.
We propose a $delta$-correct algorithm that matches the lower bound as $delta$ reduces to zero.
arXiv Detail & Related papers (2019-08-24T05:31:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.