On the complexity of All $\varepsilon$-Best Arms Identification
- URL: http://arxiv.org/abs/2202.06280v1
- Date: Sun, 13 Feb 2022 10:54:52 GMT
- Title: On the complexity of All $\varepsilon$-Best Arms Identification
- Authors: Aymen Al Marjani, Tom\'a\v{s} Koc\'ak, Aur\'elien Garivier
- Abstract summary: We consider the problem of identifying all the $varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards.
We propose a Track-and-Stop algorithm that identifies the set of $varepsilon$-good arms w.h.p and enjoys optimality (when $delta$ goes to zero) in terms of the expected sample complexity.
- Score: 2.1485350418225244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem introduced by \cite{Mason2020} of identifying all the
$\varepsilon$-optimal arms in a finite stochastic multi-armed bandit with
Gaussian rewards. In the fixed confidence setting, we give a lower bound on the
number of samples required by any algorithm that returns the set of
$\varepsilon$-good arms with a failure probability less than some risk level
$\delta$. This bound writes as $T_{\varepsilon}^*(\mu)\log(1/\delta)$, where
$T_{\varepsilon}^*(\mu)$ is a characteristic time that depends on the vector of
mean rewards $\mu$ and the accuracy parameter $\varepsilon$. We also provide an
efficient numerical method to solve the convex max-min program that defines the
characteristic time. Our method is based on a complete characterization of the
alternative bandit instances that the optimal sampling strategy needs to rule
out, thus making our bound tighter than the one provided by \cite{Mason2020}.
Using this method, we propose a Track-and-Stop algorithm that identifies the
set of $\varepsilon$-good arms w.h.p and enjoys asymptotic optimality (when
$\delta$ goes to zero) in terms of the expected sample complexity. Finally,
using numerical simulations, we demonstrate our algorithm's advantage over
state-of-the-art methods, even for moderate values of the risk parameter.
Related papers
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error [25.777423855881878]
We show a novel elimination-based algorithm to show one can obtain an $Oleft(Hepsilonright)$-optimal policy.
We complement our upper bound with an $widetildeOmegaleft(Hepsilonright)$-optimality lower bound, giving a complete picture of this problem.
arXiv Detail & Related papers (2024-07-18T15:58:04Z) - Robust Distribution Learning with Local and Global Adversarial Corruptions [17.22168727622332]
We develop an efficient finite-sample algorithm with error bounded by $sqrtvarepsilon k + rho + tildeO(dsqrtkn-1/(k lor 2))$ when $P$ has bounded covariance.
Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator.
arXiv Detail & Related papers (2024-06-10T17:48:36Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-delta$, within $varepsilon$ of being among the top $eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP)
We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Bandits with many optimal arms [68.17472536610859]
We write $p*$ for the proportion of optimal arms and $Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Explicit Best Arm Identification in Linear Bandits Using No-Regret
Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Optimal $\delta$-Correct Best-Arm Selection for Heavy-Tailed
Distributions [2.2940141855172036]
We consider the problem of identifying the one with the maximum mean using a $delta$-correct algorithm.
Lower bounds for $delta$-correct algorithms are well known.
We propose a $delta$-correct algorithm that matches the lower bound as $delta$ reduces to zero.
arXiv Detail & Related papers (2019-08-24T05:31:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.