Related papers: On the complexity of All $\varepsilon$-Best Arms Identification

On the complexity of All $\varepsilon$-Best Arms Identification

URL: http://arxiv.org/abs/2202.06280v1
Date: Sun, 13 Feb 2022 10:54:52 GMT
Title: On the complexity of All $\varepsilon$-Best Arms Identification
Authors: Aymen Al Marjani, Tom\'a\v{s} Koc\'ak, Aur\'elien Garivier
Abstract summary: We consider the problem of identifying all the $varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards. We propose a Track-and-Stop algorithm that identifies the set of $varepsilon$-good arms w.h.p and enjoys optimality (when $delta$ goes to zero) in terms of the expected sample complexity.
Score: 2.1485350418225244
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the problem introduced by \cite{Mason2020} of identifying all the $\varepsilon$-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. In the fixed confidence setting, we give a lower bound on the number of samples required by any algorithm that returns the set of $\varepsilon$-good arms with a failure probability less than some risk level $\delta$. This bound writes as $T_{\varepsilon}^*(\mu)\log(1/\delta)$, where $T_{\varepsilon}^*(\mu)$ is a characteristic time that depends on the vector of mean rewards $\mu$ and the accuracy parameter $\varepsilon$. We also provide an efficient numerical method to solve the convex max-min program that defines the characteristic time. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by \cite{Mason2020}. Using this method, we propose a Track-and-Stop algorithm that identifies the set of $\varepsilon$-good arms w.h.p and enjoys asymptotic optimality (when $\delta$ goes to zero) in terms of the expected sample complexity. Finally, using numerical simulations, we demonstrate our algorithm's advantage over state-of-the-art methods, even for moderate values of the risk parameter.

Related papers

Multi-objective Good Arm Identification with Bandit Feedback [0.589889361990138]
We consider a good arm identification problem in a bandit setting with multi-objectives. For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $mathcalD_i_t$. The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.
arXiv Detail & Related papers (2025-03-13T14:04:04Z)
Robust Sparse Regression with Non-Isotropic Designs [4.964650614497048]
We develop a technique to design efficiently computable estimators for sparse linear regression in the simultaneous presence of two adversaries. We provide a novel analysis of weighted penalized Huber loss that is suitable for heavy-tailed designs in the presence of two adversaries.
arXiv Detail & Related papers (2024-10-31T13:51:59Z)
Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits [55.957560311008926]
We propose a piecewise stationary linear bandit (PSLB) model where the quality of an arm is measured by its return averaged over all contexts. PS$varepsilon$BAI$+$ is guaranteed to identify an $varepsilon$-optimal arm with probability $ge 1-delta$ and with a minimal number of samples.
arXiv Detail & Related papers (2024-10-10T06:15:42Z)
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-delta$, within $varepsilon$ of being among the top $eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z)
Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP) We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model. The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
Bandits with many optimal arms [68.17472536610859]
We write $p*$ for the proportion of optimal arms and $Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits. We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy. For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
Optimal $\delta$-Correct Best-Arm Selection for Heavy-Tailed Distributions [2.2940141855172036]
We consider the problem of identifying the one with the maximum mean using a $delta$-correct algorithm. Lower bounds for $delta$-correct algorithms are well known. We propose a $delta$-correct algorithm that matches the lower bound as $delta$ reduces to zero.
arXiv Detail & Related papers (2019-08-24T05:31:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.