An Optimal Elimination Algorithm for Learning a Best Arm
- URL: http://arxiv.org/abs/2006.11647v1
- Date: Sat, 20 Jun 2020 20:21:33 GMT
- Title: An Optimal Elimination Algorithm for Learning a Best Arm
- Authors: Avinatan Hassidim, Ron Kupfer, Yaron Singer
- Abstract summary: We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm.
We propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm.
- Score: 37.18327505953403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best
arm where the goal is to identify with confidence $1-\delta$ an arm whose mean
is an $\epsilon$-approximation to that of the highest mean arm in a multi-armed
bandit setting. This problem is one of the most fundamental problems in
statistics and learning theory, yet somewhat surprisingly its worst-case sample
complexity is not well understood. In this paper, we propose a new approach for
$(\epsilon,\delta)$-PAC learning a best arm. This approach leads to an
algorithm whose sample complexity converges to \emph{exactly} the optimal
sample complexity of $(\epsilon,\delta)$-learning the mean of $n$ arms
separately and we complement this result with a conditional matching lower
bound.
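The elimination idea behind such algorithms can be illustrated with a classic median-elimination-style sketch. This is a generic $(\epsilon,\delta)$-PAC routine in the style of Even-Dar et al., written for illustration only; it is not the algorithm proposed in this paper, and the constants are the textbook ones, not optimized:

```python
import math

def median_elimination(arms, epsilon, delta):
    """Generic (epsilon, delta)-PAC best-arm sketch via median elimination.

    Illustrative only -- not the paper's algorithm. `arms` is a list of
    zero-argument callables returning rewards in [0, 1]; returns the index
    of an arm whose mean is within epsilon of the best, w.p. >= 1 - delta.
    """
    eps_l, delta_l = epsilon / 4.0, delta / 2.0
    active = list(range(len(arms)))
    while len(active) > 1:
        # Sample every surviving arm enough times to resolve gaps of eps_l.
        t = int(math.ceil(4.0 / (eps_l / 2.0) ** 2 * math.log(3.0 / delta_l)))
        means = {a: sum(arms[a]() for _ in range(t)) / t for a in active}
        # Keep the better half of the arms by empirical mean, drop the rest.
        active.sort(key=lambda a: means[a], reverse=True)
        active = active[: (len(active) + 1) // 2]
        # Tighten the accuracy and confidence budgets for the next epoch.
        eps_l *= 3.0 / 4.0
        delta_l /= 2.0
    return active[0]
```

Each epoch halves the number of arms while spending a geometrically decreasing error budget, which is what yields the $O\!\left(\frac{n}{\epsilon^2}\log\frac{1}{\delta}\right)$-type sample complexity that elimination methods are known for.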
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Optimal Top-Two Method for Best Arm Identification and Fluid Analysis [15.353009236788262]
We propose an optimal top-2 type algorithm for the best arm identification problem.
We show that the proposed algorithm is optimal as $\delta \rightarrow 0$.
arXiv Detail & Related papers (2024-03-14T06:14:07Z) - Optimal Batched Best Arm Identification [31.794242669480106]
We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as few times as possible.
In particular, we aim to find the best arm with probability $1-\delta$ for some small constant $\delta>0$.
We propose the three-batch best arm identification (Tri-BBAI) and the almost optimal batched best arm identification (Opt-BBAI) algorithm.
arXiv Detail & Related papers (2023-10-21T22:55:50Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Revisiting Simple Regret Minimization in Multi-Armed Bandits [33.88679679593314]
Simple regret is a less popular measure than the probability of missing the best arm or an $\epsilon$-good arm.
In this paper, we achieve improved simple regret upper bounds for both the data-rich ($T \ge n$) and data-poor ($T \le n$) regimes.
For the more challenging data-poor regime, we propose Bracketing SH (BSH) that enjoys the same improvement even without sampling each arm at least once.
arXiv Detail & Related papers (2022-10-30T18:31:03Z) - Best-Arm Identification in Correlated Multi-Armed Bandits [9.180690233707645]
We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms.
We show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in \mathcal{A}} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting.
arXiv Detail & Related papers (2021-09-10T15:41:07Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Explicit Best Arm Identification in Linear Bandits Using No-Regret
Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an improved algorithm with better sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
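The model-free setting in the last entry can be illustrated with a minimal tabular Q-learning sketch. This is a generic textbook routine, not the paper's algorithm; the environment interface `step(s, a)` and all hyperparameters below are illustrative assumptions:

```python
import random

def q_learning(n_states, n_actions, step, gamma=0.9, episodes=2000, horizon=50):
    """Minimal tabular Q-learning sketch (model-free); illustrative only.

    `step(s, a)` is an assumed environment callback returning
    (next_state, reward). Learns Q(s, a) without ever building a model
    of the transition probabilities.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for ep in range(episodes):
        s = 0
        # Epsilon-greedy exploration with a decaying exploration rate.
        eps = max(0.05, 1.0 - ep / episodes)
        for _ in range(horizon):
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda x: Q[s][x]))
            s2, r = step(s, a)
            # One-step TD update toward r + gamma * max_a' Q(s2, a').
            Q[s][a] += 0.1 * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

The greedy policy with respect to the learned table, $\pi(s) = \arg\max_a Q(s,a)$, is the $\epsilon$-optimal policy such methods aim to produce; the papers above are concerned with how few `step` calls are needed to guarantee that.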
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.