Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits
- URL: http://arxiv.org/abs/2306.01995v1
- Date: Sat, 3 Jun 2023 04:00:47 GMT
- Title: Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits
- Authors: Xiao-Yue Gong, Mark Sellke
- Abstract summary: We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
- Score: 4.811176167998627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study pure exploration with infinitely many bandit arms generated i.i.d.
from an unknown distribution. Our goal is to efficiently select a single high
quality arm whose average reward is, with probability $1-\delta$, within
$\varepsilon$ of being among the top $\eta$-fraction of arms; this is a natural
adaptation of the classical PAC guarantee for infinite action sets. We consider
both the fixed confidence and fixed budget settings, aiming respectively for
minimal expected and fixed sample complexity.
For fixed confidence, we give an algorithm with expected sample complexity
$O\left(\frac{\log (1/\eta)\log (1/\delta)}{\eta\varepsilon^2}\right)$. This is
optimal except for the $\log (1/\eta)$ factor, and the $\delta$-dependence
closes a quadratic gap in the literature. For fixed budget, we show the
asymptotically optimal sample complexity as $\delta\to 0$ is
$c^{-1}\log(1/\delta)\big(\log\log(1/\delta)\big)^2$ to leading order.
Equivalently, the optimal failure probability given exactly $N$ samples decays
as $\exp\big(-cN/\log^2 N\big)$, up to a factor $1\pm o_N(1)$ inside the
exponent. The constant $c$ depends explicitly on the problem parameters
(including the unknown arm distribution) through a certain Fisher information
distance. Even the strictly super-linear dependence on $\log(1/\delta)$ was not
known and resolves a question of Grossman and Moshkovitz (FOCS 2016, SIAM
Journal on Computing 2020).
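For intuition, here is a minimal sketch (not the paper's algorithm) of the naive fixed-confidence baseline: draw roughly $\log(1/\delta)/\eta$ arms so that, with high probability, at least one lies in the top $\eta$-fraction, then pull every drawn arm enough times to estimate its mean to accuracy $\varepsilon/2$ and keep the empirical best. This baseline uses on the order of $\log^2(1/\delta)/(\eta\varepsilon^2)$ samples, illustrating the quadratic $\log(1/\delta)$ dependence that the paper's fixed-confidence result improves to $\log(1/\eta)\log(1/\delta)$. In the Python sketch below, `draw_arm` and `pull` are hypothetical callbacks standing in for the unknown arm reservoir and the reward oracle, with rewards assumed bounded in $[0,1]$.

```python
import math
import random


def naive_pac_selection(draw_arm, pull, eta, eps, delta):
    """Baseline sketch (not the paper's algorithm): uniform sampling.

    Step 1: draw K = ceil(log(2/delta)/eta) arms i.i.d., so that with
            probability >= 1 - delta/2 at least one lies in the top
            eta-fraction of the arm distribution.
    Step 2: pull each arm m = ceil(2*log(4K/delta)/eps^2) times (Hoeffding
            bound for rewards in [0, 1]) and return the arm with the best
            empirical mean; a union bound over the K arms makes every
            estimate eps/2-accurate with probability >= 1 - delta/2.

    Total pulls K*m grow like log(1/delta)^2 / (eta * eps^2).
    """
    K = math.ceil(math.log(2.0 / delta) / eta)
    arms = [draw_arm() for _ in range(K)]
    m = math.ceil(2.0 * math.log(4.0 * K / delta) / eps ** 2)
    means = [sum(pull(a) for _ in range(m)) / m for a in arms]
    return arms[max(range(K), key=means.__getitem__)]


# Toy reservoir: each arm is a Bernoulli arm identified by its mean,
# with means drawn uniformly from [0, 1].
if __name__ == "__main__":
    draw_arm = lambda: random.random()
    pull = lambda p: 1.0 if random.random() < p else 0.0
    print(naive_pac_selection(draw_arm, pull, eta=0.1, eps=0.1, delta=0.05))
```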
Related papers
- Optimal Streaming Algorithms for Multi-Armed Bandits [28.579280943038555]
We study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least $1-\delta$ probability.
We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(\log \Delta^{-1})$ passes.
arXiv Detail & Related papers (2024-10-23T12:54:04Z) - Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm.
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - Fast, Sample-Efficient, Affine-Invariant Private Mean and Covariance
Estimation for Subgaussian Distributions [8.40077201352607]
We present a fast, differentially private algorithm for high-dimensional covariance-aware mean estimation.
Our algorithm produces $\tilde\mu$ such that $\|\mu-\tilde\mu\|_{\Sigma} \leq \alpha$ as long as $n \gtrsim \frac{d}{\alpha^2} + \frac{d\sqrt{\log(1/\delta)}}{\alpha\varepsilon} + \frac{d\log(1/\delta)}{\varepsilon}$.
arXiv Detail & Related papers (2023-01-28T16:57:46Z) - On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $\varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards.
We propose a Track-and-Stop algorithm that identifies the set of $\varepsilon$-good arms w.h.p. and enjoys optimality (when $\delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z) - Faster Rates of Differentially Private Stochastic Convex Optimization [7.93728520583825]
We study the case where the population risk function satisfies the Tsybakov Noise Condition (TNC) with some parameter $\theta>1$.
In the second part, we focus on a special case where the population risk function is strongly convex.
arXiv Detail & Related papers (2021-07-31T22:23:39Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
We prove that for every decision tree, the absolute values of the Fourier coefficients of a given order $\ell$ sum to at most $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}}$, where $n$ is the number of variables, $d$ is the tree depth, and $c>0$ is an absolute constant.
arXiv Detail & Related papers (2020-08-24T06:50:57Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to that of the best distribution in $\mathcal{Q}$.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.