Selective Sampling for Online Best-arm Identification
- URL: http://arxiv.org/abs/2110.14864v1
- Date: Thu, 28 Oct 2021 03:02:08 GMT
- Title: Selective Sampling for Online Best-arm Identification
- Authors: Romain Camilleri, Zhihan Xiong, Maryam Fazel, Lalit Jain, Kevin
Jamieson
- Abstract summary: Given a set of potential options, a learner aims to compute with probability greater than $1-delta$.
The main results of this work precisely characterize this trade-off between labeled samples and stopping time.
Our framework is general enough to capture binary classification improving upon previous works.
- Score: 19.767267982167578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work considers the problem of selective-sampling for best-arm
identification. Given a set of potential options
$\mathcal{Z}\subset\mathbb{R}^d$, a learner aims to compute with probability
greater than $1-\delta$, $\arg\max_{z\in \mathcal{Z}} z^{\top}\theta_{\ast}$
where $\theta_{\ast}$ is unknown. At each time step, a potential measurement
$x_t\in \mathcal{X}\subset\mathbb{R}^d$ is drawn IID and the learner can either
choose to take the measurement, in which case they observe a noisy measurement
of $x^{\top}\theta_{\ast}$, or to abstain from taking the measurement and wait
for a potentially more informative point to arrive in the stream. Hence the
learner faces a fundamental trade-off between the number of labeled samples
they take and when they have collected enough evidence to declare the best arm
and stop sampling. The main results of this work precisely characterize this
trade-off between labeled samples and stopping time and provide an algorithm
that nearly-optimally achieves the minimal label complexity given a desired
stopping time. In addition, we show that the optimal decision rule has a simple
geometric form based on deciding whether a point is in an ellipse or not.
Finally, our framework is general enough to capture binary classification
improving upon previous works.
Related papers
- A/B Testing and Best-arm Identification for Linear Bandits with
Robustness to Non-stationarity [28.068960555415014]
We investigate the fixed-budget best-arm identification problem for linear bandits in a potentially non-stationary environment.
An algorithm will aim to correctly identify the best arm $x* := argmax_xinmathcalXxtopsum_t=1Ttheta_t$ with probability as high as possible.
arXiv Detail & Related papers (2023-07-27T19:03:36Z) - On the Sample Complexity of Representation Learning in Multi-task
Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor)
We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(Glog(delta_G)+ Xlog(delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards.
We propose a Track-and-Stop algorithm that identifies the set of $varepsilon$-good arms w.h.p and enjoys optimality (when $delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z) - PAC Best Arm Identification Under a Deadline [101.10352416022559]
We study $(epsilon, delta)$-PAC best arm identification, where a decision-maker must identify an optimal arm with probability at least $1 - delta$, while minimizing the number of arm pulls (samples)
In this work, the decision-maker is given a deadline of $T$ rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them.
We propose Elastic Batch Racing (EBR), a novel algorithm for this setting and bound its sample complexity, showing that EBR is optimal with respect to both
arXiv Detail & Related papers (2021-06-06T19:48:32Z) - Pure Exploration with Structured Preference Feedback [25.894827160719526]
We consider the problem of pure exploration with subset-wise preference feedback, which contains $N$ arms with features.
We present two algorithms that guarantee the detection of the best-arm in $tildeO (fracd2K Delta2)$ samples with probability at least $1.
arXiv Detail & Related papers (2021-04-12T08:57:29Z) - Explicit Best Arm Identification in Linear Bandits Using No-Regret
Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator)
We first consider $gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $mathcalS$ and action space $mathcalA$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.