PAC Best Arm Identification Under a Deadline
- URL: http://arxiv.org/abs/2106.03221v2
- Date: Tue, 8 Jun 2021 01:35:09 GMT
- Title: PAC Best Arm Identification Under a Deadline
- Authors: Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I.
Jordan, Ken Goldberg, Joseph E. Gonzalez
- Abstract summary: We study $(\epsilon, \delta)$-PAC best arm identification, where a decision-maker must identify an $\epsilon$-optimal arm with probability at least $1 - \delta$, while minimizing the number of arm pulls (samples).
In this work, the decision-maker is given a deadline of $T$ rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them.
We propose Elastic Batch Racing (EBR), a novel algorithm for this setting and bound its sample complexity, showing that EBR is optimal with respect to both hardness results.
- Score: 101.10352416022559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study $(\epsilon, \delta)$-PAC best arm identification, where a
decision-maker must identify an $\epsilon$-optimal arm with probability at
least $1 - \delta$, while minimizing the number of arm pulls (samples). Most of
the work on this topic is in the sequential setting, where there is no
constraint on the time taken to identify such an arm; this allows the
decision-maker to pull one arm at a time. In this work, the decision-maker is
given a deadline of $T$ rounds, where, on each round, it can adaptively choose
which arms to pull and how many times to pull them; this distinguishes the
number of decisions made (i.e., time or number of rounds) from the number of
samples acquired (cost). Such situations occur in clinical trials, where one
may need to identify a promising treatment under a deadline while minimizing
the number of test subjects, or in simulation-based studies run on the cloud,
where we can elastically scale up or down the number of virtual machines to
conduct as many experiments as we wish, but need to pay for the resource-time
used. As the decision-maker can only make $T$ decisions, she may need to pull
some arms excessively relative to a sequential algorithm in order to perform
well on all possible problems. We formalize this added difficulty with two
hardness results that indicate that, unlike in sequential settings, the ability
to adapt to the problem difficulty is constrained by the finite deadline. We
propose Elastic Batch Racing (EBR), a novel algorithm for this setting and
bound its sample complexity, showing that EBR is optimal with respect to both
hardness results. We present simulations evaluating EBR in this setting, where
it outperforms baselines by several orders of magnitude.
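As a concrete illustration of the setting, below is a minimal sketch of a batched racing loop under a deadline of $T$ rounds. It is not the authors' EBR algorithm: the `pull(arm, m)` callback, the geometric batch schedule, and the Hoeffding-style confidence radius are all our own illustrative assumptions.

```python
import numpy as np

def batched_racing(pull, n_arms, T, epsilon=0.1, delta=0.05):
    """Minimal sketch of batched (epsilon, delta)-PAC racing under a deadline
    of T rounds -- NOT the paper's EBR algorithm, just a successive-elimination
    baseline that pulls every surviving arm in a batch on each round, so the
    number of decisions (rounds) is decoupled from the number of samples (cost).

    `pull(arm, m)` is an assumed environment callback returning a NumPy array
    of m i.i.d. rewards in [0, 1] for the given arm.
    """
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    for t in range(1, T + 1):
        m = 2 ** t  # geometric batch growth: a simple schedule that fits in T rounds
        for a in active:
            sums[a] += pull(a, m).sum()
            counts[a] += m
        means = sums[active] / counts[active]
        # Hoeffding radius with a union bound over all arms and all T rounds.
        rad = np.sqrt(np.log(2 * n_arms * T / delta) / (2 * counts[active]))
        best_lcb = (means - rad).max()
        # Keep only arms whose upper confidence bound clears the best lower bound.
        active = [a for a, ucb in zip(active, means + rad) if ucb >= best_lcb]
        if len(active) == 1 or rad.max() <= epsilon / 2:
            break  # the empirically best survivor is eps-optimal w.p. >= 1 - delta
    # With a hard deadline, the loop can also exit with wide intervals -- this
    # is precisely the tension between rounds and samples the paper studies.
    means = sums[active] / counts[active]
    return active[int(np.argmax(means))]
```

For instance, `pull = lambda a, m: np.random.binomial(1, mus[a], size=m)` with hypothetical Bernoulli means `mus` exercises the sketch. EBR itself chooses how many times to pull each arm per round so that its sample complexity matches both hardness results in the paper.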
Related papers
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z) - Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the probability of mis-identifying the support of the best mixed arm and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Selective Sampling for Online Best-arm Identification [19.767267982167578]
Given a set of potential options, a learner aims to identify the best one with probability greater than $1-\delta$.
The main results of this work precisely characterize this trade-off between labeled samples and stopping time.
Our framework is general enough to capture binary classification, improving upon previous works.
arXiv Detail & Related papers (2021-10-28T03:02:08Z) - Continuous Time Bandits With Sampling Costs [17.412117389855222]
We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample.
There is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency.
The goal is to design a learning algorithm that minimizes regret, which is defined as the difference between the payoff of the oracle policy and that of the learning algorithm.
arXiv Detail & Related papers (2021-07-12T10:00:35Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$; one plausible formalization is sketched after this entry.
arXiv Detail & Related papers (2020-08-10T08:30:52Z)