PAC Best Arm Identification Under a Deadline
- URL: http://arxiv.org/abs/2106.03221v2
- Date: Tue, 8 Jun 2021 01:35:09 GMT
- Title: PAC Best Arm Identification Under a Deadline
- Authors: Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I.
Jordan, Ken Goldberg, Joseph E. Gonzalez
- Abstract summary: We study $(\epsilon, \delta)$-PAC best arm identification, where a decision-maker must identify an $\epsilon$-optimal arm with probability at least $1 - \delta$, while minimizing the number of arm pulls (samples).
In this work, the decision-maker is given a deadline of $T$ rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them.
We propose Elastic Batch Racing (EBR), a novel algorithm for this setting and bound its sample complexity, showing that EBR is optimal with respect to both hardness results.
- Score: 101.10352416022559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study $(\epsilon, \delta)$-PAC best arm identification, where a
decision-maker must identify an $\epsilon$-optimal arm with probability at
least $1 - \delta$, while minimizing the number of arm pulls (samples). Most of
the work on this topic is in the sequential setting, where there is no
constraint on the time taken to identify such an arm; this allows the
decision-maker to pull one arm at a time. In this work, the decision-maker is
given a deadline of $T$ rounds, where, on each round, it can adaptively choose
which arms to pull and how many times to pull them; this distinguishes the
number of decisions made (i.e., time or number of rounds) from the number of
samples acquired (cost). Such situations occur in clinical trials, where one
may need to identify a promising treatment under a deadline while minimizing
the number of test subjects, or in simulation-based studies run on the cloud,
where we can elastically scale up or down the number of virtual machines to
conduct as many experiments as we wish, but need to pay for the resource-time
used. As the decision-maker can only make $T$ decisions, she may need to pull
some arms excessively relative to a sequential algorithm in order to perform
well on all possible problems. We formalize this added difficulty with two
hardness results that indicate that, unlike in sequential settings, the ability
to adapt to the problem difficulty is constrained by the finite deadline. We
propose Elastic Batch Racing (EBR), a novel algorithm for this setting and
bound its sample complexity, showing that EBR is optimal with respect to both
hardness results. We present simulations evaluating EBR in this setting, where
it outperforms baselines by several orders of magnitude.
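As a concrete illustration of the setting, below is a minimal sketch of a batched racing loop under a deadline of $T$ rounds. It is not the authors' EBR algorithm: the `pull(arm, m)` callback, the geometric batch schedule, and the Hoeffding-style confidence radius are all our own illustrative assumptions.

```python
import numpy as np

def batched_racing(pull, n_arms, T, epsilon=0.1, delta=0.05):
    """Minimal sketch of batched (epsilon, delta)-PAC racing under a deadline
    of T rounds -- NOT the paper's EBR algorithm, just a successive-elimination
    baseline that pulls every surviving arm in a batch on each round, so the
    number of decisions (rounds) is decoupled from the number of samples (cost).

    `pull(arm, m)` is an assumed environment callback returning a NumPy array
    of m i.i.d. rewards in [0, 1] for the given arm.
    """
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    for t in range(1, T + 1):
        m = 2 ** t  # geometric batch growth: a simple schedule that fits in T rounds
        for a in active:
            sums[a] += pull(a, m).sum()
            counts[a] += m
        means = sums[active] / counts[active]
        # Hoeffding radius with a union bound over all arms and all T rounds.
        rad = np.sqrt(np.log(2 * n_arms * T / delta) / (2 * counts[active]))
        best_lcb = (means - rad).max()
        # Keep only arms whose upper confidence bound clears the best lower bound.
        active = [a for a, ucb in zip(active, means + rad) if ucb >= best_lcb]
        if len(active) == 1 or rad.max() <= epsilon / 2:
            break  # the empirically best survivor is eps-optimal w.p. >= 1 - delta
    # With a hard deadline, the loop can also exit with wide intervals -- this
    # is precisely the tension between rounds and samples the paper studies.
    means = sums[active] / counts[active]
    return active[int(np.argmax(means))]
```

For instance, `pull = lambda a, m: np.random.binomial(1, mus[a], size=m)` with hypothetical Bernoulli means `mus` exercises the sketch. EBR itself chooses how many times to pull each arm per round so that its sample complexity matches both hardness results in the paper.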
Related papers
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z) - Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget.
The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$.
We provide a theoretical upper bound on the probability of mis-identifying the support of the best mixed arm and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Selective Sampling for Online Best-arm Identification [19.767267982167578]
Given a set of potential options, a learner aims to identify the best one with probability greater than $1-\delta$.
The main results of this work precisely characterize this trade-off between labeled samples and stopping time.
Our framework is general enough to capture binary classification, improving upon previous works.
arXiv Detail & Related papers (2021-10-28T03:02:08Z) - Continuous Time Bandits With Sampling Costs [17.412117389855222]
We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample.
There is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency.
The goal is to design a learning algorithm that minimizes regret, which is defined as the difference between the payoff of the oracle policy and that of the learning algorithm.
arXiv Detail & Related papers (2021-07-12T10:00:35Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$; one plausible formalization is sketched after this entry.
arXiv Detail & Related papers (2020-08-10T08:30:52Z)