Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism
- URL: http://arxiv.org/abs/2011.00330v2
- Date: Sat, 5 Jun 2021 19:25:31 GMT
- Title: Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism
- Authors: Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I.
Jordan, Ken Goldberg, Joseph E. Gonzalez
- Abstract summary: We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
- Score: 107.48538091418412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study exploration in stochastic multi-armed bandits when we have access to
a divisible resource that can be allocated in varying amounts to arm pulls. We
focus in particular on the allocation of distributed computing resources, where
we may obtain results faster by allocating more resources per pull, but might
have reduced throughput due to nonlinear scaling. For example, in
simulation-based scientific studies, an expensive simulation can be sped up by
running it on multiple cores. This speed-up, however, is partly offset by the
communication among cores, which results in lower throughput than if fewer
cores were allocated per trial to run more trials in parallel. In this paper,
we explore these trade-offs in two settings. First, in a fixed confidence
setting, we need to find the best arm with a given target success probability
as quickly as possible. We propose an algorithm which trades off between
information accumulation and throughput and show that the time taken can be
upper bounded by the solution of a dynamic program whose inputs are the gaps
between the sub-optimal and optimal arms. We also prove a matching hardness
result. Second, we present an algorithm for a fixed deadline setting, where we
are given a time deadline and need to maximize the probability of finding the
best arm. We corroborate our theoretical insights with simulation experiments
that show that the algorithms consistently match or outperform baseline
algorithms on a variety of problem instances.
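The trade-off the abstract describes can be made concrete with a small numerical sketch (not from the paper): if the speed-up from running one pull on k cores is sublinear, say speedup(k) = k^alpha with alpha < 1 (an illustrative assumption, not the paper's model), then giving each pull more cores lowers its latency but reduces the total number of completed pulls per unit time.

```python
# Illustrative sketch of the parallelism trade-off: more cores per pull
# means faster individual results (lower latency) but, under sublinear
# scaling, fewer completed pulls per unit time (lower throughput).
# The speedup model k**alpha is a hypothetical stand-in.

def pull_time(cores_per_pull, base_time=60.0, alpha=0.7):
    """Wall-clock time for one arm pull under sublinear speedup."""
    return base_time / cores_per_pull ** alpha

def throughput(total_cores, cores_per_pull, base_time=60.0, alpha=0.7):
    """Completed pulls per unit time with pulls running in parallel."""
    parallel_pulls = total_cores // cores_per_pull
    return parallel_pulls / pull_time(cores_per_pull, base_time, alpha)

if __name__ == "__main__":
    K = 64  # total divisible resource (cores)
    for k in (1, 4, 16, 64):
        print(f"{k:>2} cores/pull: latency {pull_time(k):6.2f}s, "
              f"throughput {throughput(K, k):.3f} pulls/s")
```

With alpha = 0.7, latency drops from 60s at 1 core/pull to about 3.3s at 64 cores/pull, while throughput falls from roughly 1.07 to 0.31 pulls/s; an adaptive algorithm must decide where on this curve to operate as it accumulates information about the arms.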
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the learner can choose to sample an arm at a lower fidelity (a less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Best Arm Identification in Batched Multi-armed Bandit Problems [3.908867017036043]
The multi-armed bandit problem arises in many real-life scenarios where arms must be sampled in batches.
We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification.
We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.
arXiv Detail & Related papers (2023-12-21T14:16:38Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of the arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or
Bayesian? [0.0]
Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms.
We propose a reconfigurable and intelligent MAB (RI-MAB) framework.
arXiv Detail & Related papers (2021-06-05T10:07:31Z) - Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous
Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z) - On Regret with Multiple Best Arms [12.315392649501101]
We study a regret problem with the existence of multiple best/near-optimal arms in a bandit setting.
Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem.
arXiv Detail & Related papers (2020-06-26T04:01:46Z) - Learning to Accelerate Heuristic Searching for Large-Scale Maximum
Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design and has been widely applied in economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource.
We propose NeuSearcher, which leverages knowledge learned from previous problem instances to solve new ones.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)