Censored Semi-Bandits for Resource Allocation
- URL: http://arxiv.org/abs/2104.05781v1
- Date: Mon, 12 Apr 2021 19:15:32 GMT
- Title: Censored Semi-Bandits for Resource Allocation
- Authors: Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, Raman Sankaran
- Abstract summary: We consider the problem of sequentially allocating resources in a censored semi-bandits setup.
The loss depends on two hidden parameters, one specific to the arm but independent of the resource allocation, and the other depends on the allocated resource.
We derive optimal algorithms for our problem setting using known algorithms for MP-MAB and Combinatorial Semi-Bandits.
- Score: 12.450488894967773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of sequentially allocating resources in a censored
semi-bandits setup, where the learner allocates resources at each step to the
arms and observes loss. The loss depends on two hidden parameters, one specific
to the arm but independent of the resource allocation, and the other depends on
the allocated resource. More specifically, the loss equals zero for an arm if
the resource allocated to it exceeds a constant (but unknown) arm dependent
threshold. The goal is to learn a resource allocation that minimizes the
expected loss. The problem is challenging because the loss distribution and
threshold value of each arm are unknown. We study this setting by establishing
its 'equivalence' to Multiple-Play Multi-Armed Bandits (MP-MAB) and
Combinatorial Semi-Bandits. Exploiting these equivalences, we derive optimal
algorithms for our problem setting using known algorithms for MP-MAB and
Combinatorial Semi-Bandits. The experiments on synthetically generated data
validate the performance guarantees of the proposed algorithms.
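The censored loss model described in the abstract can be sketched as a small simulator. This is an illustrative reading only, not the paper's implementation: the Bernoulli loss distribution, the function names, and the per-round sum over arms are assumptions for the sake of a concrete example.

```python
import random

def censored_loss(allocation, threshold, loss_prob, rng):
    """One arm's loss: zero once the allocated resource exceeds the arm's
    constant (but unknown to the learner) threshold; otherwise a draw
    from the arm's loss distribution (Bernoulli here, as an assumption)."""
    if allocation > threshold:
        return 0.0
    return 1.0 if rng.random() < loss_prob else 0.0

def round_loss(allocations, thresholds, loss_probs, rng):
    """Total loss observed in one round for a given allocation vector."""
    return sum(censored_loss(a, t, p, rng)
               for a, t, p in zip(allocations, thresholds, loss_probs))

rng = random.Random(0)
# Allocating above an arm's threshold censors its loss entirely.
assert censored_loss(allocation=2.0, threshold=1.5, loss_prob=0.9, rng=rng) == 0.0
# Below the threshold, the arm-specific loss distribution applies.
assert censored_loss(allocation=0.0, threshold=1.5, loss_prob=1.0, rng=rng) == 1.0
```

The learner's difficulty is visible here: both `threshold` and `loss_prob` are hidden, so an allocation must be learned from the censored feedback alone.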
Related papers
- Best Arm Identification with Resource Constraints [6.236743421605786]
We study the Best Arm Identification with Resource Constraints (BAIwRC) problem.
The agent aims to identify the best arm under resource constraints, where resources are consumed for each arm pull.
We design and analyze the Successive Halving with Resource Rationing algorithm (SH-RR)
arXiv Detail & Related papers (2024-02-29T12:17:54Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget [4.226118870861363]
We consider a constrained, pure exploration, multi-armed bandit formulation under a fixed budget.
We propose an algorithm called Constrained-SR based on the Successive Rejects framework.
We show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
arXiv Detail & Related papers (2022-11-27T08:58:16Z) - Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z) - Combinatorial Multi-armed Bandits for Resource Allocation [23.869080705146395]
Motivating examples include allocating limited computing time or wireless spectrum bands to multiple users.
The decision maker must learn the value of the resources allocated to each user from feedback on each user's received reward.
arXiv Detail & Related papers (2021-05-10T13:55:30Z) - Multi-Armed Bandits with Censored Consumption of Resources [9.803834317538747]
We consider a resource-aware variant of the classical multi-armed bandit problem.
In each round, the learner selects an arm and determines a resource limit.
It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit.
arXiv Detail & Related papers (2020-11-02T08:27:38Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z) - Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed
Bandits [4.760079434948198]
We show that specialized algorithms that exploit such parametric information are prone to inconsistent learning performance when the parameter is misspecified.
Our key contributions are: (i) we establish fundamental performance limits of statistically robust MAB algorithms under the fixed-budget pure exploration setting, and (ii) we propose two classes of algorithms that are near-optimal in two distinct senses.
arXiv Detail & Related papers (2020-08-28T13:43:12Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - The Price of Incentivizing Exploration: A Characterization via Thompson
Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.