Combinatorial Multi-armed Bandits for Resource Allocation
- URL: http://arxiv.org/abs/2105.04373v1
- Date: Mon, 10 May 2021 13:55:30 GMT
- Title: Combinatorial Multi-armed Bandits for Resource Allocation
- Authors: Jinhang Zuo, Carlee Joe-Wong
- Abstract summary: Motivating examples include allocating limited computing time or wireless spectrum bands to multiple users.
The decision maker should learn the value of the resources allocated for each user from feedback on each user's received reward.
- Score: 23.869080705146395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the sequential resource allocation problem where a decision maker
repeatedly allocates budgets between resources. Motivating examples include
allocating limited computing time or wireless spectrum bands to multiple users
(i.e., resources). At each timestep, the decision maker should distribute its
available budgets among different resources to maximize the expected reward, or
equivalently to minimize the cumulative regret. In doing so, the decision maker
should learn the value of the resources allocated for each user from feedback
on each user's received reward. For example, users may send messages of
different urgency over wireless spectrum bands; the reward generated by
allocating spectrum to a user then depends on the message's urgency. We assume
each user's reward follows a random process that is initially unknown. We
design combinatorial multi-armed bandit algorithms to solve this problem with
discrete or continuous budgets. We prove the proposed algorithms achieve
logarithmic regrets under semi-bandit feedback.
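The discrete-budget setting in the abstract can be illustrated with a minimal CUCB-style sketch, not the paper's exact algorithm: treat each (user, budget-level) pair as a base arm, maintain a UCB index per pair from semi-bandit feedback, and fill the budget greedily by optimistic value. The class name, the greedy oracle, and the exploration constant are all assumptions made for illustration.

```python
import math

class BudgetAllocationUCB:
    """Illustrative CUCB-style allocator: split B discrete budget units
    among K users each round, using a UCB index for every
    (user, allocation-level) pair under semi-bandit feedback."""

    def __init__(self, num_users, budget, max_per_user):
        self.K = num_users
        self.B = budget
        self.M = max_per_user
        # empirical mean reward and observation count per (user, level)
        self.mean = [[0.0] * (max_per_user + 1) for _ in range(num_users)]
        self.count = [[0] * (max_per_user + 1) for _ in range(num_users)]
        self.t = 0

    def _ucb(self, u, m):
        """Optimistic estimate of user u's reward at allocation level m."""
        if self.count[u][m] == 0:
            return float("inf")  # unexplored levels are tried first
        bonus = math.sqrt(1.5 * math.log(self.t + 1) / self.count[u][m])
        return self.mean[u][m] + bonus

    def select(self):
        """Greedy heuristic oracle: hand out budget units one at a time,
        each to the user whose next level has the largest UCB index."""
        self.t += 1
        alloc = [0] * self.K
        remaining = self.B
        while remaining > 0:
            best_u, best_gain = None, -1.0
            for u in range(self.K):
                if alloc[u] < self.M:
                    gain = self._ucb(u, alloc[u] + 1)
                    if gain > best_gain:
                        best_u, best_gain = u, gain
            if best_u is None:
                break  # every user is at its per-user cap
            alloc[best_u] += 1
            remaining -= 1
        return alloc

    def update(self, alloc, rewards):
        """Semi-bandit feedback: each user's reward is observed at the
        level it actually received this round."""
        for u, (m, r) in enumerate(zip(alloc, rewards)):
            if m > 0:
                self.count[u][m] += 1
                self.mean[u][m] += (r - self.mean[u][m]) / self.count[u][m]
```

In use, the caller alternates `select()` and `update()` each timestep; over time the allocator concentrates budget on the users whose observed rewards are highest, while the shrinking confidence bonus keeps occasional exploration of the others.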
Related papers
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z)
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application [17.64363983613468]
We formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm, MUD-EXP3.
The delayed feedback comes from multiple users, with no restriction on its internal distribution.
arXiv Detail & Related papers (2023-10-17T12:08:15Z)
- Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely the Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB).
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z)
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions [54.25616645675032]
We study the Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm.
We consider two settings: the reward-dependent delay setting, where realized delays may depend on the rewards, and the reward-independent delay setting.
Our main contribution is algorithms that achieve near-optimal regret in each of the settings.
arXiv Detail & Related papers (2021-06-04T12:26:06Z)
- Censored Semi-Bandits for Resource Allocation [12.450488894967773]
We consider the problem of sequentially allocating resources in a censored semi-bandits setup.
The loss depends on two hidden parameters: one specific to the arm and independent of the resource allocation, the other dependent on the allocated resource.
We derive optimal algorithms for our problem setting using known algorithms for MP-MAB and Combinatorial Semi-Bandits.
arXiv Detail & Related papers (2021-04-12T19:15:32Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that Confidence-Budget Matching (CBM) based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Dynamic Spectrum Access using Stochastic Multi-User Bandits [23.129398025477204]
The proposed algorithm consists of an estimation phase and an allocation phase.
It is shown that if every user adopts the algorithm, the system-wide regret is order-optimal, of order $O(\log T)$, over a time horizon of duration $T$.
The algorithm is extended to the dynamic case where the number of users in the system evolves over time, and is shown to lead to sub-linear regret.
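The estimation-then-allocation structure described in this entry can be sketched as follows. This is a hypothetical simplification, not the paper's protocol: the function name, the shared round-robin estimates, and the rank-based orthogonal assignment are all assumptions made for illustration.

```python
import random

def dynamic_spectrum_sketch(true_means, num_users, explore_rounds, rng):
    """Two-phase sketch of an estimate-then-allocate spectrum scheme.
    Phase 1: channels are sampled round-robin to estimate their rates.
    Phase 2: users are assigned distinct channels by estimated rate,
    so transmissions do not collide."""
    K = len(true_means)
    est = [0.0] * K   # running mean estimate per channel
    cnt = [0] * K     # samples taken per channel
    # Phase 1: round-robin exploration with Bernoulli channel feedback
    for t in range(explore_rounds):
        ch = t % K
        r = 1 if rng.random() < true_means[ch] else 0
        cnt[ch] += 1
        est[ch] += (r - est[ch]) / cnt[ch]
    # Phase 2: orthogonal allocation -- user i takes the i-th best channel
    ranking = sorted(range(K), key=lambda c: -est[c])
    return {u: ranking[u] for u in range(num_users)}
```

With enough exploration rounds, the estimated ranking matches the true channel quality and each user settles on a distinct channel; the dynamic-user extension mentioned above would additionally re-run the allocation as users arrive and depart.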
arXiv Detail & Related papers (2021-01-12T10:29:57Z)
- Multi-Armed Bandits with Censored Consumption of Resources [9.803834317538747]
We consider a resource-aware variant of the classical multi-armed bandit problem.
In each round, the learner selects an arm and determines a resource limit.
It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit.
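The censored feedback loop described in this entry can be modeled in a few lines. This is an illustrative sketch only; the function name, the Bernoulli reward, and the consumption sampler interface are assumptions, not the paper's API.

```python
import random

def censored_pull(reward_mean, consumption_sampler, limit, rng):
    """One round of the censored feedback model: the learner has chosen
    an arm (with mean reward `reward_mean`) and a resource limit; the
    reward is observed only if the random consumption stays within it."""
    consumed = consumption_sampler(rng)
    if consumed <= limit:
        # feedback observed: a Bernoulli reward with the arm's mean
        return 1 if rng.random() < reward_mean else 0
    return 0  # censored: the resource limit was exceeded, no reward
```

The learner's trade-off is visible here: a tighter limit saves resources but censors more rounds, so it must learn both the arms' reward means and their consumption distributions.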
arXiv Detail & Related papers (2020-11-02T08:27:38Z)
- Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time.
In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items.
The performance of the recommendation algorithm is captured through its regret, taking as a reference an Oracle algorithm aware of the underlying probabilities.
arXiv Detail & Related papers (2020-10-23T12:48:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.