Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms
- URL: http://arxiv.org/abs/2206.08776v1
- Date: Fri, 17 Jun 2022 13:47:27 GMT
- Title: Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms
- Authors: Xuchuang Wang, Hong Xie, John C.S. Lui
- Abstract summary: We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting.
Each shareable arm has a finite reward capacity and a "per-load" reward distribution.
We propose an online learning algorithm to address the problem and prove its regret upper bound.
- Score: 32.313813562222066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a
shareable arm setting, in which several plays can share the same arm.
Furthermore, each shareable arm has a finite reward capacity and a "per-load"
reward distribution, both of which are unknown to the learner. The reward from
a shareable arm is load-dependent: it equals the "per-load" reward multiplied by
the number of plays pulling the arm, or by the reward capacity when the number
of plays exceeds the capacity limit. When the "per-load" reward follows
a Gaussian distribution, we prove a sample complexity lower bound of learning
the capacity from load-dependent rewards and also a regret lower bound of this
new MP-MAB problem. We devise a capacity estimator whose sample complexity
upper bound matches the lower bound in terms of reward means and capacities. We
also propose an online learning algorithm to address the problem and prove its
regret upper bound. The first term of this upper bound matches the first term of
the regret lower bound, and its second and third terms clearly correspond to the
remaining terms of the lower bound. Extensive experiments validate our algorithm's performance and
also its gain in 5G & 4G base station selection.
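As an illustration of the load-dependent reward model the abstract describes, here is a minimal sketch, not the authors' implementation; the function name and the Gaussian parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def shareable_arm_reward(num_plays, capacity, per_load_mean, per_load_std):
    """Illustrative load-dependent reward for one shareable arm.

    The effective load is min(num_plays, capacity): each play adds one
    unit of "per-load" reward until the arm's finite capacity is hit,
    after which extra plays gain nothing. The per-load reward is drawn
    from a Gaussian, as assumed in the paper's lower-bound analysis.
    """
    effective_load = min(num_plays, capacity)
    per_load_reward = rng.normal(per_load_mean, per_load_std)
    return effective_load * per_load_reward

# Example: an arm with capacity 3 yields the same expected reward for
# 5 plays as for 3, so the capacity is identifiable only through
# load-dependent observations.
print(shareable_arm_reward(num_plays=5, capacity=3,
                           per_load_mean=1.0, per_load_std=0.1))
```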
Related papers
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
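The summary mentions a UCB-based policy; for context, the following is a sketch of a generic UCB1 index, not the paper's EgalUCB, whose exact form is not given here.

```python
import math

def ucb1_index(mean_reward, num_pulls, t, c=2.0):
    """Generic UCB1 index: empirical mean plus an exploration bonus
    that shrinks as an arm is pulled more often. EgalUCB builds on
    this style of optimism-based index; the precise index used in
    the paper is not reproduced here."""
    if num_pulls == 0:
        return float("inf")  # force each arm to be tried once
    return mean_reward + math.sqrt(c * math.log(t) / num_pulls)
```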
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
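A minimal sketch of the round protocol this summary describes, with hypothetical helper names: the agent picks an arm, then commits to abstaining or accepting before the instantaneous reward is revealed.

```python
def play_round(select_arm, decide_abstain, draw_reward, abstention_payoff=0.0):
    """One round of a bandit with abstention (illustrative protocol;
    all callables are hypothetical stand-ins)."""
    arm = select_arm()
    if decide_abstain(arm):
        return abstention_payoff   # abstain before the reward is observed
    return draw_reward(arm)        # otherwise the realized reward counts
```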
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Max-Quantile Grouped Infinite-Arm Bandits [39.7239093424124]
We consider a bandit problem in which there are a number of groups, each consisting of infinitely many arms.
We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm.
We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter.
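A structural sketch of the two-step scheme described above, with hypothetical helper names; the finite-arm subroutine is left abstract.

```python
def two_step_max_quantile(groups, arms_per_group, finite_arm_algorithm):
    """Step 1: request a fixed number of arms from each infinite group.
    Step 2: hand the resulting finite instance to a grouped
    max-quantile bandit algorithm. Both helpers are hypothetical."""
    sampled = {g: [g.request_arm() for _ in range(arms_per_group)]
               for g in groups}
    return finite_arm_algorithm(sampled)
```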
arXiv Detail & Related papers (2022-10-04T00:46:49Z)
- Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications [32.313813562222066]
We study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards.
Existing MMAB models mostly assume that when more than one player pulls the same arm, they either collide and obtain zero rewards, or do not collide and gain independent rewards.
We propose an MMAB with shareable resources as an extension to the collision and non-collision settings.
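To contrast the three reward conventions mentioned above, here is an illustrative sketch; the function and its arguments are assumptions, not the paper's notation.

```python
def arm_reward(model, num_players, capacity, per_load_reward):
    """Illustrative contrast of reward conventions on one arm.

    'collision'   : more than one player on the same arm yields zero.
    'independent' : every player gains a reward regardless of sharing.
    'shareable'   : rewards grow with the load only up to a finite
                    per-arm capacity, as in this line of work.
    """
    if model == "collision":
        return 0.0 if num_players > 1 else per_load_reward
    if model == "independent":
        return num_players * per_load_reward
    if model == "shareable":
        return min(num_players, capacity) * per_load_reward
    raise ValueError(f"unknown model: {model}")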
arXiv Detail & Related papers (2022-04-28T13:46:59Z)
- Bandit problems with fidelity rewards [7.154621689269006]
The fidelity bandits problem is a variant of the $K$-armed bandit problem in which the reward of each arm is augmented by a fidelity reward depending on how 'loyal' the player has been to that arm in the past.
In the loyalty-points model the amount of extra reward depends on the number of times the arm has previously been played.
In the subscription model the additional reward depends on the current number of consecutive draws of the arm.
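The two fidelity models above can be sketched as follows; the linear bonus is an assumption for illustration, since the paper allows more general fidelity functions.

```python
def fidelity_bonus(model, past_play_count, consecutive_draws, bonus_per_unit):
    """Illustrative fidelity reward added on top of the base reward.

    'loyalty-points': the bonus grows with the total number of times
    the arm has been played in the past.
    'subscription'  : the bonus depends on the current streak of
    consecutive draws of the arm.
    """
    if model == "loyalty-points":
        return bonus_per_unit * past_play_count
    if model == "subscription":
        return bonus_per_unit * consecutive_draws
    raise ValueError(f"unknown model: {model}")
```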
arXiv Detail & Related papers (2021-11-25T11:09:43Z)
- Multi-Armed Bandits with Censored Consumption of Resources [9.803834317538747]
We consider a resource-aware variant of the classical multi-armed bandit problem.
In each round, the learner selects an arm and determines a resource limit.
It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit.
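A minimal sketch of the censored-observation mechanism described above, with hypothetical callables standing in for the learner's policy and the environment.

```python
def censored_round(choose_arm_and_limit, draw_reward_and_consumption):
    """One round with censored resource consumption (illustrative).

    The learner commits to an arm and a resource limit; the reward is
    observed only if the random consumption stays below the limit."""
    arm, limit = choose_arm_and_limit()
    reward, consumption = draw_reward_and_consumption(arm)
    if consumption < limit:
        return reward   # reward observed this round
    return None         # censored: limit exceeded, no reward observed
```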
arXiv Detail & Related papers (2020-11-02T08:27:38Z)
- Tight Lower Bounds for Combinatorial Multi-Armed Bandits [72.56064196252498]
The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round.
We show that the recently proposed Gini-weighted smoothness parameter determines the lower bounds for monotone reward functions.
arXiv Detail & Related papers (2020-02-13T08:53:43Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)