Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms:
Learning Algorithms & Applications
- URL: http://arxiv.org/abs/2204.13502v1
- Date: Thu, 28 Apr 2022 13:46:59 GMT
- Title: Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms:
Learning Algorithms & Applications
- Authors: Xuchuang Wang, Hong Xie, John C.S. Lui
- Abstract summary: We study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards.
Existing MMAB models mostly assume that when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards.
We propose an MMAB with shareable resources as an extension to the collision and non-collision settings.
- Score: 32.313813562222066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-player multi-armed bandits (MMAB) study how decentralized players
cooperatively play the same multi-armed bandit so as to maximize their total
cumulative rewards. Existing MMAB models mostly assume that when more than one
player pulls the same arm, they either have a collision and obtain zero
rewards, or have no collision and gain independent rewards, both of which are
usually too restrictive in practical scenarios. In this paper, we propose an
MMAB with shareable resources as an extension to the collision and
non-collision settings. Each shareable arm has finite shareable resources and a
"per-load" reward random variable, both of which are unknown to players. The
reward from a shareable arm is equal to the "per-load" reward multiplied by the
minimum between the number of players pulling the arm and the arm's maximal
shareable resources. We consider two types of feedback: sharing demand
information (SDI) and sharing demand awareness (SDA), each of which provides
different signals of resource sharing. We design the DPE-SDI and SIC-SDA
algorithms to address the shareable arm problem under these two cases of
feedback respectively and prove that both algorithms have logarithmic regrets
that are tight in the number of rounds. We conduct simulations to validate both
algorithms' performance and show their utilities in wireless networking and
edge computing.
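To make the reward model concrete: if $m$ players pull an arm with maximal shareable resources $c$ and "per-load" reward $X$, the arm returns $X \cdot \min(m, c)$ in that round. Below is a minimal, hypothetical Python sketch of this model, assuming a Bernoulli per-load reward for simplicity (the abstract leaves the per-load distribution general); the class and parameter names are illustrative, not from the authors' code.

```python
import random

class ShareableArm:
    """Illustrative model of a shareable arm, as described in the abstract.

    Both `capacity` (the arm's maximal shareable resources) and the
    per-load reward distribution are unknown to the players.
    """

    def __init__(self, capacity: int, per_load_mean: float):
        self.capacity = capacity
        self.per_load_mean = per_load_mean

    def pull(self, num_players: int) -> float:
        """Total reward generated when `num_players` players pull this arm."""
        # One realization of the "per-load" reward; Bernoulli is an
        # assumption here -- the paper only treats its mean as unknown.
        per_load_reward = 1.0 if random.random() < self.per_load_mean else 0.0
        # Players beyond the arm's capacity contribute nothing extra.
        return per_load_reward * min(num_players, self.capacity)

# Example: with capacity 3, a fourth or fifth player adds no reward, so
# the expected reward of pulling with 5 players is 0.6 * min(5, 3) = 1.8.
arm = ShareableArm(capacity=3, per_load_mean=0.6)
rewards = [arm.pull(num_players=5) for _ in range(10_000)]
print(sum(rewards) / len(rewards))  # approximately 1.8
```

Since both the per-load mean and the capacity are hidden, players must learn how the reward saturates as the load grows; the SDI and SDA feedback models give players different partial signals about this resource sharing, which is what the DPE-SDI and SIC-SDA algorithms exploit.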
Related papers
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation [50.469872635246176]
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards.
arXiv Detail & Related papers (2023-11-27T09:19:01Z)
- Competing for Shareable Arms in Multi-Player Multi-Armed Bandits [29.08799537067425]
We study a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards.
We propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium.
We establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses for themselves.
arXiv Detail & Related papers (2023-05-30T15:59:56Z)
- Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
The multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z)
- Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms [32.313813562222066]
We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arm setting.
Each shareable arm has a finite reward capacity and a "per-load" reward distribution.
We propose an online learning algorithm to address the problem and prove its regret upper bound.
arXiv Detail & Related papers (2022-06-17T13:47:27Z)
- An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
arXiv Detail & Related papers (2021-11-08T23:38:47Z)
- Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
arXiv Detail & Related papers (2020-10-29T07:13:28Z)
- Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision, obtaining zero reward, if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)