Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits
- URL: http://arxiv.org/abs/2212.06279v1
- Date: Mon, 12 Dec 2022 23:26:02 GMT
- Title: Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits
- Authors: Guojun Xiong, Jian Li
- Abstract summary: Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a \textit{multi-player multi-armed walking bandits} model, aiming to address the aforementioned modeling issues.
- Score: 6.732901486505047
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Multi-player multi-armed bandit is an increasingly relevant decision-making
problem, motivated by applications to cognitive radio systems. Most research
for this problem focuses exclusively on settings in which players have
\textit{full access} to all arms and receive no reward when pulling the same
arm. Hence, all players solve the same bandit problem with the goal of
maximizing their cumulative reward. However, these settings neglect several
important factors in many real-world applications, where players have
\textit{limited access} to \textit{a dynamic local subset of arms} (i.e., an
arm could sometimes be ``walking'' and not accessible to the player). To this
end, this paper proposes a \textit{multi-player multi-armed walking bandits}
model, aiming to address the aforementioned modeling issues. The goal is
still to maximize the cumulative reward; however, players can only pull arms
from their local subsets, and a player collects the full reward only if no
other player pulls the same arm. We adopt an Upper Confidence Bound (UCB)
approach to handle the exploration-exploitation tradeoff
and employ distributed optimization techniques to properly handle collisions.
By carefully integrating these two techniques, we propose a decentralized
algorithm with a near-optimal regret guarantee that can be easily implemented
to obtain competitive empirical performance.
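To make the model concrete, below is a minimal toy simulation of the setting described in the abstract: each player runs plain UCB1 over whichever arms happen to be locally available in a round, and colliding players receive zero reward. All constants (numbers of arms and players, the availability probability) are illustrative assumptions, and the naive uncoordinated play stands in for, but is not, the paper's distributed-optimization collision handling.

```python
import numpy as np

# Toy sketch of the walking-bandit setting: arms are intermittently available
# to each player, and a collision (two players on the same arm) yields zero
# reward. NOT the paper's algorithm; collisions are simply left to happen.

rng = np.random.default_rng(0)
K, N, T = 8, 3, 5000             # arms, players, horizon (illustrative values)
mu = rng.uniform(0.2, 0.9, K)    # unknown Bernoulli means (assumed)

counts = np.zeros((N, K))        # per-player pull counts
sums = np.zeros((N, K))          # per-player reward sums

def ucb_score(p, t, a):
    """Standard UCB1 index of arm a for player p at round t."""
    if counts[p, a] == 0:
        return np.inf            # force exploration of unseen arms
    mean = sums[p, a] / counts[p, a]
    return mean + np.sqrt(2.0 * np.log(t + 1) / counts[p, a])

total = np.zeros(N)
for t in range(T):
    # Arms "walk": each arm is in a player's local subset independently w.p. 0.7.
    local = [np.flatnonzero(rng.random(K) < 0.7) for _ in range(N)]
    picks = []
    for p in range(N):
        arms = local[p] if local[p].size else np.arange(K)  # guard: empty subset
        best = arms[int(np.argmax([ucb_score(p, t, a) for a in arms]))]
        picks.append(int(best))
    for p, a in enumerate(picks):
        r = 0.0 if picks.count(a) > 1 else float(rng.random() < mu[a])
        counts[p, a] += 1
        sums[p, a] += r
        total[p] += r

print("per-player cumulative reward:", total)
```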
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
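The consensus procedure itself is not spelled out in this summary. For contrast, a classic decentralized baseline for spreading players across arms without communication is the musical-chairs rule (Rosenski et al., 2016): keep drawing random arms until a round passes without collision, then lock in. A toy sketch, assuming the players already agree on which M arms to share:

```python
import numpy as np

# Musical-chairs-style toy (Rosenski et al., 2016), shown as a simple baseline
# for decentralized, communication-free assignment; it is NOT this paper's
# consensus algorithm. Assumes the M players already agree on which M arms
# are best and only need to spread themselves across them collision-free.

rng = np.random.default_rng(1)
M = 4                       # number of players, sharing M agreed-upon arms
fixed = [None] * M          # the arm each player has locked in, if any

for round_no in range(100):
    picks = [fixed[p] if fixed[p] is not None else int(rng.integers(M))
             for p in range(M)]
    for p in range(M):
        # Lock in the first time this player's arm sees no collision.
        if fixed[p] is None and picks.count(picks[p]) == 1:
            fixed[p] = picks[p]
    if all(f is not None for f in fixed):
        print(f"collision-free after {round_no + 1} rounds:", fixed)
        break
```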
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, the relative performance of OS methods declines, suggesting that OS methods may not perform well in the limit.
arXiv Detail & Related papers (2023-12-19T20:01:42Z)
- Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games.
We develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem.
Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step, as sketched below.
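A minimal single-state sketch of that primal-dual update pattern, with an assumed reward vector, cost vector, and budget standing in for the constrained Markov game (this is the generic pattern, not the paper's algorithm):

```python
import numpy as np

# Minimal primal-dual sketch: entropic mirror descent on a policy over actions
# (the primal) plus a projected gradient step on a Lagrange multiplier (the
# dual). A single-state toy with made-up r, c, b; not the paper's method.

r = np.array([1.0, 0.6, 0.2])    # assumed per-action rewards
c = np.array([0.9, 0.4, 0.1])    # assumed per-action costs
b = 0.5                          # cost budget: want expected cost <= b

pi = np.ones(3) / 3              # primal: policy on the simplex
lam = 0.0                        # dual: Lagrange multiplier >= 0
eta_pi, eta_lam, lam_max = 0.1, 0.05, 10.0

for t in range(2000):
    grad = r - lam * c                   # gradient of the Lagrangian in pi
    pi = pi * np.exp(eta_pi * grad)      # entropic mirror descent (ascent) step
    pi /= pi.sum()                       # Bregman projection onto the simplex
    lam = lam + eta_lam * (pi @ c - b)   # dual gradient step
    lam = min(max(lam, 0.0), lam_max)    # projection onto [0, lam_max]

print("policy:", pi.round(3), "expected cost:", float(pi @ c))
```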
arXiv Detail & Related papers (2023-05-31T22:09:24Z)
- Competing for Shareable Arms in Multi-Player Multi-Armed Bandits [29.08799537067425]
We study a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards.
We propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium.
We establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses for themselves.
arXiv Detail & Related papers (2023-05-30T15:59:56Z)
- Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications [32.313813562222066]
We study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards.
Existing MMAB models mostly assume when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards.
We propose an MMAB with shareable resources as an extension to the collision and non-collision settings.
arXiv Detail & Related papers (2022-04-28T13:46:59Z)
- The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication [10.446001329147112]
We study the multi-player multi-armed bandit problem.
In this problem, $m$ players cooperate to maximize their total reward from $K > m$ arms.
We ask whether it is possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$, where $\Delta$ is the gap between the $m$-th and $(m+1)$-st best arms.
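For concreteness, this is the standard shape of such a gap-dependent bound; the sorted-means notation and the polylog factor below are generic conventions, not this paper's exact statement.

```latex
% Generic shape of an instance-dependent (gap-dependent) regret bound;
% constants and log factors are illustrative, not this paper's exact result.
\[
  \Delta = \mu_{(m)} - \mu_{(m+1)},
  \qquad
  R(T) = \tilde{O}\!\left(\frac{1}{\Delta}\right)
       = O\!\left(\frac{\mathrm{polylog}(T)}{\Delta}\right)
\]
```

Here $\mu_{(1)} \ge \dots \ge \mu_{(K)}$ are the arm means in decreasing order and $m$ is the number of players: the larger the gap $\Delta$, the easier the instance and the smaller the regret.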
arXiv Detail & Related papers (2022-02-19T18:19:36Z)
- Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm that is asymptotically optimal and has competitive empirical performance.
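As a point of reference, semi-bandit feedback means the learner plays a set of arms and observes a separate noisy reward for each arm in that set, rather than only their sum. A minimal sketch, with assumed means and Gaussian noise:

```python
import numpy as np

# Semi-bandit feedback in one line of mechanics: play a SET of arms, observe a
# separate noisy reward for each arm in the set (not just their sum, as in the
# full-bandit model). Means and noise level here are assumed for illustration.

rng = np.random.default_rng(2)
mu = np.array([0.8, 0.5, 0.3, 0.1])   # unknown per-arm means (assumed)

def pull_set(S):
    """Semi-bandit feedback: one noisy observation per chosen arm."""
    return {a: float(mu[a] + 0.1 * rng.standard_normal()) for a in S}

obs = pull_set({0, 2})                # choose a set of arms
print(obs)                            # e.g. {0: 0.79, 2: 0.31} (values vary)
```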
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
- Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z)
- Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case, where players maximize the collective reward, has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)