Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
- URL: http://arxiv.org/abs/2305.19158v2
- Date: Fri, 4 Aug 2023 06:29:20 GMT
- Title: Competing for Shareable Arms in Multi-Player Multi-Armed Bandits
- Authors: Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, Peng Cui
- Abstract summary: We study a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards.
We propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium.
We establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses for themselves.
- Score: 29.08799537067425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Competitions for shareable and limited resources have long been studied with
strategic agents. In reality, agents often have to learn and maximize the
rewards of the resources at the same time. To design an individualized
competing policy, we model the competition between agents in a novel
multi-player multi-armed bandit (MPMAB) setting where players are selfish and
aim to maximize their own rewards. In addition, when several players pull the
same arm, we assume that these players share that arm's reward equally in
expectation. Under this setting, we first analyze the Nash equilibrium when
arms' rewards are known. Subsequently, we propose a novel Selfish MPMAB with
Averaging Allocation (SMAA) approach based on the equilibrium. We theoretically
demonstrate that SMAA could achieve a good regret guarantee for each player
when all players follow the algorithm. Additionally, we establish that no
single selfish player can significantly increase their rewards through
deviation, nor can they detrimentally affect other players' rewards without
incurring substantial losses for themselves. We finally validate the
effectiveness of the method in extensive synthetic experiments.
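The averaging-allocation rule described in the abstract can be sketched directly: when several players pull the same arm, each receives that arm's mean reward divided by the number of pullers, in expectation. The arm means and pulling profiles below are hypothetical, chosen only to illustrate the equilibrium intuition with known rewards; this is not the authors' SMAA algorithm.

```python
def round_rewards(pulls, means):
    """pulls[i] is the arm chosen by player i; returns each player's
    expected payoff under the averaging-allocation rule: an arm's mean
    reward is split equally among all players pulling that arm."""
    counts = {}
    for arm in pulls:
        counts[arm] = counts.get(arm, 0) + 1
    return [means[arm] / counts[arm] for arm in pulls]

means = [0.9, 0.8, 0.3]      # hypothetical known arm means
profile = [0, 1, 0]          # players 0 and 2 share arm 0, player 1 takes arm 1
print(round_rewards(profile, means))   # → [0.45, 0.8, 0.45]

# No profitable deviation: if player 1 also joins arm 0, all three share
# its mean of 0.9 and earn 0.3 each -- worse than the 0.8 from arm 1.
print(round_rewards([0, 0, 0], means))
```

Under known means, this profile is a pure Nash equilibrium in the sketch: every unilateral move lowers the mover's expected payoff, which mirrors the deviation guarantee stated in the abstract.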
Related papers
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
arXiv Detail & Related papers (2024-10-08T09:49:47Z) - Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, the relative performance of OS methods degrades, suggesting that in the limit OS methods may not perform well.
arXiv Detail & Related papers (2023-12-19T20:01:42Z) - Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits
with Strategic Agents [57.627352949446625]
We consider a variant of the multi-armed bandit problem.
Specifically, the arms are strategic agents who can improve their rewards or absorb them.
We identify a class of MAB algorithms which satisfy a collection of properties and show that they lead to mechanisms that incentivize top-level performance at equilibrium.
arXiv Detail & Related papers (2023-12-13T06:54:49Z) - Bandits Meet Mechanism Design to Combat Clickbait in Online
Recommendation [50.469872635246176]
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards.
arXiv Detail & Related papers (2023-11-27T09:19:01Z) - Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and
No Communication [0.0]
We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand.
In this problem, each player simultaneously selects an action.
We show that this algorithm can achieve logarithmic $O(\frac{\log T}{\Delta_{\mathbf{a}}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret.
arXiv Detail & Related papers (2023-11-10T17:55:44Z) - Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z) - Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms:
Learning Algorithms & Applications [32.313813562222066]
We study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards.
Existing MMAB models mostly assume when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards.
We propose an MMAB with shareable resources as an extension to the collision and non-collision settings.
arXiv Detail & Related papers (2022-04-28T13:46:59Z) - Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
arXiv Detail & Related papers (2020-10-29T07:13:28Z) - Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and shows that it outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z) - Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)
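For contrast with the averaging allocation of the main paper, the classic collision model described in this last paper gives zero reward whenever players clash on the same arm. A minimal sketch of that payoff rule, using hypothetical arm means for illustration only:

```python
def collision_rewards(pulls, means):
    """Classic collision model: a player earns the arm's mean reward only
    if no other player pulls the same arm; colliding players get 0."""
    counts = {}
    for arm in pulls:
        counts[arm] = counts.get(arm, 0) + 1
    return [means[arm] if counts[arm] == 1 else 0.0 for arm in pulls]

means = [0.9, 0.8, 0.3]      # hypothetical arm means
print(collision_rewards([0, 1, 0], means))  # → [0.0, 0.8, 0.0]
print(collision_rewards([0, 1, 2], means))  # → [0.9, 0.8, 0.3]
```

Unlike the averaging model, a collision here destroys all reward on the contested arm, which is why selfish players in this setting must avoid each other entirely rather than merely weigh the cost of sharing.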
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.