Selfish Robustness and Equilibria in Multi-Player Bandits
- URL: http://arxiv.org/abs/2002.01197v2
- Date: Fri, 19 Jun 2020 08:02:43 GMT
- Title: Selfish Robustness and Equilibria in Multi-Player Bandits
- Authors: Etienne Boursier and Vianney Perchet
- Abstract summary: In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
- Score: 25.67398941667429
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by cognitive radios, stochastic multi-player multi-armed bandits
gained a lot of interest recently. In this class of problems, several players
simultaneously pull arms and encounter a collision - with 0 reward - if some of
them pull the same arm at the same time. While the cooperative case where
players maximize the collective reward (obediently following some fixed
protocol) has been mostly considered, robustness to malicious players is a
crucial and challenging concern. Existing approaches consider only the case of
adversarial jammers whose objective is to blindly minimize the collective
reward. We shall consider instead the more natural class of selfish players
whose incentives are to maximize their individual rewards, potentially at the
expense of the social welfare. We provide the first algorithm robust to selfish
players (a.k.a. Nash equilibrium) with a logarithmic regret, when the arm
performance is observed. When collisions are also observed, Grim Trigger type
of strategies enable some implicit communication-based algorithms and we
construct robust algorithms in two different settings: the homogeneous (with a
regret comparable to the centralized optimal one) and heterogeneous cases (for
an adapted and relevant notion of regret). We also provide impossibility
results when only the reward is observed or when arm means vary arbitrarily
among players.
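To make the setting concrete, here is a minimal Python sketch of the collision reward model together with a Grim Trigger style player. This is an illustrative assumption only: the function names, the fixed arm assignment, and the uniform-random punishment rule are ours, not the paper's actual algorithms.

```python
import random

def play_round(means, choices):
    """One round of the collision model: each player draws a Bernoulli
    reward from its chosen arm only if no other player chose the same
    arm; colliding players all receive 0."""
    rewards = []
    for player, arm in enumerate(choices):
        collided = any(other != player and choices[other] == arm
                       for other in range(len(choices)))
        if collided:
            rewards.append(0.0)  # collision: zero reward
        else:
            rewards.append(1.0 if random.random() < means[arm] else 0.0)
    return rewards

class GrimTriggerPlayer:
    """Follow a fixed assigned arm; if a collision is ever observed on
    it (evidence that someone deviated), punish forever by pulling arms
    uniformly at random, dragging every player's reward down."""

    def __init__(self, assigned_arm, num_arms):
        self.assigned_arm = assigned_arm
        self.num_arms = num_arms
        self.punishing = False

    def choose_arm(self):
        if self.punishing:
            return random.randrange(self.num_arms)  # punish: collide at random
        return self.assigned_arm

    def observe_collision(self, collided):
        if collided:
            self.punishing = True  # grim trigger: never forgive

# Example: players 0-2 are assigned arms 0-2; player 2 deviates to arm 0.
means = [0.9, 0.8, 0.7, 0.4, 0.2]
players = [GrimTriggerPlayer(arm, num_arms=5) for arm in (0, 1, 2)]
choices = [players[0].choose_arm(), players[1].choose_arm(), 0]
rewards = play_round(means, choices)  # players 0 and 2 collide on arm 0
players[0].observe_collision(choices.count(players[0].assigned_arm) > 1)
print(rewards, players[0].punishing)  # e.g. [0.0, 1.0, 0.0] True
```

Once a Grim Trigger player starts punishing it never stops, so a selfish deviation trades a one-round gain against a permanently degraded reward stream; this is the threat that can sustain an equilibrium.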
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Competing for Shareable Arms in Multi-Player Multi-Armed Bandits [29.08799537067425]
We study a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards.
We propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium.
We establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses for themselves.
arXiv Detail & Related papers (2023-05-30T15:59:56Z) - Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
The multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z) - Multi-Player Bandits Robust to Adversarial Collisions [31.349615523580518]
Multi-player multi-armed bandits have been extensively studied in recent years.
In this paper, we consider the presence of malicious players (or attackers) who obstruct the cooperative players from maximizing their rewards, by deliberately colliding with them.
We provide the first decentralized and robust algorithm RESYNC for defenders whose performance deteriorates gracefully as the number of collisions $C$ from the attackers increases.
arXiv Detail & Related papers (2022-11-15T00:43:26Z) - How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in
Urban Driving Games [64.71476526716668]
We study the (in)efficiency of any equilibrium players might agree to play.
We obtain guarantees that refine existing bounds on the Price of Anarchy.
Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies.
arXiv Detail & Related papers (2022-10-24T09:32:40Z) - Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms:
Learning Algorithms & Applications [32.313813562222066]
We study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards.
Existing MMAB models mostly assume that when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards.
We propose an MMAB with shareable resources as an extension to the collision and non-collision settings.
arXiv Detail & Related papers (2022-04-28T13:46:59Z) - An Instance-Dependent Analysis for the Cooperative Multi-Player
Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players (a toy sketch of such collision-based signaling appears after this list).
arXiv Detail & Related papers (2021-11-08T23:38:47Z) - Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple-player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z) - Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
arXiv Detail & Related papers (2020-10-29T07:13:28Z)
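Both the main abstract's "implicit communication-based algorithms" and the instance-dependent analysis entry above rely on the same trick: when collision indicators are observed, a deliberate collision can carry information. Below is a toy Python sketch of one-bit-per-slot signaling, with entirely illustrative names and no claim to match any of the papers' actual protocols.

```python
def sender_arm(bit, own_arm, receiver_arm):
    """Encode 1 by deliberately pulling the receiver's arm (forcing a
    collision the receiver can observe) and 0 by staying on one's own arm."""
    return receiver_arm if bit == 1 else own_arm

def decode_bit(collision_observed):
    # The receiver keeps pulling its own arm and reads each bit off the
    # collision indicator for that slot.
    return 1 if collision_observed else 0

# Example: transmitting the bits 1, 0, 1 over three communication slots.
own_arm, receiver_arm = 3, 0
message = [1, 0, 1]
pulls = [sender_arm(b, own_arm, receiver_arm) for b in message]
decoded = [decode_bit(arm == receiver_arm) for arm in pulls]
assert decoded == message
```

Each transmitted bit costs a forced collision, i.e., one round of lost reward for both parties, which is why such schemes must keep communication phases rare relative to the horizon.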
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.