Selfish Robustness and Equilibria in Multi-Player Bandits
- URL: http://arxiv.org/abs/2002.01197v2
- Date: Fri, 19 Jun 2020 08:02:43 GMT
- Title: Selfish Robustness and Equilibria in Multi-Player Bandits
- Authors: Etienne Boursier and Vianney Perchet
- Abstract summary: In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case where players maximize the collective reward has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
- Score: 25.67398941667429
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by cognitive radios, stochastic multi-player multi-armed bandits
gained a lot of interest recently. In this class of problems, several players
simultaneously pull arms and encounter a collision - with 0 reward - if some of
them pull the same arm at the same time. While the cooperative case where
players maximize the collective reward (obediently following some fixed
protocol) has been mostly considered, robustness to malicious players is a
crucial and challenging concern. Existing approaches consider only the case of
adversarial jammers whose objective is to blindly minimize the collective
reward. We shall consider instead the more natural class of selfish players
whose incentives are to maximize their individual rewards, potentially at the
expense of the social welfare. We provide the first algorithm robust to selfish
players (a.k.a. Nash equilibrium) with a logarithmic regret, when the arm
performance is observed. When collisions are also observed, Grim Trigger type
of strategies enable some implicit communication-based algorithms and we
construct robust algorithms in two different settings: the homogeneous (with a
regret comparable to the centralized optimal one) and heterogeneous cases (for
an adapted and relevant notion of regret). We also provide impossibility
results when only the reward is observed or when arm means vary arbitrarily
among players.
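To make the setting concrete, here is a minimal Python sketch of the collision reward model together with a Grim Trigger style player. This is an illustrative assumption only: the function names, the fixed arm assignment, and the uniform-random punishment rule are ours, not the paper's actual algorithms.

```python
import random

def play_round(means, choices):
    """One round of the collision model: each player draws a Bernoulli
    reward from its chosen arm only if no other player chose the same
    arm; colliding players all receive 0."""
    rewards = []
    for player, arm in enumerate(choices):
        collided = any(other != player and choices[other] == arm
                       for other in range(len(choices)))
        if collided:
            rewards.append(0.0)  # collision: zero reward
        else:
            rewards.append(1.0 if random.random() < means[arm] else 0.0)
    return rewards

class GrimTriggerPlayer:
    """Follow a fixed assigned arm; if a collision is ever observed on
    it (evidence that someone deviated), punish forever by pulling arms
    uniformly at random, dragging every player's reward down."""

    def __init__(self, assigned_arm, num_arms):
        self.assigned_arm = assigned_arm
        self.num_arms = num_arms
        self.punishing = False

    def choose_arm(self):
        if self.punishing:
            return random.randrange(self.num_arms)  # punish: collide at random
        return self.assigned_arm

    def observe_collision(self, collided):
        if collided:
            self.punishing = True  # grim trigger: never forgive

# Example: players 0-2 are assigned arms 0-2; player 2 deviates to arm 0.
means = [0.9, 0.8, 0.7, 0.4, 0.2]
players = [GrimTriggerPlayer(arm, num_arms=5) for arm in (0, 1, 2)]
choices = [players[0].choose_arm(), players[1].choose_arm(), 0]
rewards = play_round(means, choices)  # players 0 and 2 collide on arm 0
players[0].observe_collision(choices.count(players[0].assigned_arm) > 1)
print(rewards, players[0].punishing)  # e.g. [0.0, 1.0, 0.0] True
```

Once a Grim Trigger player starts punishing it never stops, so a selfish deviation trades a one-round gain against a permanently degraded reward stream; this is the threat that can sustain an equilibrium.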
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Multi-Player Approaches for Dueling Bandits [58.442742345319225]
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single
Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Competing for Shareable Arms in Multi-Player Multi-Armed Bandits [29.08799537067425]
We study a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards.
We propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium.
We establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses for themselves.
arXiv Detail & Related papers (2023-05-30T15:59:56Z) - Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
The multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z) - Multi-Player Bandits Robust to Adversarial Collisions [31.349615523580518]
Multi-player multi-armed bandits have been extensively studied in recent years.
In this paper, we consider the presence of malicious players (or attackers) who obstruct the cooperative players from maximizing their rewards, by deliberately colliding with them.
We provide the first decentralized and robust algorithm RESYNC for defenders whose performance deteriorates gracefully as the number of collisions $C$ from the attackers increases.
arXiv Detail & Related papers (2022-11-15T00:43:26Z) - How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in
Urban Driving Games [64.71476526716668]
We study the (in)efficiency of any equilibrium players might agree to play.
We obtain guarantees that refine existing bounds on the Price of Anarchy.
Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies.
arXiv Detail & Related papers (2022-10-24T09:32:40Z) - Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms:
Learning Algorithms & Applications [32.313813562222066]
We study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards.
Existing MMAB models mostly assume that when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards.
We propose an MMAB with shareable resources as an extension to the collision and non-collision settings.
arXiv Detail & Related papers (2022-04-28T13:46:59Z) - An Instance-Dependent Analysis for the Cooperative Multi-Player
Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players (a toy sketch of such collision-based signaling appears after this list).
arXiv Detail & Related papers (2021-11-08T23:38:47Z) - Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple-player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z) - Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
arXiv Detail & Related papers (2020-10-29T07:13:28Z)
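Both the main abstract's "implicit communication-based algorithms" and the instance-dependent analysis entry above rely on the same trick: when collision indicators are observed, a deliberate collision can carry information. Below is a toy Python sketch of one-bit-per-slot signaling, with entirely illustrative names and no claim to match any of the papers' actual protocols.

```python
def sender_arm(bit, own_arm, receiver_arm):
    """Encode 1 by deliberately pulling the receiver's arm (forcing a
    collision the receiver can observe) and 0 by staying on one's own arm."""
    return receiver_arm if bit == 1 else own_arm

def decode_bit(collision_observed):
    # The receiver keeps pulling its own arm and reads each bit off the
    # collision indicator for that slot.
    return 1 if collision_observed else 0

# Example: transmitting the bits 1, 0, 1 over three communication slots.
own_arm, receiver_arm = 3, 0
message = [1, 0, 1]
pulls = [sender_arm(b, own_arm, receiver_arm) for b in message]
decoded = [decode_bit(arm == receiver_arm) for arm in pulls]
assert decoded == message
```

Each transmitted bit costs a forced collision, i.e., one round of lost reward for both parties, which is why such schemes must keep communication phases rare relative to the horizon.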
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.