Multi-Player Bandits Robust to Adversarial Collisions
- URL: http://arxiv.org/abs/2211.07817v1
- Date: Tue, 15 Nov 2022 00:43:26 GMT
- Title: Multi-Player Bandits Robust to Adversarial Collisions
- Authors: Shivakumar Mahesh, Anshuka Rangi, Haifeng Xu and Long Tran-Thanh
- Abstract summary: Multi-Player Multi-Armed Bandits has been extensively studied in recent years.
In this paper, we consider the presence of malicious players (or attackers) who obstruct the cooperative players from maximizing their rewards, by deliberately colliding with them.
We provide the first decentralized and robust algorithm RESYNC for defenders whose performance deteriorates gracefully as the number of collisions $C$ from the attackers increases.
- Score: 31.349615523580518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by cognitive radios, stochastic Multi-Player Multi-Armed Bandits
has been extensively studied in recent years. In this setting, each player
pulls an arm and receives the reward corresponding to that arm if there is no
collision, that is, if the arm was selected by exactly one player. Otherwise,
if a collision occurs, the player receives no reward. In this paper, we
consider the
presence of malicious players (or attackers) who obstruct the cooperative
players (or defenders) from maximizing their rewards, by deliberately colliding
with them. We provide the first decentralized and robust algorithm RESYNC for
defenders whose performance deteriorates gracefully as $\tilde{O}(C)$ as the
number of collisions $C$ from the attackers increases. We show that this
algorithm is order-optimal by proving a lower bound which scales as
$\Omega(C)$. The algorithm is agnostic to the algorithm used by the attackers
and to the number of collisions $C$ incurred from them.
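To make the setting concrete, here is a minimal sketch of the collision reward model described in the abstract (not the authors' RESYNC algorithm): each player picks an arm, and a player earns the arm's stochastic reward only if no other player, defender or attacker, chose the same arm. The function name and arm means below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_round(means, defender_arms, attacker_arms):
    """Simulate one round: each defender earns its arm's Bernoulli reward
    only if no other player (defender or attacker) pulled the same arm."""
    pulls = list(defender_arms) + list(attacker_arms)
    rewards = []
    for arm in defender_arms:
        if pulls.count(arm) > 1:   # collision: someone else is on this arm
            rewards.append(0.0)
        else:                      # sole occupant: draw the stochastic reward
            rewards.append(float(rng.random() < means[arm]))
    return rewards

# Two defenders on distinct arms; one attacker deliberately collides on arm 0,
# so defender 0 gets nothing while defender 1 still earns its reward.
print(play_round(means=[0.9, 0.5, 0.2], defender_arms=[0, 1], attacker_arms=[0]))
```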
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm which guarantees that players can reach a consensus on the optimal arm pulling profile in only $M$ rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
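As a rough illustration of what uncertainty-weighted maximum likelihood estimation can look like for preference data, here is a hedged sketch under a linear Bradley-Terry model. The function name and the assumption that per-sample uncertainty weights `w` are supplied externally are ours, not the paper's; down-weighting low-confidence comparisons is what limits the damage of adversarially flipped labels.

```python
import numpy as np

def weighted_mle(z, y, w, lr=0.5, iters=500, lam=1.0):
    """Weighted, ridge-regularized logistic regression for preferences.

    z : (n, d) feature differences between the two presented actions
    y : (n,)  observed preference labels in {0, 1} (possibly flipped)
    w : (n,)  uncertainty weights; low-confidence samples get small weight
    """
    theta = np.zeros(z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-z @ theta))       # Bradley-Terry win probability
        grad = z.T @ (w * (p - y)) + lam * theta   # weighted logistic gradient
        theta -= lr * grad / len(y)
    return theta
```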
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Provably Efficient Generalized Lagrangian Policy Optimization for Safe
Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games.
We develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem.
Our algorithm updates the primal variables of the minimax decision problem via online mirror descent and the dual variable via a projected gradient step.
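A minimal sketch of this primal-dual update pattern, assuming a single-state problem so the policy is just a distribution over actions: the primal step is an entropy-mirror-descent (exponentiated-gradient) update on the simplex, and the dual step is a projected gradient step keeping the multiplier nonnegative. All names and step sizes here are illustrative, not the paper's.

```python
import numpy as np

def primal_dual_step(pi, lam, reward, cost, budget, eta_pi=0.1, eta_lam=0.1):
    """One update: pi is a distribution over actions, lam >= 0 a multiplier,
    reward/cost are per-action estimates, budget is the safety threshold."""
    payoff = reward - lam * cost              # Lagrangian payoff per action
    pi = pi * np.exp(eta_pi * payoff)         # mirror-descent step (entropy)
    pi /= pi.sum()                            # renormalize onto the simplex
    lam = max(0.0, lam + eta_lam * (pi @ cost - budget))  # projected dual step
    return pi, lam
```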
arXiv Detail & Related papers (2023-05-31T22:09:24Z) - Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z) - The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player
Multi-Armed Bandits with no Communication [10.446001329147112]
We study the multi-player multi-armed bandit problem.
In this problem, $m$ players cooperate to maximize their total reward from $K > m$ arms.
We ask whether it is possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$, where $\Delta$ is the gap between the $m$-th and $(m+1)$-st best arms.
arXiv Detail & Related papers (2022-02-19T18:19:36Z) - An Instance-Dependent Analysis for the Cooperative Multi-Player
Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
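For context, here is a minimal single-player sketch of the successive elimination strategy the first result builds on (the paper's multi-player modification and collision-based protocol are not reproduced here): arms whose upper confidence bound drops below the best arm's lower confidence bound are eliminated, and the confidence radius at elimination time estimates the arm's suboptimality gap.

```python
import numpy as np

def successive_elimination(means, horizon, delta=0.05):
    """Return the surviving arms after uniform sampling with elimination."""
    rng = np.random.default_rng(1)
    K = len(means)
    active = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    for _ in range(horizon):
        for a in active:                       # sample every surviving arm once
            counts[a] += 1
            sums[a] += float(rng.random() < means[a])
        mu = sums[active] / counts[active]
        rad = np.sqrt(np.log(2 * K * horizon / delta) / (2 * counts[active]))
        best_lcb = np.max(mu - rad)            # best arm's lower confidence bound
        # Drop any arm whose upper confidence bound falls below best_lcb;
        # rad at that moment estimates the arm's suboptimality gap.
        active = [a for a, m, r in zip(active, mu, rad) if m + r >= best_lcb]
        if len(active) == 1:
            break
    return active
```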
arXiv Detail & Related papers (2021-11-08T23:38:47Z) - Composite Adversarial Attacks [57.293211764569996]
An adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching for the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
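A toy sketch of the search idea as we read the summary: chain attacks from a pool and keep the composition that hurts accuracy most. The pool, the pairwise search space, and the `evaluate` callback are placeholders, not CAA's actual attack policy or search procedure.

```python
from itertools import permutations

def best_composition(attacks, evaluate):
    """attacks : dict of name -> attack function (x -> perturbed x)
    evaluate: callback mapping an attack function to model accuracy"""
    best, best_acc = None, float("inf")
    for a, b in permutations(attacks, 2):      # every ordered pair of attacks
        chained = lambda x, f=attacks[a], g=attacks[b]: g(f(x))
        acc = evaluate(chained)                # lower accuracy = stronger attack
        if acc < best_acc:
            best, best_acc = (a, b), acc
    return best, best_acc
```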
arXiv Detail & Related papers (2020-12-10T03:21:16Z) - Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal
Regret With Neither Communication Nor Collisions [4.974932889340056]
We consider the cooperative multi-player version of the multi-armed bandit problem.
We show that optimal regret with neither communication nor collisions is achievable for any number of players and arms.
arXiv Detail & Related papers (2020-11-08T03:14:19Z) - Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
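A hedged guess at what aggregating rewards across $\epsilon$-dissimilar players can look like in a UCB index: borrowed samples shrink the confidence radius but add an $\epsilon$-proportional bias term, since other players' mean rewards may differ by up to $\epsilon$. This is our reading of the summary, not RobustAgg$(\epsilon)$ itself.

```python
import numpy as np

def aggregated_ucb(own_sum, own_n, others_sum, others_n, t, eps):
    """UCB index for one arm that pools a player's own samples with samples
    borrowed from other players, padded by an eps-proportional bias term."""
    n = own_n + others_n
    mean = (own_sum + others_sum) / n          # pooled empirical mean
    bonus = np.sqrt(2.0 * np.log(t) / n) + eps * others_n / n
    return mean + bonus
```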
arXiv Detail & Related papers (2020-10-29T07:13:28Z) - Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case, where players maximize the collective reward, has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)