Multi-Player Bandits Robust to Adversarial Collisions
- URL: http://arxiv.org/abs/2211.07817v1
- Date: Tue, 15 Nov 2022 00:43:26 GMT
- Title: Multi-Player Bandits Robust to Adversarial Collisions
- Authors: Shivakumar Mahesh, Anshuka Rangi, Haifeng Xu and Long Tran-Thanh
- Abstract summary: Multi-Player Multi-Armed Bandits has been extensively studied in recent years.
In this paper, we consider the presence of malicious players (or attackers) who obstruct the cooperative players from maximizing their rewards, by deliberately colliding with them.
We provide the first decentralized and robust algorithm RESYNC for defenders whose performance deteriorates gracefully as the number of collisions $C$ from the attackers increases.
- Score: 31.349615523580518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by cognitive radios, stochastic Multi-Player Multi-Armed Bandits
has been extensively studied in recent years. In this setting, each player
pulls an arm and receives the reward corresponding to that arm if there is no
collision, that is, if the arm was selected by exactly one player. Otherwise,
if a collision occurs, the player receives no reward. In this paper, we
consider the
presence of malicious players (or attackers) who obstruct the cooperative
players (or defenders) from maximizing their rewards, by deliberately colliding
with them. We provide the first decentralized and robust algorithm RESYNC for
defenders whose performance deteriorates gracefully as $\tilde{O}(C)$ as the
number of collisions $C$ from the attackers increases. We show that this
algorithm is order-optimal by proving a lower bound which scales as
$\Omega(C)$. The algorithm is agnostic to the algorithm used by the attackers
and to the number of collisions $C$ incurred from them.
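To make the setting concrete, here is a minimal sketch of the collision reward model described in the abstract (not the authors' RESYNC algorithm): each player picks an arm, and a player earns the arm's stochastic reward only if no other player, defender or attacker, chose the same arm. The function name and arm means below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_round(means, defender_arms, attacker_arms):
    """Simulate one round: each defender earns its arm's Bernoulli reward
    only if no other player (defender or attacker) pulled the same arm."""
    pulls = list(defender_arms) + list(attacker_arms)
    rewards = []
    for arm in defender_arms:
        if pulls.count(arm) > 1:   # collision: someone else is on this arm
            rewards.append(0.0)
        else:                      # sole occupant: draw the stochastic reward
            rewards.append(float(rng.random() < means[arm]))
    return rewards

# Two defenders on distinct arms; one attacker deliberately collides on arm 0,
# so defender 0 gets nothing while defender 1 still earns its reward.
print(play_round(means=[0.9, 0.5, 0.2], defender_arms=[0, 1], attacker_arms=[0]))
```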
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm which guarantees that players can reach a consensus on the optimal arm pulling profile in only $M$ rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
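As a rough illustration of what uncertainty-weighted maximum likelihood estimation can look like for preference data, here is a hedged sketch under a linear Bradley-Terry model. The function name and the assumption that per-sample uncertainty weights `w` are supplied externally are ours, not the paper's; down-weighting low-confidence comparisons is what limits the damage of adversarially flipped labels.

```python
import numpy as np

def weighted_mle(z, y, w, lr=0.5, iters=500, lam=1.0):
    """Weighted, ridge-regularized logistic regression for preferences.

    z : (n, d) feature differences between the two presented actions
    y : (n,)  observed preference labels in {0, 1} (possibly flipped)
    w : (n,)  uncertainty weights; low-confidence samples get small weight
    """
    theta = np.zeros(z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-z @ theta))       # Bradley-Terry win probability
        grad = z.T @ (w * (p - y)) + lam * theta   # weighted logistic gradient
        theta -= lr * grad / len(y)
    return theta
```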
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Provably Efficient Generalized Lagrangian Policy Optimization for Safe
Multi-Agent Reinforcement Learning [105.7510838453122]
We examine online safe multi-agent reinforcement learning using constrained Markov games.
We develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem.
Our algorithm updates the primal variables of the minimax decision problem via online mirror descent and the dual variable via a projected gradient step.
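A minimal sketch of this primal-dual update pattern, assuming a single-state problem so the policy is just a distribution over actions: the primal step is an entropy-mirror-descent (exponentiated-gradient) update on the simplex, and the dual step is a projected gradient step keeping the multiplier nonnegative. All names and step sizes here are illustrative, not the paper's.

```python
import numpy as np

def primal_dual_step(pi, lam, reward, cost, budget, eta_pi=0.1, eta_lam=0.1):
    """One update: pi is a distribution over actions, lam >= 0 a multiplier,
    reward/cost are per-action estimates, budget is the safety threshold."""
    payoff = reward - lam * cost              # Lagrangian payoff per action
    pi = pi * np.exp(eta_pi * payoff)         # mirror-descent step (entropy)
    pi /= pi.sum()                            # renormalize onto the simplex
    lam = max(0.0, lam + eta_lam * (pi @ cost - budget))  # projected dual step
    return pi, lam
```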
arXiv Detail & Related papers (2023-05-31T22:09:24Z) - Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z) - The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player
Multi-Armed Bandits with no Communication [10.446001329147112]
We study the multi-player multi-armed bandit problem.
In this problem, $m$ players cooperate to maximize their total reward from $K > m$ arms.
We ask whether it is possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$, where $\Delta$ is the gap between the $m$-th and $(m+1)$-st best arms.
arXiv Detail & Related papers (2022-02-19T18:19:36Z) - An Instance-Dependent Analysis for the Cooperative Multi-Player
Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
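For context, here is a minimal single-player sketch of the successive elimination strategy the first result builds on (the paper's multi-player modification and collision-based protocol are not reproduced here): arms whose upper confidence bound drops below the best arm's lower confidence bound are eliminated, and the confidence radius at elimination time estimates the arm's suboptimality gap.

```python
import numpy as np

def successive_elimination(means, horizon, delta=0.05):
    """Return the surviving arms after uniform sampling with elimination."""
    rng = np.random.default_rng(1)
    K = len(means)
    active = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    for _ in range(horizon):
        for a in active:                       # sample every surviving arm once
            counts[a] += 1
            sums[a] += float(rng.random() < means[a])
        mu = sums[active] / counts[active]
        rad = np.sqrt(np.log(2 * K * horizon / delta) / (2 * counts[active]))
        best_lcb = np.max(mu - rad)            # best arm's lower confidence bound
        # Drop any arm whose upper confidence bound falls below best_lcb;
        # rad at that moment estimates the arm's suboptimality gap.
        active = [a for a, m, r in zip(active, mu, rad) if m + r >= best_lcb]
        if len(active) == 1:
            break
    return active
```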
arXiv Detail & Related papers (2021-11-08T23:38:47Z) - Composite Adversarial Attacks [57.293211764569996]
An adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching for the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
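A toy sketch of the search idea as we read the summary: chain attacks from a pool and keep the composition that hurts accuracy most. The pool, the pairwise search space, and the `evaluate` callback are placeholders, not CAA's actual attack policy or search procedure.

```python
from itertools import permutations

def best_composition(attacks, evaluate):
    """attacks : dict of name -> attack function (x -> perturbed x)
    evaluate: callback mapping an attack function to model accuracy"""
    best, best_acc = None, float("inf")
    for a, b in permutations(attacks, 2):      # every ordered pair of attacks
        chained = lambda x, f=attacks[a], g=attacks[b]: g(f(x))
        acc = evaluate(chained)                # lower accuracy = stronger attack
        if acc < best_acc:
            best, best_acc = (a, b), acc
    return best, best_acc
```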
arXiv Detail & Related papers (2020-12-10T03:21:16Z) - Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal
Regret With Neither Communication Nor Collisions [4.974932889340056]
We consider the cooperative multi-player version of the multi-armed bandit problem.
We show that optimal regret with neither communication nor collisions is achievable for any number of players and arms.
arXiv Detail & Related papers (2020-11-08T03:14:19Z) - Multitask Bandit Learning Through Heterogeneous Feedback Aggregation [35.923544685900055]
We formulate the problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms.
We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players.
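A hedged guess at what aggregating rewards across $\epsilon$-dissimilar players can look like in a UCB index: borrowed samples shrink the confidence radius but add an $\epsilon$-proportional bias term, since other players' mean rewards may differ by up to $\epsilon$. This is our reading of the summary, not RobustAgg$(\epsilon)$ itself.

```python
import numpy as np

def aggregated_ucb(own_sum, own_n, others_sum, others_n, t, eps):
    """UCB index for one arm that pools a player's own samples with samples
    borrowed from other players, padded by an eps-proportional bias term."""
    n = own_n + others_n
    mean = (own_sum + others_sum) / n          # pooled empirical mean
    bonus = np.sqrt(2.0 * np.log(t) / n) + eps * others_n / n
    return mean + bonus
```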
arXiv Detail & Related papers (2020-10-29T07:13:28Z) - Selfish Robustness and Equilibria in Multi-Player Bandits [25.67398941667429]
In a game, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time.
While the cooperative case, where players maximize the collective reward, has been mostly considered, robustness to malicious players is a crucial and challenging concern.
We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare.
arXiv Detail & Related papers (2020-02-04T09:50:28Z)