Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks
- URL: http://arxiv.org/abs/2501.17882v1
- Date: Tue, 21 Jan 2025 08:51:23 GMT
- Title: Heterogeneous Multi-Player Multi-Armed Bandits Robust To Adversarial Attacks
- Authors: Akshayaa Magesh, Venugopal V. Veeravalli
- Abstract summary: We consider a multi-player bandit setting in the presence of adversaries that attempt to negatively affect the rewards received by the players in the system.
In the event of a collision (more than one player choosing the same arm), all the colliding users receive zero rewards.
The adversaries use collisions to affect the rewards received by the players, i.e., if an adversary attacks an arm, any player choosing that arm will receive zero reward.
- Score: 19.184883255588126
- Abstract: We consider a multi-player multi-armed bandit setting in the presence of adversaries that attempt to negatively affect the rewards received by the players in the system. The reward distributions for any given arm are heterogeneous across the players. In the event of a collision (more than one player choosing the same arm), all the colliding players receive zero rewards. The adversaries use collisions to affect the rewards received by the players, i.e., if an adversary attacks an arm, any player choosing that arm will receive zero reward. At any time step, the adversaries may attack more than one arm. It is assumed that the players in the system do not deviate from a pre-determined policy used by all the players, and that the probability that none of the arms face adversarial attacks is strictly positive at every time step. In order to combat the adversarial attacks, the players are allowed to communicate using a single bit for $O(\log T)$ time units, where $T$ is the time horizon, and each player can only observe their own actions and rewards at all time steps. We propose a policy, used by all the players, that achieves near order-optimal regret of $O(\log^{1+\delta}T + W)$, where $W$ is the total number of time units for which there was an adversarial attack on at least one arm.
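To make the reward model in the abstract concrete, the sketch below simulates one round of the environment: colliding players and players on attacked arms receive zero reward, while everyone else draws a stochastic reward from their own (heterogeneous) arm distribution. The Bernoulli rewards, the specific means, and all names (`step`, `mu`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PLAYERS, N_ARMS = 3, 5
# Heterogeneous setting: mu[i, k] is player i's mean reward on arm k (illustrative values).
mu = rng.uniform(0.1, 0.9, size=(N_PLAYERS, N_ARMS))

def step(choices, attacked):
    """Simulate one round: choices[i] is the arm pulled by player i,
    attacked is the set of arms hit by the adversary this round."""
    rewards = np.zeros(N_PLAYERS)
    pulls = np.bincount(choices, minlength=N_ARMS)
    for i, k in enumerate(choices):
        if pulls[k] > 1 or k in attacked:
            rewards[i] = 0.0          # collision or adversarial attack: zero reward
        else:
            rewards[i] = rng.binomial(1, mu[i, k])  # stochastic reward draw
    return rewards

# Players 0 and 1 collide on arm 2; the adversary attacks arm 4, wiping out player 2's reward.
print(step(np.array([2, 2, 4]), attacked={4}))
```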
Related papers
- Corrupted Learning Dynamics in Games [62.73758165845971]
An equilibrium can be computed at a fast rate of $O(\log T)$ when all players follow the optimistic follow-the-regularized-leader (OFTRL) algorithm.
We present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm.
arXiv Detail & Related papers (2024-12-10T02:23:44Z)
- Stochastic Bandits for Egalitarian Assignment [58.33714486693828]
We study EgalMAB, an egalitarian assignment problem in the context of multi-armed bandits.
We design and analyze a UCB-based policy EgalUCB and establish upper bounds on the cumulative regret.
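For context, a UCB-based policy selects the arm whose empirical mean plus confidence bonus is largest; the sketch below shows the standard UCB1 rule that such policies build on. The confidence width and constant `c` are generic textbook choices, not EgalUCB's exact index.

```python
import math

def ucb1_select(counts, means, t, c=2.0):
    """Pick the arm maximizing empirical mean + exploration bonus.
    counts: pulls per arm so far (each arm pulled at least once);
    means: empirical mean rewards; t: current round; c: exploration constant."""
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```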
arXiv Detail & Related papers (2024-10-08T09:49:47Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only $M$ rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation [50.469872635246176]
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards.
arXiv Detail & Related papers (2023-11-27T09:19:01Z)
- Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and No Communication [0.0]
We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand.
In this problem, each player simultaneously selects an action.
We show that this algorithm can achieve logarithmic $O(\frac{\log T}{\Delta_{\bm{a}}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret.
arXiv Detail & Related papers (2023-11-10T17:55:44Z)
- Competing for Shareable Arms in Multi-Player Multi-Armed Bandits [29.08799537067425]
We study a novel multi-player multi-armed bandit (MPMAB) setting where players are selfish and aim to maximize their own rewards.
We propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium.
We establish that no single selfish player can significantly increase their rewards through deviation, nor can they detrimentally affect other players' rewards without incurring substantial losses for themselves.
arXiv Detail & Related papers (2023-05-30T15:59:56Z)
- Multi-Player Bandits Robust to Adversarial Collisions [31.349615523580518]
The multi-player multi-armed bandit problem has been extensively studied in recent years.
In this paper, we consider the presence of malicious players (or attackers) who obstruct the cooperative players from maximizing their rewards, by deliberately colliding with them.
We provide the first decentralized and robust algorithm RESYNC for defenders whose performance deteriorates gracefully as the number of collisions $C$ from the attackers increases.
arXiv Detail & Related papers (2022-11-15T00:43:26Z)
- Almost Cost-Free Communication in Federated Best Arm Identification [76.12303738941254]
We study the problem of best arm identification in a federated learning multi-armed bandit setup with a central server and multiple clients.
We propose a novel algorithm FedElim that is based on successive elimination and communicates only in exponential time steps.
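To illustrate what "communicates only in exponential time steps" means, the sketch below generates a geometrically spaced communication schedule, which yields only $O(\log T)$ synchronization rounds over a horizon $T$; the function and its `base` parameter are illustrative assumptions, not the paper's exact protocol.

```python
def communication_rounds(horizon, base=2):
    """Exponentially sparse schedule: communicate at t = 1, base, base^2, ...
    Over a horizon T this gives O(log T) rounds, so communication is 'almost free'."""
    t, rounds = 1, []
    while t <= horizon:
        rounds.append(t)
        t *= base
    return rounds

print(communication_rounds(1000))  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```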
arXiv Detail & Related papers (2022-08-19T08:37:09Z)
- The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication [10.446001329147112]
We study the multi-player multi-armed bandit problem.
In this problem, $m$ players cooperate to maximize their total reward from $K > m$ arms.
We ask whether it is possible to obtain optimal instance-dependent regret $\tilde{O}(1/\Delta)$ where $\Delta$ is the gap between the $m$-th and $(m+1)$-st best arms.
arXiv Detail & Related papers (2022-02-19T18:19:36Z)
- Bandit Learning in Decentralized Matching Markets [82.39061186055775]
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience.
This model extends the standard multi-armed bandit framework to a decentralized multiple player setting with competition.
We show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
arXiv Detail & Related papers (2020-12-14T08:58:07Z)
- Multiplayer Bandit Learning, from Competition to Cooperation [3.7801191959442053]
We study the effects of competition and cooperation on the tradeoff between exploration and exploitation.
The model is related to the economics literature on strategic experimentation, where usually players observe each other's rewards.
arXiv Detail & Related papers (2019-08-03T08:20:54Z)