Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience
- URL: http://arxiv.org/abs/2511.10344v1
- Date: Fri, 14 Nov 2025 01:46:14 GMT
- Title: Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience
- Authors: Zicheng Hu, Yuchen Wang, Cheng Chen,
- Abstract summary: Decentralized cooperative multi-agent multi-armed bandits (DeCMA2B) considers how multiple agents collaborate in a decentralized multi-armed bandit setting.<n>We first study DeCMA2B with adversarial corruption, where an adversary can corrupt reward observations of all agents with a limited corruption budget.<n>We propose a robust algorithm, called DeMABAR, which ensures that each agent's individual regret suffers only an additive term proportional to the corruption budget.
- Score: 18.941567777486867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized cooperative multi-agent multi-armed bandits (DeCMA2B) considers how multiple agents collaborate in a decentralized multi-armed bandit setting. Though this problem has been extensively studied in previous work, most existing methods remain susceptible to various adversarial attacks. In this paper, we first study DeCMA2B with adversarial corruption, where an adversary can corrupt reward observations of all agents with a limited corruption budget. We propose a robust algorithm, called DeMABAR, which ensures that each agent's individual regret suffers only an additive term proportional to the corruption budget. Then we consider a more realistic scenario where the adversary can only attack a small number of agents. Our theoretical analysis shows that the DeMABAR algorithm can also almost completely eliminate the influence of adversarial attacks and is inherently robust in the Byzantine setting, where an unknown fraction of the agents can be Byzantine, i.e., may arbitrarily select arms and communicate wrong information. We also conduct numerical experiments to illustrate the robustness and effectiveness of the proposed method.
Related papers
- Can an Individual Manipulate the Collective Decisions of Multi-Agents? [53.01767232004823]
M-Spoiler is a framework that simulates agent interactions within a multi-agent system to generate adversarial samples.<n>M-Spoiler introduces a stubborn agent that actively aids in optimizing adversarial samples.<n>Our findings confirm the risks posed by the knowledge of an individual agent in multi-agent systems.
arXiv Detail & Related papers (2025-09-20T01:54:20Z) - Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions [6.234292942334148]
We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruptions.
As a side-product, our algorithm also improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios.
arXiv Detail & Related papers (2024-11-12T20:20:26Z) - Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
We introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously.<n>We show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second price auctions with our algorithms.
arXiv Detail & Related papers (2023-12-13T06:54:49Z) - Adversarial Attacks on Cooperative Multi-agent Bandits [41.79235070291252]
We study adversarial attacks on CMA2B in both homogeneous and heterogeneous settings.
In the homogeneous setting, we propose attack strategies that convince all agents to select a particular target arm $T-o(T)$ times while incurring $o(T)$ attack costs in $T$ rounds.
In the heterogeneous setting, we prove that a target arm attack requires linear attack costs and propose attack strategies that can force a maximum number of agents to suffer linear regrets while incurring sublinear costs and manipulating only the observations of a few target agents.
arXiv Detail & Related papers (2023-11-03T04:03:19Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z) - Byzantine-Resilient Decentralized Multi-Armed Bandits [23.34196562182705]
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values.<n>This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
arXiv Detail & Related papers (2023-10-11T09:09:50Z) - Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
$alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z) - Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to
Adversarial Corruptions [10.261123419337316]
We study the problem of bandits with adversarial corruptions in the cooperative multi-agent setting.
In the problem, the rewards are independently sampled from distributions across all agents and rounds, but they may be corrupted by an adversary.
Our goal is to minimize both the overall regret and communication cost across all agents.
arXiv Detail & Related papers (2021-06-08T09:32:43Z) - Robust Multi-Agent Multi-Armed Bandits [26.26185074977412]
Recent works have shown that agents facing independent instances of a $K$-armed bandit can collaborate to decrease regret.
We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior.
arXiv Detail & Related papers (2020-07-07T22:27:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.