Byzantine-Resilient Decentralized Multi-Armed Bandits
- URL: http://arxiv.org/abs/2310.07320v1
- Date: Wed, 11 Oct 2023 09:09:50 GMT
- Title: Byzantine-Resilient Decentralized Multi-Armed Bandits
- Authors: Jingxuan Zhu, Alec Koppel, Alvaro Velasquez, Ji Liu
- Abstract summary: We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values.
This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
- Score: 25.499420566469098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In decentralized cooperative multi-armed bandits (MAB), each agent observes a
distinct stream of rewards, and seeks to exchange information with others to
select a sequence of arms so as to minimize its regret. Agents in the
cooperative setting can outperform a single agent running a MAB method such as
Upper-Confidence Bound (UCB) independently. In this work, we study how to
recover such salient behavior when an unknown fraction of the agents can be
Byzantine, that is, communicate arbitrarily wrong information in the form of
reward mean-estimates or confidence sets. This framework can be used to model
attackers in computer networks, instigators of offensive content into
recommender systems, or manipulators of financial markets. Our key contribution
is the development of a fully decentralized resilient upper confidence bound
(UCB) algorithm that fuses an information mixing step among agents with a
truncation of inconsistent and extreme values. This truncation step enables us
to establish that the performance of each normal agent is no worse than the
classic single-agent UCB1 algorithm in terms of regret, and more importantly,
the cumulative regret of all normal agents is strictly better than the
non-cooperative case, provided that each agent has at least 3f+1 neighbors,
where f is the maximum possible number of Byzantine agents in each agent's
neighborhood.
Extensions to time-varying neighbor graphs, and minimax lower bounds are
further established on the achievable regret. Experiments corroborate the
merits of this framework in practice.
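To make the mixing-and-truncation idea concrete, the sketch below shows one plausible per-round update for a normal agent: it sorts the neighbors' reported mean estimates for an arm, discards the f largest and f smallest reports (truncation), averages the survivors with its own estimate (mixing), and plays a UCB1-style index on the result. The trimming rule, variable names, and constants are illustrative assumptions rather than the paper's exact algorithm.

```python
import math

def trimmed_mix(own_estimate, neighbor_estimates, f):
    """Illustrative truncation + mixing step: discard the f largest and the
    f smallest neighbor reports, then average the survivors together with
    the agent's own estimate. With at least 3f+1 neighbors, at least f+1
    reports survive the trimming."""
    assert len(neighbor_estimates) >= 3 * f + 1, "need at least 3f+1 neighbors"
    sorted_reports = sorted(neighbor_estimates)
    kept = sorted_reports[f:len(sorted_reports) - f] if f > 0 else sorted_reports
    values = kept + [own_estimate]
    return sum(values) / len(values)

def ucb_index(mixed_mean, pulls, t):
    """Classic UCB1-style index computed on top of the robustified mean."""
    if pulls == 0:
        return float("inf")
    return mixed_mean + math.sqrt(2.0 * math.log(t) / pulls)

# Toy usage with f = 1: the extreme report 9.99 (and the smallest, 0.50)
# are truncated before mixing.
own = 0.55
reports = [0.52, 0.57, 0.50, 9.99]
mixed = trimmed_mix(own, reports, f=1)
print(ucb_index(mixed, pulls=20, t=100))
```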
Related papers
- Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions [6.234292942334148]
We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruptions.
As a by-product, our algorithm also improves the state-of-the-art regret bounds when reduced to both the single-agent and homogeneous multi-agent scenarios.
arXiv Detail & Related papers (2024-11-12T20:20:26Z) - Best Arm Identification with Minimal Regret [55.831935724659175]
The problem setting elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
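For context on the KL-UCB family referenced above, here is the textbook single-arm Bernoulli KL-UCB index computed by bisection; it is not the cited paper's Double KL-UCB procedure, and the exploration threshold is a standard illustrative choice.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, iters=40):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection."""
    if pulls == 0:
        return 1.0
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

print(kl_ucb_index(mean=0.4, pulls=50, t=1000))  # an optimistic upper estimate above 0.4
```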
arXiv Detail & Related papers (2024-09-27T16:46:02Z) - Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a specific and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
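To illustrate the interaction protocol in this setting, the sketch below has the learner pick a mediator, the mediator pick an arm according to its own policy (possibly unknown to the learner), and the learner observe the pulled arm and reward. The mediator policies and reward means are made-up toy values, not anything from the cited paper.

```python
import random

K = 3                                  # number of arms (toy value)
arm_means = [0.2, 0.5, 0.8]            # toy Bernoulli reward means
mediator_policies = [                  # each mediator's arm distribution
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]

def play_mediator(m):
    """The learner selects mediator m; the mediator draws the arm from its
    own policy, and the learner observes the pulled arm and the reward."""
    arm = random.choices(range(K), weights=mediator_policies[m])[0]
    reward = 1.0 if random.random() < arm_means[arm] else 0.0
    return arm, reward

# Toy strategy: round-robin over mediators while tracking observed pulls.
counts = [[0] * K for _ in mediator_policies]
for t in range(1000):
    m = t % len(mediator_policies)
    arm, reward = play_mediator(m)
    counts[m][arm] += 1
print(counts)
```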
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of the agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z) - Private and Byzantine-Proof Cooperative Decision-Making [15.609414012418043]
The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit.
In this paper, we investigate the bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine.
We provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) tolerant to Byzantine agents.
Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems.
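As a rough illustration of the differential-privacy aspect of the entry above, the sketch below perturbs the per-arm statistics an agent would share (reward sum and pull count) with Laplace noise before communication. This is a generic Laplace-mechanism sketch with illustrative sensitivity and epsilon values, not the cited paper's algorithm.

```python
import random

def laplace_noise(scale):
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def privatized_arm_stats(reward_sum, pull_count, epsilon):
    """Perturb the per-arm statistics an agent shares with its peers.
    Rewards are assumed to lie in [0, 1], so one extra observation changes
    each released quantity by at most 1 (sensitivity 1); the privacy budget
    epsilon is split evenly between the two releases."""
    noisy_sum = reward_sum + laplace_noise(2.0 / epsilon)
    noisy_count = pull_count + laplace_noise(2.0 / epsilon)
    return noisy_sum, max(noisy_count, 1.0)

print(privatized_arm_stats(reward_sum=37.5, pull_count=80, epsilon=1.0))
```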
arXiv Detail & Related papers (2022-05-27T18:03:54Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Resilient Consensus-based Multi-agent Reinforcement Learning [22.774403531759592]
We consider a fully decentralized network, where each agent receives a local reward and observes the global state and action.
We show that in the presence of Byzantine agents, whose estimation and communication strategies are completely arbitrary, the estimates of the cooperative agents converge to a bounded consensus value with probability one.
We prove that the policy of the cooperative agents converges with probability one to a bounded neighborhood around a local maximizer of their team-average objective function.
arXiv Detail & Related papers (2021-11-12T15:38:01Z) - Robust Multi-Agent Multi-Armed Bandits [26.26185074977412]
Recent works have shown that agents facing independent instances of a $K$-armed bandit can collaborate to decrease regret.
We show that collaboration indeed decreases regret for the proposed algorithm, assuming the number of malicious agents $m$ is small compared to the number of arms $K$, without making assumptions on the malicious agents' behavior.
arXiv Detail & Related papers (2020-07-07T22:27:30Z) - The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits [20.259428328004738]
We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup consisting of $N$ agents.
In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph.
arXiv Detail & Related papers (2020-01-15T17:49:29Z)
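To illustrate the pairwise gossip-style communication described in the last entry, here is a minimal sketch in which every agent contacts one uniformly random neighbor per round and receives the index of that neighbor's best empirical arm. The actual Gossiping Insert-Eliminate algorithm additionally maintains a small active arm set with insert and eliminate steps; the graph, estimates, and data structures below are toy assumptions.

```python
import random

# Toy connected graph over N = 4 agents (adjacency lists are illustrative).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

# Each agent's empirical mean estimate for K = 5 arms (toy random values).
K = 5
means = {a: [random.random() for _ in range(K)] for a in neighbors}

def gossip_round(recommendations):
    """One gossip round: every agent asks a uniformly random neighbor for
    the index of that neighbor's best empirical arm."""
    for agent in neighbors:
        peer = random.choice(neighbors[agent])
        best_arm_of_peer = max(range(K), key=lambda k: means[peer][k])
        recommendations[agent].append(best_arm_of_peer)

recs = {a: [] for a in neighbors}
for _ in range(3):
    gossip_round(recs)
print(recs)
```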