Related papers: Robust Multi-Agent Multi-Armed Bandits

Robust Multi-Agent Multi-Armed Bandits

URL: http://arxiv.org/abs/2007.03812v3
Date: Sun, 10 Oct 2021 14:25:23 GMT
Title: Robust Multi-Agent Multi-Armed Bandits
Authors: Daniel Vial, Sanjay Shakkottai, R. Srikant
Abstract summary: Recent works have shown that agents facing independent instances of a $K$-armed bandit can collaborate to decrease regret. We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior.
Score: 26.26185074977412
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include $n$ honest and $m$ malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.

Related papers

Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions [6.234292942334148]
We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruptions. As a side-product, our algorithm also improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios.
arXiv Detail & Related papers (2024-11-12T20:20:26Z)
On the Resilience of Multi-Agent Systems with Malicious Agents [58.79302663733702]
This paper investigates what is the resilience of multi-agent system structures under malicious agents. We devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one. We show that two defense methods, introducing a mechanism for each agent to challenge others' outputs, or an additional agent to review and correct messages, can enhance system resilience.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit. In this setting, agents subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z)
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
We introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second price auctions with our algorithms.
arXiv Detail & Related papers (2023-12-13T06:54:49Z)
Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values. This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
arXiv Detail & Related papers (2023-10-11T09:09:50Z)
Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward. We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a and possibly unknown policy. We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets. Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We prove that the regret of textttFedLinUCB is bounded by $tildeO(dsqrtsum_m=1M T_m)$ and the communication complexity is $tildeO(dM
arXiv Detail & Related papers (2022-07-07T06:16:19Z)
Collaborative Linear Bandits with Adversarial Agents: Near-Optimal Regret Bounds [31.5504566292927]
We consider a linear bandit problem involving $M$ agents that can collaborate via a central server to minimize regret. A fraction $alpha$ of these agents are adversarial and can act arbitrarily, leading to the following tension. We design new algorithms that balance the exploration-exploitation trade-off via carefully constructed robust confidence intervals.
arXiv Detail & Related papers (2022-06-06T18:16:34Z)
Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server. $alpha$-fraction of agents are adversarial and can report arbitrary fake information. We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
Distributed Bandits with Heterogeneous Agents [38.90376765616447]
This paper tackles a multi-agent bandit setting where $M$ agents cooperate together to solve the same instance of a $K$-armed bandit problem. We propose two learning algorithms, ucbo and AAE. We prove that both algorithms achieve order-optimal regret, which is $Oleft(sum_i:tildeDelta_i>0 log T/tildeDelta_iright)$, where $tildeDelta_i$ is the minimum suboptimality gap between the reward mean of
arXiv Detail & Related papers (2022-01-23T20:04:15Z)
Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results. We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $epsilon$.
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
Adversarial Attacks on Linear Contextual Bandits [87.08004581867537]
Malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior. We show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps. We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context.
arXiv Detail & Related papers (2020-02-10T15:04:09Z)
The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits [20.259428328004738]
We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents. In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph.
arXiv Detail & Related papers (2020-01-15T17:49:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.