Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to
Adversarial Corruptions
- URL: http://arxiv.org/abs/2106.04207v1
- Date: Tue, 8 Jun 2021 09:32:43 GMT
- Title: Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to
Adversarial Corruptions
- Authors: Junyan Liu, Shuai Li, Dapeng Li
- Abstract summary: We study the problem of bandits with adversarial corruptions in the cooperative multi-agent setting.
In this problem, the rewards are sampled independently across all agents and rounds, but they may be corrupted by an adversary.
Our goal is to minimize both the overall regret and communication cost across all agents.
- Score: 10.261123419337316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of stochastic bandits with adversarial corruptions in
the cooperative multi-agent setting, where $V$ agents interact with a common
$K$-armed bandit problem, and each pair of agents can communicate with each
other to expedite the learning process. In this problem, the rewards are
sampled independently across all agents and rounds, but they may be corrupted
by an adversary. Our goal is to minimize both the overall regret and the
communication cost across all agents. We first show that an additive regret
term in the corruption level is unavoidable for any algorithm in this problem.
We then propose a new algorithm that is agnostic to the corruption level. Our
algorithm not only achieves near-optimal regret in the stochastic setting, but
also obtains a regret bound with an additive corruption term in the corrupted
setting, while maintaining efficient communication. The algorithm also applies
to the single-agent corrupted bandit problem, where it achieves a
high-probability regret bound whose corruption term carries no multiplicative
factor of $K$. Our result for the single-agent case resolves an open question
of Gupta et al. [2019].
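To make the setting concrete, here is a minimal simulation sketch of the protocol described in the abstract, assuming Bernoulli base rewards, a reward-flipping adversary with total budget $C$, and a uniform placeholder policy. The names (`stochastic_reward`, `flip_adversary`) and the adversary's strategy are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, T = 4, 10, 1000   # agents, arms, rounds
C = 50                  # total corruption budget (unknown to the learner)

true_means = rng.uniform(size=K)        # one common K-armed instance

def stochastic_reward(arm):
    """I.i.d. Bernoulli base reward, independent across agents and rounds."""
    return float(rng.random() < true_means[arm])

def flip_adversary(arm, reward, budget):
    """Toy adversary: flips rewards of the best arm until the budget runs out.
    Each flip costs 1 unit of corruption."""
    if budget >= 1 and arm == int(np.argmax(true_means)):
        return 1.0 - reward, budget - 1
    return reward, budget

budget = C
sums, counts = np.zeros(K), np.zeros(K)
for t in range(T):
    for v in range(V):
        arm = int(rng.integers(K))   # placeholder: uniform pulls, not the paper's algorithm
        r = stochastic_reward(arm)
        r, budget = flip_adversary(arm, r, budget)  # agents observe only corrupted rewards
        sums[arm] += r
        counts[arm] += 1

print("empirical means:", np.round(sums / np.maximum(counts, 1), 2))
```

The point of the sketch is the observation model: each agent sees only the corrupted reward, and communication (not modeled here) lets agents pool these observations to expedite learning.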
Related papers
- Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions [6.234292942334148]
We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruptions.
As a side-product, our algorithm also improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios.
arXiv Detail & Related papers (2024-11-12T20:20:26Z)
- Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
The Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z)
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm that achieves a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from the linear contextual bandit literature.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z)
- Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z)
- A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants (a generic sketch of this elimination pattern appears after this list).
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z)
- Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
- On Optimal Robustness to Adversarial Corruption in Online Decision Problems [27.68461396741871]
We show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
arXiv Detail & Related papers (2021-09-22T18:26:45Z)
- Improved Corruption Robust Algorithms for Episodic Reinforcement Learning [43.279169081740726]
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
We propose new algorithms which, compared to the existing results, achieve strictly better regret bounds in terms of total corruptions.
Our results follow from a general algorithmic framework that combines corruption-robust policy-elimination meta-algorithms with plug-in reward-free exploration sub-algorithms.
arXiv Detail & Related papers (2021-02-13T07:04:23Z)
- Robust Multi-Agent Multi-Armed Bandits [26.26185074977412]
Recent works have shown that agents facing independent instances of a $K$-armed bandit can collaborate to decrease regret.
We show that collaboration indeed decreases regret for this algorithm, assuming the number of malicious agents $m$ is small compared to $K$, but without assumptions on the malicious agents' behavior.
arXiv Detail & Related papers (2020-07-07T22:27:30Z)
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurs an additional additive term depending on the corruption level $C$.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
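Several of the entries above (RGP-PE and the Robust Phased Elimination variants) are elimination-type methods. Below is a minimal generic skeleton of that pattern, assuming a `pull(arm)` environment callback; the doubling epoch length, the Hoeffding-style confidence width, and the `C_guess` inflation are illustrative assumptions, not any specific paper's algorithm.

```python
import numpy as np

def robust_phased_elimination(pull, K, T, C_guess=0.0, delta=0.05):
    """Generic elimination-in-epochs skeleton for corrupted K-armed bandits.

    Each epoch plays every active arm equally often, then drops arms whose
    empirical mean trails the best by more than a confidence width inflated
    by the assumed per-sample corruption share C_guess / n.
    """
    active = list(range(K))
    t, epoch = 0, 0
    while t < T and len(active) > 1:
        epoch += 1
        n = 2 ** epoch   # pulls per active arm this epoch
        means = {arm: float(np.mean([pull(arm) for _ in range(n)]))
                 for arm in active}
        t += n * len(active)
        # Hoeffding-style width plus the worst-case shift C_guess can cause.
        width = np.sqrt(np.log(K * epoch ** 2 / delta) / (2 * n)) + C_guess / n
        best = max(means.values())
        active = [arm for arm in active if means[arm] >= best - 2 * width]
    return active
```

With `C_guess = 0` this reduces to standard phased elimination; corruption-agnostic algorithms, such as the one in the main paper, avoid the guess entirely, for instance by running copies of the procedure at multiple hypothesized corruption levels, a standard device in this literature.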