Byzantine-Robust Online and Offline Distributed Reinforcement Learning
- URL: http://arxiv.org/abs/2206.00165v1
- Date: Wed, 1 Jun 2022 00:44:53 GMT
- Title: Byzantine-Robust Online and Offline Distributed Reinforcement Learning
- Authors: Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu
- Abstract summary: We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
$\alpha$-fraction of agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
- Score: 60.970950468309056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a distributed reinforcement learning setting where multiple
agents separately explore the environment and communicate their experiences
through a central server. However, $\alpha$-fraction of agents are adversarial
and can report arbitrary fake information. Critically, these adversarial agents
can collude and their fake data can be of any size. We seek to robustly
identify a near-optimal policy for the underlying Markov decision process in
the presence of these adversarial agents. Our main technical contribution is
Weighted-Clique, a novel algorithm for the problem of robust mean estimation
from batches that can handle arbitrary batch sizes. Building upon this new
estimator, in the offline setting, we design a Byzantine-robust distributed
pessimistic value iteration algorithm; in the online setting, we design a
Byzantine-robust distributed optimistic value iteration algorithm. Both
algorithms obtain near-optimal sample complexities and achieve stronger
robustness guarantees than prior work.
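To make the central estimation problem concrete, the sketch below illustrates robust mean estimation from batches with unequal batch sizes: each agent reports a batch of samples, up to an $\alpha$-fraction of batches may be arbitrarily corrupted, and the server aggregates per-batch means. The aggregator shown is a simple size-weighted-median baseline with an optional weight cap; it is not the paper's Weighted-Clique estimator, and names such as `robust_batch_mean` and `weight_cap` are illustrative assumptions.
```python
import numpy as np

def robust_batch_mean(batches, alpha, weight_cap=None):
    """Aggregate batch means reported by agents, of which an alpha-fraction
    may be adversarial. Simple baseline: a batch-size-weighted median of the
    per-batch means (NOT the Weighted-Clique estimator from the paper).

    batches   : list of 1-D numpy arrays, one per agent (arbitrary sizes)
    alpha     : assumed corruption fraction (used only for a sanity check here)
    weight_cap: optional cap on any single batch's weight, since adversaries
                could otherwise inflate their influence via huge fake batches
    """
    assert 0 <= alpha < 0.5, "a weighted majority of batches must be honest"
    means = np.array([b.mean() for b in batches])
    weights = np.array([len(b) for b in batches], dtype=float)
    if weight_cap is not None:
        weights = np.minimum(weights, weight_cap)

    # Weighted median: sort the batch means and return the first value at
    # which the cumulative weight reaches half of the total weight.
    order = np.argsort(means)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return means[order[idx]]

# Toy usage: 8 honest agents draw batches from N(1, 1); 2 colluding agents
# report large fake batches centered far away from the true mean.
rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 1.0, size=rng.integers(5, 50)) for _ in range(8)]
fake = [np.full(1000, 100.0) for _ in range(2)]
print(robust_batch_mean(honest + fake, alpha=0.2, weight_cap=50))
```
The weight cap is one crude way to keep colluding agents from dominating the aggregate by reporting oversized batches; the paper's Weighted-Clique estimator addresses arbitrary batch sizes with provable guarantees, which this baseline does not attempt.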
Related papers
- A Federated Distributionally Robust Support Vector Machine with Mixture of Wasserstein Balls Ambiguity Set for Distributed Fault Diagnosis [3.662364375995991]
We study the problem of training a distributionally robust (DR) support vector machine (SVM) in a federated fashion over a network comprised of a central server and $G$ clients without sharing data.
We propose two distributed optimization algorithms for training the global FDR-SVM.
arXiv Detail & Related papers (2024-10-04T19:21:45Z)
- Sequential Manipulation Against Rank Aggregation: Theory and Algorithm [119.57122943187086]
We leverage an online attack on the vulnerable data collection process.
From the game-theoretic perspective, the confrontation scenario is formulated as a distributionally robust game.
The proposed method manipulates the results of rank aggregation methods in a sequential manner.
arXiv Detail & Related papers (2024-07-02T03:31:21Z)
- Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement [25.68354404229254]
We show that even in a data-starved setting it may still be possible to find a policy competitive with the optimal one.
This paves the way to reliable decision-making in settings where critical decisions must be made by relying only on a handful of samples.
arXiv Detail & Related papers (2024-02-24T03:41:09Z)
- Scalable Decentralized Algorithms for Online Personalized Mean Estimation [12.002609934938224]
This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean.
We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach.
arXiv Detail & Related papers (2024-02-20T08:30:46Z) - Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-armed contextual bandits with unknown contexts.
We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds in the setting where no Byzantine machines are present.
It is very practical: it improves upon the performance of all prior methods.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Byzantine-Robust Decentralized Stochastic Optimization over Static and Time-Varying Networks [25.15075119957447]
We consider the Byzantine-robust optimization problem defined over decentralized static and time-varying networks.
Some of the agents are unreliable due to data corruptions, equipment failures or cyber-attacks.
Our key idea to handle the Byzantine attacks is to formulate a total variation (TV) norm-penalized approximation of the Byzantine-free problem.
We prove that the proposed method reaches a neighborhood of the Byzantine-free optimal solution, and the size of the neighborhood is determined by the number of Byzantine agents and the network topology (a simplified sketch of the TV-penalty idea follows this list).
arXiv Detail & Related papers (2020-05-12T04:18:39Z)
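As a rough illustration of the total variation (TV) penalty idea in the last entry above, the sketch below performs one decentralized subgradient step in which each agent combines its local stochastic gradient with the subgradient of a TV penalty on disagreements with its neighbors. Function and variable names (`tv_penalized_step`, `lam`, `adjacency`) are illustrative assumptions, and this is a simplified sketch of the general approach rather than the exact algorithm from that paper.
```python
import numpy as np

def tv_penalized_step(x, grads, adjacency, lr=0.1, lam=0.5):
    """One decentralized subgradient step with a TV-norm penalty on neighbor
    disagreement (illustrative sketch, not the cited paper's exact method).

    x         : (n_agents, dim) current iterates, one row per agent
    grads     : (n_agents, dim) local stochastic gradients (Byzantine agents
                may follow arbitrary update rules for their own rows)
    adjacency : (n_agents, n_agents) symmetric 0/1 matrix of the network
    lr, lam   : step size and TV-penalty weight
    """
    x_new = x.copy()
    for i in range(x.shape[0]):
        neighbors = np.flatnonzero(adjacency[i])
        # Subgradient of lam * sum_j ||x_i - x_j||_1 over neighbors j.
        # The sign() term is bounded, so a Byzantine neighbor's pull on
        # agent i is at most lam per coordinate, no matter how extreme the
        # fake iterate it broadcasts.
        tv_sub = lam * np.sign(x[i] - x[neighbors]).sum(axis=0)
        x_new[i] = x[i] - lr * (grads[i] + tv_sub)
    return x_new
```
With only honest agents the penalty drives neighboring iterates toward consensus, while under attack the bounded sign term keeps each agent within a neighborhood of the Byzantine-free solution, consistent with the qualitative guarantee stated in that entry's summary.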
This list is automatically generated from the titles and abstracts of the papers in this site.