Stochastic Alternating Direction Method of Multipliers for
Byzantine-Robust Distributed Learning
- URL: http://arxiv.org/abs/2106.06891v1
- Date: Sun, 13 Jun 2021 01:17:31 GMT
- Title: Stochastic Alternating Direction Method of Multipliers for
Byzantine-Robust Distributed Learning
- Authors: Feng Lin, Weiyu Li, Qing Ling
- Abstract summary: We propose a Byzantine-robust alternating direction method of multipliers (ADMM) that fully utilizes the separable problem structure.
Theoretically, we prove that the proposed method converges to a bounded neighborhood of the optimal solution at a rate of O(1/k) under mild assumptions.
Numerical experiments on the MNIST and COVERTYPE datasets demonstrate the effectiveness of the proposed method against various Byzantine attacks.
- Score: 22.835940007753376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to solve a distributed learning problem under Byzantine
attacks. In the underlying distributed system, a number of unknown but
malicious workers (termed as Byzantine workers) can send arbitrary messages to
the master and bias the learning process, due to data corruptions, computation
errors or malicious attacks. Prior work has considered a total variation (TV)
norm-penalized approximation formulation to handle the Byzantine attacks, where
the TV norm penalty forces the regular workers' local variables to be close,
and meanwhile, tolerates the outliers sent by the Byzantine workers. To solve
the TV norm-penalized approximation formulation, we propose a Byzantine-robust
stochastic alternating direction method of multipliers (ADMM) that fully
utilizes the separable problem structure. Theoretically, we prove that the
proposed method converges to a bounded neighborhood of the optimal solution at
a rate of O(1/k) under mild assumptions, where k is the number of iterations
and the size of neighborhood is determined by the number of Byzantine workers.
Numerical experiments on the MNIST and COVERTYPE datasets demonstrate the
effectiveness of the proposed method against various Byzantine attacks.
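To make the formulation concrete, here is a minimal LaTeX sketch of the TV-norm-penalized problem described in the abstract; the notation ($\tilde{x}$ for the master's variable, $x_i$ for regular worker $i$'s local variable, $\mathcal{R}$ for the set of regular workers, $F(\cdot;\xi_i)$ for worker $i$'s stochastic loss, and $\lambda > 0$ for the penalty weight) is assumed for illustration and may differ from the paper's.
```latex
\min_{\tilde{x},\,\{x_i\}_{i \in \mathcal{R}}} \;
  \sum_{i \in \mathcal{R}} \Big( \mathbb{E}_{\xi_i}\!\big[ F(x_i;\xi_i) \big]
  + \lambda \,\lVert x_i - \tilde{x} \rVert_1 \Big)
```
The penalty is separable across workers and its subgradient is bounded coordinate-wise by $\lambda$, so any single worker (Byzantine or not) can shift the master's update only by a bounded amount; this is the intuition behind the bounded neighborhood in the convergence result, and the separable structure is what the stochastic ADMM exploits by alternating between per-worker updates and a master update.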
Related papers
- Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences [61.74021364776313]
Distributed learning has emerged as a leading paradigm for training large machine learning models.
In real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models.
We propose the first distributed method with client sampling and provable tolerance to Byzantine workers.
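As an illustration of the clipping idea named in the title, the following is a hedged Python sketch of aggregating clipped deviations from the server's current estimate; the function names, the choice of reference point, and the threshold `tau` are assumptions for illustration, not the paper's exact rule.
```python
import numpy as np

def clip(v, tau):
    # Scale v down so its Euclidean norm is at most tau; leave it unchanged otherwise.
    norm = np.linalg.norm(v)
    return v if norm <= tau else v * (tau / norm)

def aggregate_clipped_differences(server_estimate, client_messages, tau):
    # Clip each client's deviation from the server estimate, then average.
    # A single Byzantine client can perturb the averaged update by at most tau / n in norm.
    clipped = [clip(m - server_estimate, tau) for m in client_messages]
    return server_estimate + np.mean(clipped, axis=0)
```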
arXiv Detail & Related papers (2023-11-23T17:50:30Z)
- A Robust Classification Framework for Byzantine-Resilient Stochastic Gradient Descent [3.5450828190071655]
This paper proposes a Robust Gradient Classification Framework (RGCF) for Byzantine fault tolerance in distributed gradient descent.
RGCF is not dependent on the number of workers; it can scale up to training instances with a large number of workers without a loss in performance.
arXiv Detail & Related papers (2023-01-16T10:40:09Z)
- Byzantine-Robust Online and Offline Distributed Reinforcement Learning [60.970950468309056]
We consider a distributed reinforcement learning setting where multiple agents explore the environment and communicate their experiences through a central server.
An $\alpha$-fraction of the agents are adversarial and can report arbitrary fake information.
We seek to identify a near-optimal policy for the underlying Markov decision process in the presence of these adversarial agents.
arXiv Detail & Related papers (2022-06-01T00:44:53Z)
- Byzantine-Robust Decentralized Learning via ClippedGossip [61.03711813598128]
We propose a ClippedGossip algorithm for Byzantine-robust consensus optimization.
We demonstrate the encouraging empirical performance of ClippedGossip under a large number of attacks.
arXiv Detail & Related papers (2022-02-03T12:04:36Z)
- Learning from History for Byzantine Robust Optimization [52.68913869776858]
Byzantine robustness has received significant attention recently given its importance for distributed learning.
We show that most existing robust aggregation rules may not converge even in the absence of any Byzantine attackers.
arXiv Detail & Related papers (2020-12-18T16:22:32Z)
- Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data [36.99547890386817]
We consider the federated learning problem where data on workers are not independent and identically distributed (i.i.d.).
An unknown number of Byzantine workers may send malicious messages to the central node, leading to significant learning error.
Most of the Byzantine-robust methods address this issue by using robust aggregation rules to aggregate the received messages.
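For reference, a common robust aggregation rule of the kind mentioned above is the coordinate-wise median; the sketch below is illustrative only, since the surveyed papers employ a variety of rules (median, trimmed mean, Krum, and others).
```python
import numpy as np

def coordinate_wise_median(messages):
    # Stack the workers' vectors and take the median of every coordinate.
    # A minority of Byzantine messages cannot drag any coordinate arbitrarily far.
    return np.median(np.stack(messages), axis=0)
```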
arXiv Detail & Related papers (2020-09-17T09:09:23Z)
- Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing [55.012801269326594]
In Byzantine robust distributed learning, a central server wants to train a machine learning model over data distributed across multiple workers.
A fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages.
We propose a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost.
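A hedged Python sketch of the bucketing idea follows: randomly group the workers' messages, average within each bucket, and hand the bucket means to an existing robust aggregator; the parameter names and the choice of aggregator are assumptions for illustration, not the paper's exact scheme.
```python
import numpy as np

def bucketing(messages, bucket_size, robust_agg, rng=None):
    # Randomly permute the messages, average them in buckets of `bucket_size`,
    # then apply an existing robust aggregation rule to the bucket means.
    # Averaging inside buckets reduces the heterogeneity the aggregator sees.
    rng = np.random.default_rng() if rng is None else rng
    stacked = np.stack(messages)[rng.permutation(len(messages))]
    means = [stacked[i:i + bucket_size].mean(axis=0)
             for i in range(0, len(stacked), bucket_size)]
    return robust_agg(means)
```
Here `robust_agg` could be, for example, a coordinate-wise median or trimmed mean; bucketing leaves the base rule unchanged, which is consistent with the negligible computational cost claimed above.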
arXiv Detail & Related papers (2020-06-16T17:58:53Z)
- Byzantine-Robust Decentralized Stochastic Optimization over Static and Time-Varying Networks [25.15075119957447]
We consider the Byzantine-robust optimization problem defined over decentralized static and time-varying networks.
Some of the agents are unreliable due to data corruptions, equipment failures or cyber-attacks.
Our key idea to handle the Byzantine attacks is to formulate a total variation (TV) norm-penalized approximation of the Byzantine-free problem.
We prove that the proposed method reaches a neighborhood of the Byzantine-free optimal solution, and the size of neighborhood is determined by the number of Byzantine agents and the network topology.
arXiv Detail & Related papers (2020-05-12T04:18:39Z)
- Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)