Byzantine Fault-Tolerant Distributed Machine Learning Using Stochastic
Gradient Descent (SGD) and Norm-Based Comparative Gradient Elimination (CGE)
- URL: http://arxiv.org/abs/2008.04699v2
- Date: Sun, 18 Apr 2021 00:56:13 GMT
- Title: Byzantine Fault-Tolerant Distributed Machine Learning Using Stochastic
Gradient Descent (SGD) and Norm-Based Comparative Gradient Elimination (CGE)
- Authors: Nirupam Gupta, Shuo Liu and Nitin H. Vaidya
- Abstract summary: We consider the Byzantine fault-tolerance problem in the distributed stochastic gradient descent (D-SGD) method.
We propose a norm-based gradient-filter, named comparative gradient elimination (CGE).
We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard assumptions.
- Score: 2.582633087903328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the Byzantine fault-tolerance problem in distributed
stochastic gradient descent (D-SGD) method - a popular algorithm for
distributed multi-agent machine learning. In this problem, each agent samples
data points independently from a certain data-generating distribution. In the
fault-free case, the D-SGD method allows all the agents to learn a mathematical
model best fitting the data collectively sampled by all agents. We consider the
case when a fraction of agents may be Byzantine faulty. Such faulty agents may
not follow the prescribed algorithm correctly, and may render the traditional
D-SGD method ineffective by sharing arbitrarily incorrect stochastic gradients. We
propose a norm-based gradient-filter, named comparative gradient elimination
(CGE), that robustifies the D-SGD method against Byzantine agents. We show that
the CGE gradient-filter guarantees fault-tolerance against a bounded fraction
of Byzantine agents under standard stochastic assumptions, and is
computationally simpler compared to many existing gradient-filters such as
multi-KRUM, geometric median-of-means, and the spectral filters. We empirically
show, by simulating distributed learning on neural networks, that the
fault-tolerance of CGE is comparable to that of existing gradient-filters. We
also empirically show that exponential averaging of stochastic gradients
improves the fault-tolerance of a generic gradient-filter.
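The CGE filter described above admits a compact sketch: sort the received gradients by Euclidean norm, discard the f largest, and average the rest. The code below is an illustrative implementation under that reading of the abstract, not the authors' code; the exponential-averaging helper mirrors the averaging scheme the abstract reports, with an assumed decay parameter.

```python
import numpy as np

def cge_aggregate(gradients, f):
    """Comparative gradient elimination (CGE): keep the n - f gradients
    with the smallest Euclidean norms and average them, discarding the
    f largest-norm gradients (the suspected Byzantine ones)."""
    norms = [np.linalg.norm(g) for g in gradients]
    keep = np.argsort(norms)[: len(gradients) - f]  # indices of n - f smallest norms
    return np.mean([gradients[i] for i in keep], axis=0)

def exp_average(prev, grad, beta=0.9):
    """Exponential averaging of stochastic gradients, which the abstract
    reports improves the fault-tolerance of a generic gradient-filter."""
    return beta * prev + (1.0 - beta) * grad
```

With three honest gradients near (1, 1) and one Byzantine gradient of very large norm, `cge_aggregate(grads, f=1)` discards the outlier and returns the honest mean.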
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
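As a point of reference, a minimal one-bit compression scheme transmits only the coordinate signs plus a single scalar scale per vector. This is a generic sketch of one-bit gradient compression, not the specific FO-SGD construction (whose two algorithmic ideas the summary does not spell out); function names are illustrative.

```python
import numpy as np

def one_bit_compress(g):
    # Encode a gradient as its coordinate signs plus one scalar scale
    # (the mean absolute value), so each coordinate costs one bit.
    scale = np.mean(np.abs(g))
    return np.sign(g), scale

def one_bit_decompress(signs, scale):
    # Reconstruct the gradient estimate from the signs and the scale.
    return signs * scale
```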
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
The clipping threshold is chosen as a quantile of the gradient norms, making the updates robust to outliers.
We propose an implementation of the algorithm using Huberiles.
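The quantile-based clipping idea can be sketched as follows; the threshold choice and names are illustrative, and the paper's actual estimator (e.g. its Huberile-based implementation) may differ.

```python
import numpy as np

def quantile_clip(gradients, q=0.8):
    """Clip each gradient's norm at the q-th quantile of the current
    batch of gradient norms, so outlier gradients are scaled down
    while typical gradients pass through unchanged."""
    norms = np.array([np.linalg.norm(g) for g in gradients])
    tau = np.quantile(norms, q)  # data-driven clipping threshold
    return [g if n <= tau or n == 0 else g * (tau / n)
            for g, n in zip(gradients, norms)]
```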
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Byzantine-Robust Decentralized Stochastic Optimization with Stochastic
Gradient Noise-Independent Learning Error [25.15075119957447]
We study Byzantine-robust optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD).
The performance of such a method is affected by an unknown number of Byzantine agents, which behave adversarially during the optimization process.
arXiv Detail & Related papers (2023-08-10T02:14:23Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance-reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Stochastic Gradient Variance Reduction by Solving a Filtering Problem [0.951828574518325]
Deep neural networks (DNNs) are typically optimized using stochastic gradient descent (SGD).
The estimation of the gradient from samples tends to be noisy and unreliable, resulting in large gradient variance and poor convergence.
We propose Filter Gradient Descent (FGD), an efficient optimization algorithm that makes a consistent estimation of the gradient.
arXiv Detail & Related papers (2020-12-22T23:48:42Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with
Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z) - Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data [10.965065178451104]
We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks.
Our algorithm can tolerate up to a $\frac{1}{4}$ fraction of Byzantine workers.
arXiv Detail & Related papers (2020-05-16T04:15:27Z) - Federated Variance-Reduced Stochastic Gradient Descent with Robustness
to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.