ByGARS: Byzantine SGD with Arbitrary Number of Attackers
- URL: http://arxiv.org/abs/2006.13421v2
- Date: Mon, 7 Dec 2020 20:59:12 GMT
- Title: ByGARS: Byzantine SGD with Arbitrary Number of Attackers
- Authors: Jayanth Regatti, Hao Chen and Abhishek Gupta
- Abstract summary: We propose two novel stochastic gradient descent algorithms for distributed machine learning in the presence of Byzantine adversaries.
In these algorithms, reputation scores of workers are computed using an auxiliary dataset at the server.
We show that the proposed algorithms are robust to multiple different types of attacks at the same time.
- Score: 8.987350240289757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose two novel stochastic gradient descent algorithms, ByGARS and
ByGARS++, for distributed machine learning in the presence of any number of
Byzantine adversaries. In these algorithms, reputation scores of workers are
computed using an auxiliary dataset at the server. This reputation score is
then used for aggregating the gradients for stochastic gradient descent. The
computational complexity of ByGARS++ is the same as the usual distributed
stochastic gradient descent method with only an additional inner product
computation in every iteration. We show that using these reputation scores for
gradient aggregation is robust to any number of multiplicative noise Byzantine
adversaries and use two-timescale stochastic approximation theory to prove
convergence for strongly convex loss functions. We demonstrate the
effectiveness of the algorithms for non-convex learning problems using MNIST
and CIFAR-10 datasets against almost all state-of-the-art Byzantine attacks. We
also show that the proposed algorithms are robust to multiple different types
of attacks at the same time.
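Below is a minimal sketch of the reputation-weighted aggregation idea, as an illustration rather than the authors' implementation: the server scores each worker by the inner product between the worker's reported gradient and a gradient computed on the auxiliary dataset, then aggregates the reports weighted by these scores. The step sizes, the score normalization, and the simulated sign-flip attack are assumptions made for this toy example; the paper's exact two-timescale update rules differ.
```python
# Toy sketch of reputation-weighted gradient aggregation in the spirit of
# ByGARS / ByGARS++ (not the authors' code). Assumptions: scores are updated
# toward the inner product <worker gradient, auxiliary gradient>, and the
# aggregation normalizes the scores; honest workers here evaluate the same
# loss as the server for simplicity.
import numpy as np

rng = np.random.default_rng(0)

# Least-squares problem; X_aux, y_aux play the role of the server's auxiliary data.
d, n_aux = 5, 64
w_true = rng.normal(size=d)
X_aux = rng.normal(size=(n_aux, d))
y_aux = X_aux @ w_true

def grad(w, X, y):
    """Gradient of 0.5 * mean squared error."""
    return X.T @ (X @ w - y) / len(y)

n_workers, n_byz = 8, 5            # Byzantine workers may even be a majority
w = np.zeros(d)
scores = np.zeros(n_workers)       # one reputation score per worker
lr_model, lr_score = 0.05, 0.2     # illustrative step sizes (assumed)

for t in range(300):
    g_aux = grad(w, X_aux, y_aux)  # server-side gradient on auxiliary data

    # Workers report gradients; Byzantine workers apply a multiplicative attack.
    reports = []
    for i in range(n_workers):
        g_i = grad(w, X_aux, y_aux) + 0.01 * rng.normal(size=d)
        reports.append(-1.5 * g_i if i < n_byz else g_i)

    # Score update: only one extra inner product per worker per iteration.
    for i, g_i in enumerate(reports):
        scores[i] += lr_score * (g_i @ g_aux - scores[i])

    # Reputation-weighted aggregation (normalization is a simplification here).
    weights = scores / (np.abs(scores).sum() + 1e-12)
    g_agg = sum(wgt * g_i for wgt, g_i in zip(weights, reports))
    w -= lr_model * g_agg

print("parameter error:", np.linalg.norm(w - w_true))  # small despite 5/8 attackers
```
In this sketch, sign-flipping attackers acquire negative scores, so their reports are effectively flipped back during aggregation, which is consistent with the robustness to multiplicative-noise adversaries claimed in the abstract.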
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting.
As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling.
arXiv Detail & Related papers (2023-10-31T13:44:53Z) - Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary
Search [6.461473289206789]
We show that gradient-free ML algorithms can be combined with classical distributed consensus algorithms to generate gradient-free byzantine-resilient distributed learning algorithms.
We provide proofs and pseudo-code for two specific cases - the Total Order Broadcast and proof-of-work leader election.
arXiv Detail & Related papers (2023-04-20T17:13:29Z) - Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of ways to improve the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z) - A Unified Framework for Implicit Sinkhorn Differentiation [58.56866763433335]
We propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation.
We show that it is computationally more efficient, particularly when resources like GPU memory are scarce.
arXiv Detail & Related papers (2022-05-13T14:45:31Z) - Escaping Saddle Points in Distributed Newton's Method with Communication
Efficiency and Byzantine Resilience [49.379254729853756]
We consider the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines.
We robustify the cubic-regularized Newton algorithm so that it avoids saddle points and fake local minima efficiently.
We obtain theoretical guarantees for our proposed scheme under several settings, including approximate (sub-sampled) gradients and Hessians.
arXiv Detail & Related papers (2021-03-17T03:53:58Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local objective functions.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds in the stochastic distributed setting.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)