Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker
Assumptions and Communication Compression as a Cherry on the Top
- URL: http://arxiv.org/abs/2206.00529v1
- Date: Wed, 1 Jun 2022 14:40:29 GMT
- Title: Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker
Assumptions and Communication Compression as a Cherry on the Top
- Authors: Eduard Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel
- Abstract summary: Byzantine-robustness has gained a lot of attention due to the growing interest in collaborative and federated learning.
Byz-VR-MARINA is a new Byzantine-tolerant method with variance reduction and communication compression.
Unlike concurrent Byzantine-robust methods with variance reduction and/or compression, our complexity results are tight and do not rely on restrictive assumptions such as bounded gradients or limited compression.
- Score: 10.579228752210168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Byzantine-robustness has been gaining a lot of attention due to the growth of
the interest in collaborative and federated learning. However, many fruitful
directions, such as the usage of variance reduction for achieving robustness
and communication compression for reducing communication costs, remain weakly
explored in the field. This work addresses this gap and proposes Byz-VR-MARINA
- a new Byzantine-tolerant method with variance reduction and compression. A
key message of our paper is that variance reduction is key to fighting
Byzantine workers more effectively. At the same time, communication compression
is a bonus that makes the process more communication efficient. We derive
theoretical convergence guarantees for Byz-VR-MARINA outperforming previous
state-of-the-art for general non-convex and Polyak-Lojasiewicz loss functions.
Unlike the concurrent Byzantine-robust methods with variance reduction and/or
compression, our complexity results are tight and do not rely on restrictive
assumptions such as boundedness of the gradients or limited compression.
Moreover, we provide the first analysis of a Byzantine-tolerant method
supporting non-uniform sampling of stochastic gradients. Numerical experiments
corroborate our theoretical findings.
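To make the recipe above concrete, here is a minimal NumPy sketch of a single round in the spirit of Byz-VR-MARINA: honest workers send compressed, variance-reduced gradient-difference estimates, Byzantine workers may send arbitrary vectors, and the server replaces plain averaging with a robust aggregator. The toy objective, the rand-k compressor, the coordinate-wise median, and the omission of the occasional full-gradient refresh are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_good, n_byz, gamma = 10, 8, 2, 0.1

    def grad(x, seed):
        # toy stochastic gradient of f(x) = 0.5 * ||x||^2 with small noise
        return x + np.random.default_rng(seed).normal(0.0, 0.1, d)

    def rand_k(v, k=3):
        # unbiased rand-k sparsification (assumed compressor)
        idx = rng.choice(len(v), size=k, replace=False)
        out = np.zeros_like(v)
        out[idx] = v[idx] * len(v) / k
        return out

    def coord_median(vectors):
        # robust aggregator used instead of the mean (illustrative choice)
        return np.median(np.stack(vectors), axis=0)

    x_prev = np.zeros(d)
    x = rng.normal(size=d)
    g = [grad(x_prev, i) for i in range(n_good)]      # honest workers' running estimators

    msgs = []
    for i in range(n_good):
        delta = rand_k(grad(x, i) - grad(x_prev, i))  # compressed gradient difference
        g[i] = g[i] + delta                           # variance-reduced local estimator
        msgs.append(g[i])
    for _ in range(n_byz):
        msgs.append(rng.normal(0.0, 100.0, d))        # Byzantine workers: arbitrary vectors

    x_new = x - gamma * coord_median(msgs)            # robust aggregation step
    print(np.linalg.norm(x_new))

In the full method, workers would also occasionally transmit a (possibly compressed) full local gradient to refresh their estimators; that refresh is omitted here for brevity.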
Related papers
- Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z)
- Differential error feedback for communication-efficient decentralized learning
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
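The error-feedback mechanism referenced in the entry above can be sketched in a few lines: a node compresses its current update plus a locally stored residual, transmits the compressed value, and carries the compression error over to the next round. The sign-style quantizer and all names below are assumptions for illustration, not the paper's exact differential scheme.

    import numpy as np

    def quantize(v):
        # crude 1-bit-style quantizer: sign times mean magnitude (illustrative assumption)
        return np.sign(v) * np.mean(np.abs(v))

    def ef_compress(update, residual):
        # error feedback: compress update + carried-over residual, keep the new error locally
        target = update + residual
        q = quantize(target)
        return q, target - q      # transmitted message, residual stored for the next round

    # toy usage: residuals accumulate what compression dropped and are re-injected later
    rng = np.random.default_rng(1)
    residual = np.zeros(5)
    for step in range(3):
        update = rng.normal(size=5)
        msg, residual = ef_compress(update, residual)
        print(step, np.round(msg, 3), np.round(residual, 3))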
- TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z)
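As a rough sketch of the two ingredients named in the TernaryVote entry above (with assumed details): each worker maps its gradient coordinates to {-1, 0, +1} via a randomized ternary compressor, and the server takes a coordinate-wise majority vote, which caps the influence of any single Byzantine vote.

    import numpy as np

    rng = np.random.default_rng(2)

    def ternary(v, scale):
        # randomized ternary quantizer: coordinate i becomes sign(v_i) w.p. |v_i|/scale, else 0
        p = np.clip(np.abs(v) / scale, 0.0, 1.0)
        return np.sign(v) * (rng.random(v.shape) < p)

    def majority_vote(votes):
        # coordinate-wise majority over {-1, 0, +1} messages: sign of the column sum
        return np.sign(np.sum(np.stack(votes), axis=0))

    grads = [rng.normal(loc=1.0, size=6) for _ in range(7)]        # honest workers
    votes = [ternary(g, scale=np.max(np.abs(g))) for g in grads]
    votes += [-np.ones(6), -np.ones(6)]                            # two Byzantine voters
    direction = majority_vote(votes)                               # server's update direction
    print(direction)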
- Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences [61.74021364776313]
Distributed learning has emerged as a leading paradigm for training large machine learning models.
In real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models.
We propose the first distributed method with client sampling and provable tolerance to Byzantine workers.
arXiv Detail & Related papers (2023-11-23T17:50:30Z)
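The clipping idea named in the title of the entry above can be sketched as follows (the threshold, reference choice, and toy gradients are assumptions): the server clips each worker's reported difference from a common reference vector to a bounded norm, averages the clipped differences, and adds the reference back.

    import numpy as np

    rng = np.random.default_rng(5)
    d, tau = 6, 1.0                        # tau: assumed clipping radius

    def clip(v, radius):
        nrm = np.linalg.norm(v)
        return v if nrm <= radius else v * (radius / nrm)

    reference = np.zeros(d)                                   # e.g. a previous aggregated gradient
    good = [rng.normal(loc=0.5, size=d) for _ in range(8)]    # honest gradients
    byz = [rng.normal(0.0, 100.0, d) for _ in range(2)]       # arbitrary Byzantine reports

    # clip each reported gradient difference, then average and shift back by the reference
    diffs = [clip(g - reference, tau) for g in good + byz]
    aggregate = reference + np.mean(diffs, axis=0)
    print(np.round(aggregate, 3))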
- Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates [9.965047642855074]
Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems.
We propose a new Byzantine-robust compression method with a better convergence rate.
We also develop the first Byzantine-robust method with communication compression and error feedback.
arXiv Detail & Related papers (2023-10-15T11:22:34Z)
- Expressive Losses for Verified Robustness via Convex Combinations [67.54357965665676]
We study the relationship between the over-approximation coefficient and performance profiles across different expressive losses.
We show that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
arXiv Detail & Related papers (2023-05-23T12:20:29Z)
- Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient [0.0]
We propose a new method -- Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient (BR-LSVRG).
We derive non-asymptotic convergence guarantees for the new method in the strongly convex case.
arXiv Detail & Related papers (2023-03-08T13:20:49Z)
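The "loopless" variance reduction behind BR-LSVRG-style methods can be illustrated on a single worker: instead of an explicit outer loop that recomputes a full gradient, the reference point is refreshed with a small probability at every step. The quadratic objective and parameter choices below are assumptions; in the Byzantine-robust setting, such per-worker estimators would additionally be combined through a robust aggregator.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d, p, gamma = 50, 5, 0.1, 0.05
    A = rng.normal(size=(n, d))            # toy finite sum: f(x) = (1/2n) * sum (a_i^T x - b_i)^2
    b = rng.normal(size=n)

    def grad_i(x, i):                      # one-sample stochastic gradient
        return (A[i] @ x - b[i]) * A[i]

    def full_grad(x):
        return A.T @ (A @ x - b) / n

    x = rng.normal(size=d)
    w = x.copy()                           # reference point
    fw = full_grad(w)                      # full gradient at the reference point
    for step in range(200):
        i = rng.integers(n)
        g = grad_i(x, i) - grad_i(w, i) + fw   # loopless SVRG (L-SVRG) estimator
        x = x - gamma * g
        if rng.random() < p:               # coin flip replaces the outer loop
            w, fw = x.copy(), full_grad(x)
    print(np.linalg.norm(full_grad(x)))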
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and the master node to collect local gradients is a key bottleneck in large-scale learning systems.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and the compression noise so as to improve Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks [74.36161581953658]
This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks.
To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules.
The present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks.
arXiv Detail & Related papers (2019-12-29T19:46:03Z)
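To close the list with a concrete picture of the Byrd-SAGA recipe above: each honest worker maintains a SAGA table of past component gradients, and the server aggregates the workers' messages with a geometric median (computed here by Weiszfeld iterations) instead of the mean; Byzantine workers send arbitrary vectors. The objective, step size, and iteration counts are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(4)
    d, m, n_good, n_byz, gamma = 4, 20, 6, 2, 0.05

    def geometric_median(vectors, iters=50):
        # Weiszfeld's algorithm: a standard robust aggregator
        z = np.mean(vectors, axis=0)
        for _ in range(iters):
            w = 1.0 / np.maximum(np.linalg.norm(vectors - z, axis=1), 1e-8)
            z = (w[:, None] * vectors).sum(axis=0) / w.sum()
        return z

    # each honest worker holds m quadratic components and a SAGA table of past gradients
    A = [rng.normal(size=(m, d)) for _ in range(n_good)]
    b = [rng.normal(size=m) for _ in range(n_good)]
    tables = [np.zeros((m, d)) for _ in range(n_good)]

    x = rng.normal(size=d)
    for step in range(300):
        msgs = []
        for k in range(n_good):
            j = rng.integers(m)
            g_new = (A[k][j] @ x - b[k][j]) * A[k][j]             # fresh component gradient
            est = g_new - tables[k][j] + tables[k].mean(axis=0)   # SAGA estimator
            tables[k][j] = g_new                                  # update the table entry
            msgs.append(est)
        msgs += [rng.normal(0.0, 100.0, d) for _ in range(n_byz)]  # Byzantine messages
        x = x - gamma * geometric_median(np.stack(msgs))
    print(np.linalg.norm(x))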
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.