Communication Compression for Byzantine Robust Learning: New Efficient
Algorithms and Improved Rates
- URL: http://arxiv.org/abs/2310.09804v2
- Date: Sat, 9 Mar 2024 13:09:30 GMT
- Title: Communication Compression for Byzantine Robust Learning: New Efficient
Algorithms and Improved Rates
- Authors: Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik
- Abstract summary: Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems.
We propose a new Byzantine-robust method with compression and a better convergence rate.
We also develop the first Byzantine-robust method with communication compression and error feedback.
- Score: 9.965047642855074
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Byzantine robustness is an essential feature of algorithms for certain
distributed optimization problems, typically encountered in
collaborative/federated learning. These problems are usually huge-scale,
implying that communication compression is also imperative for their
resolution. These factors have spurred recent algorithmic and theoretical
developments in the literature on Byzantine-robust learning with compression.
In this paper, we contribute to this research area in two main directions.
First, we propose a new Byzantine-robust method with compression -
Byz-DASHA-PAGE - and prove that the new method has a better convergence rate
(for non-convex and Polyak-Łojasiewicz smooth optimization problems), a smaller
neighborhood size in the heterogeneous case, and tolerates more Byzantine
workers under over-parametrization than the previous method with SOTA
theoretical convergence guarantees (Byz-VR-MARINA). Second, we develop the
first Byzantine-robust method with communication compression and error feedback
- Byz-EF21 - along with its bidirectional compression version - Byz-EF21-BC -
and derive the convergence rates for these methods in the non-convex and
Polyak-Łojasiewicz smooth cases. We test the proposed methods and illustrate
our theoretical findings in numerical experiments.
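To make the setting concrete, below is a minimal sketch, assuming a generic distributed setup, of the two ingredients the abstract combines: workers send compressed (sparsified) gradient messages and the server replaces plain averaging with a robust aggregator. This is not the paper's Byz-DASHA-PAGE or Byz-EF21 (which additionally use variance-reduced estimators and error feedback); the rand-k compressor, coordinate-wise median aggregator, step size, and function names are illustrative choices.
```python
# A minimal sketch, assuming a generic distributed setup: workers send
# compressed (sparsified) gradient messages and the server replaces plain
# averaging with a robust aggregator (here a coordinate-wise median).
# This is NOT Byz-DASHA-PAGE/Byz-EF21; names and choices are illustrative.
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)  # rescaling keeps the compressor unbiased
    return out

def coordinate_median(messages):
    """A simple robust aggregator: coordinate-wise median of worker messages."""
    return np.median(np.stack(messages), axis=0)

def robust_compressed_step(x, worker_grads, lr=0.1, k=2, seed=0):
    """One server step: compress each worker's gradient, aggregate robustly."""
    rng = np.random.default_rng(seed)
    msgs = [rand_k(g, k, rng) for g in worker_grads]  # uplink compression
    g_hat = coordinate_median(msgs)                   # limits the damage a bad worker can do
    return x - lr * g_hat

# Toy usage: two honest workers and one Byzantine worker sending huge values.
x = np.zeros(4)
honest = [np.array([1.0, 1.0, 1.0, 1.0]), np.array([0.9, 1.1, 1.0, 1.0])]
byzantine = [np.full(4, 1e6)]
print(robust_compressed_step(x, honest + byzantine))
```
In the paper's methods the compressed messages are, roughly speaking, gradient differences combined with variance reduction (Byz-DASHA-PAGE) or error-feedback corrections (Byz-EF21), rather than raw gradients as in this toy example.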
Related papers
- Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering [17.446431849022346]
Distributed learning has become the standard approach for training large-scale machine learning models across private data silos.
It faces critical challenges related to Byzantine robustness and communication efficiency.
We propose a novel Byzantine-robust and communication-efficient distributed learning method.
arXiv Detail & Related papers (2024-09-13T08:53:10Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Momentum Provably Improves Error Feedback! [54.93799845077906]
When left untreated, errors caused by compression propagate and can lead to exponentially divergent training behavior.
EF21-SGDM improves the communication and sample complexities of previous error feedback algorithms.
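For reference, here is a minimal sketch, assuming a toy quadratic objective, of an EF21-style error-feedback loop with a top-k compressor; the step size, local losses, and function names are illustrative assumptions, not taken from the paper, and the momentum variant (EF21-SGDM) compresses the difference between a momentum estimate of the gradient and the local state rather than using the raw gradient.
```python
# A minimal sketch, assuming a toy quadratic objective, of an EF21-style
# error-feedback loop with a top-k compressor; step size, losses, and names
# are illustrative choices.
import numpy as np

def top_k(v, k):
    """Biased top-k compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def grad_fi(x, a_i):
    """Gradient of the toy local loss f_i(x) = 0.5 * ||x - a_i||^2."""
    return x - a_i

# Two workers, 3-dimensional model, top-1 compression of the corrections.
targets = [np.array([1.0, 0.2, 0.1]), np.array([0.8, 0.3, 0.0])]
x = np.zeros(3)
g_states = [np.zeros(3) for _ in targets]  # per-worker gradient estimates

for _ in range(400):
    for i, a_i in enumerate(targets):
        c = top_k(grad_fi(x, a_i) - g_states[i], k=1)  # send only a compressed correction
        g_states[i] = g_states[i] + c                  # local estimate tracks the true gradient
    x = x - 0.05 * np.mean(g_states, axis=0)           # server step with the averaged estimates

print(x)  # converges toward the minimizer, the average of the targets
```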
arXiv Detail & Related papers (2023-05-24T13:52:02Z)
- Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its application in a wide range of machine learning tasks.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z)
- Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top [10.579228752210168]
Byzantine-robustness has gained a lot of attention due to the interest in collaborative and federated learning.
Byz-VR-MARINA is a new Byzantine-robust method with variance reduction and communication compression.
Unlike previous Byzantine-robust methods with compression, the results are tight and do not rely on restrictive assumptions such as bounded gradients or limited compression.
arXiv Detail & Related papers (2022-06-01T14:40:29Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators, unbiased and biased (contractive), and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
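For intuition, the sketch below numerically checks the defining property of each class on a random vector, assuming rand-k and top-k as representative examples; the specific operators, dimensions, and constants are illustrative, not taken from the paper.
```python
# A sketch, assuming rand-k and top-k as representative members of the two
# classes: an unbiased operator satisfies E[C(v)] = v, while a biased
# (contractive) one satisfies ||C(v) - v||^2 <= (1 - k/d) * ||v||^2.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 2
v = rng.normal(size=d)

# Unbiased class (rand-k with d/k rescaling): the empirical mean recovers v.
samples = np.zeros((50000, d))
for t in range(50000):
    idx = rng.choice(d, size=k, replace=False)
    samples[t, idx] = v[idx] * (d / k)
print(np.max(np.abs(samples.mean(axis=0) - v)))  # small deviation: unbiased in expectation

# Biased/contractive class (top-k): energy of the dropped entries is bounded.
idx = np.argsort(np.abs(v))[-k:]
c = np.zeros(d)
c[idx] = v[idx]
print(np.sum((c - v) ** 2) <= (1 - k / d) * np.sum(v ** 2))  # True
```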
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
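For readers unfamiliar with the base method, here is a minimal deterministic gradient descent-ascent sketch on a toy convex-concave objective; the objective and step size are illustrative choices, not from the paper, and the listed variants layer stochastic gradients, variance reduction, compression, or coordinate randomization on top of the same simultaneous update.
```python
# A minimal sketch, assuming the toy convex-concave objective
# f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 (not from the paper): plain gradient
# descent-ascent takes a descent step in x and an ascent step in y.

def grad_x(x, y):
    return x + y   # d/dx of f

def grad_y(x, y):
    return x - y   # d/dy of f

x, y, lr = 1.0, -1.0, 0.1
for _ in range(200):
    x, y = x - lr * grad_x(x, y), y + lr * grad_y(x, y)  # simultaneous descent/ascent
print(x, y)  # both approach the saddle point (0, 0)
```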
arXiv Detail & Related papers (2022-02-15T09:17:39Z)
- Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees [115.08148491584997]
We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be modified for the stochastic setting with batches and for federated learning with partial participation of clients.
arXiv Detail & Related papers (2021-10-07T10:04:32Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.