EF-BV: A Unified Theory of Error Feedback and Variance Reduction
Mechanisms for Biased and Unbiased Compression in Distributed Optimization
- URL: http://arxiv.org/abs/2205.04180v1
- Date: Mon, 9 May 2022 10:44:23 GMT
- Title: EF-BV: A Unified Theory of Error Feedback and Variance Reduction
Mechanisms for Biased and Unbiased Compression in Distributed Optimization
- Authors: Laurent Condat, Kai Yi, Peter Richtárik
- Abstract summary: In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
- Score: 7.691755449724637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In distributed or federated optimization and learning, communication between
the different computing units is often the bottleneck, and gradient compression
is a widely used technique for reducing the number of bits sent within each
communication round of iterative methods. There are two classes of compression
operators and separate algorithms making use of them. In the case of unbiased
random compressors with bounded variance (e.g., rand-k), the DIANA algorithm of
Mishchenko et al. [2019], which implements a variance reduction technique for
handling the variance introduced by compression, is the current state of the
art. In the case of biased and contractive compressors (e.g., top-k), the EF21
algorithm of Richtárik et al. [2021], which implements an error-feedback
mechanism for handling the error introduced by compression, is the current
state of the art. These two classes of compression schemes and algorithms are
distinct, with different analyses and proof techniques. In this paper, we unify
them into a single framework and propose a new algorithm, recovering DIANA and
EF21 as particular cases. We prove linear convergence under certain conditions.
Our general approach works with a new, larger class of compressors, which
includes unbiased and biased compressors as particular cases, and has two
parameters, the bias and the variance. This gives finer control and allows us
to inherit the best of both worlds: biased compressors, whose good practical
performance is well recognized, can be used, and independent randomness at the
compressors helps mitigate the effects of compression, with the convergence
rate improving as the number of parallel workers grows. This
is the first time that an algorithm with all these features is proposed. Our
approach takes a step towards a better understanding of these two so-far distinct
worlds of communication-efficient distributed learning.
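To make the distinction between the two compressor classes concrete, below is a minimal NumPy sketch of the two standard examples named in the abstract: rand-k (unbiased, bounded variance) and top-k (biased, contractive). The code and the empirical checks are illustrative assumptions, not the paper's definitions or experiments.

    # Illustrative sketch (an assumption, not the paper's code): the two standard
    # compressor classes named in the abstract, implemented with NumPy.
    import numpy as np

    def rand_k(x, k, rng):
        # Unbiased random sparsifier: keep k coordinates chosen uniformly at
        # random and rescale by d/k, so that E[rand_k(x)] = x; the price is a
        # variance of (d/k - 1) * ||x||^2.
        d = x.size
        idx = rng.choice(d, size=k, replace=False)
        out = np.zeros_like(x)
        out[idx] = x[idx] * (d / k)
        return out

    def top_k(x, k):
        # Biased, contractive sparsifier: keep the k largest-magnitude coordinates.
        # Deterministically, ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2.
        idx = np.argpartition(np.abs(x), -k)[-k:]
        out = np.zeros_like(x)
        out[idx] = x[idx]
        return out

    # Quick empirical illustration of the bias/variance trade-off.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(100)
    avg = np.mean([rand_k(x, 10, rng) for _ in range(5000)], axis=0)
    print(np.linalg.norm(avg - x) / np.linalg.norm(x))   # small: rand-k is unbiased on average
    err = top_k(x, 10) - x
    print(np.linalg.norm(err) <= np.sqrt(1 - 10 / 100) * np.linalg.norm(x))  # True: top-k is contractive

As stated in the abstract, the proposed framework describes a compressor by two parameters, its bias and its variance, so both examples above fall into the same class. A hedged sketch of how such compressors are combined with a local shift, the common mechanism behind DIANA-style variance reduction and EF21-style error feedback, is given after the related-papers list below.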
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression [31.107056382542417]
Communication compression is an essential strategy for alleviating communication overhead.
We propose NEOLITHIC, a nearly optimal algorithm for compression under mild conditions.
arXiv Detail & Related papers (2023-05-12T17:02:43Z)
- Shifted Compression Framework: Generalizations and Improvements [2.2147691173934967]
Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models.
Lossy compression of exchanged information, such as gradients or models, is one of the most effective instruments to alleviate this issue.
arXiv Detail & Related papers (2022-06-21T15:00:04Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when latent variables are observed from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed-up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence [23.849813231750932]
This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems.
By compressing innovation vectors, COLD is able to achieve linear convergence for a class of $\delta$-contracted compressors.
Numerical experiments demonstrate the advantages of both algorithms under different compressors.
arXiv Detail & Related papers (2021-05-14T08:15:18Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
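As a companion to the compressor sketch given after the abstract, the following is a minimal, hedged sketch of the generic "shifted compression" round that underlies DIANA-style variance reduction and EF21-style error feedback, and that echoes the shifted-compression framework listed above: each worker compresses the difference between its gradient and a locally maintained shift, and the shift then tracks the gradient over time. The function, the variable names, and the step size lam are illustrative assumptions, not the exact update rules of EF-BV or of any paper listed here.

    # Hedged sketch of one round of shifted/compressed communication
    # (an assumption; not the exact EF-BV, DIANA, or EF21 update).
    import numpy as np

    def shifted_compression_round(grads, shifts, compress, lam=1.0):
        # grads, shifts: lists of worker gradients and local shift vectors.
        messages, new_shifts = [], []
        for g_i, h_i in zip(grads, shifts):
            c_i = compress(g_i - h_i)           # compress the (small) difference
            messages.append(c_i)
            new_shifts.append(h_i + lam * c_i)  # shift tracks the local gradient
        # Server-side estimate of the average gradient from shifts + messages.
        g_hat = np.mean([h_i + c_i for h_i, c_i in zip(shifts, messages)], axis=0)
        return g_hat, new_shifts

As the shifts approach the local gradients, the compressed differences shrink, which is how the error or variance introduced by compression is progressively removed; the abstract's point is that DIANA and EF21 arise as particular cases of one such unified scheme.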