EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern
Error Feedback
- URL: http://arxiv.org/abs/2110.03294v1
- Date: Thu, 7 Oct 2021 09:29:14 GMT
- Title: EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern
Error Feedback
- Authors: Ilyas Fatkhullin and Igor Sokolov and Eduard Gorbunov and Zhize Li and
Peter Richtárik
- Abstract summary: Existing theory of error feedback (EF) relies on very strong assumptions and provides pessimistic convergence rates.
Richtarik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor.
In this paper we propose six practical extensions of EF21, all supported by strong convergence theory.
- Score: 11.899559337707112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very
popular mechanism for enforcing convergence of distributed gradient-based
optimization methods enhanced with communication compression strategies based
on the application of contractive compression operators. However, existing
theory of EF relies on very strong assumptions (e.g., bounded gradients), and
provides pessimistic convergence rates (e.g., while the best known rate for EF
in the smooth nonconvex regime, and when full gradients are compressed, is
$O(1/T^{2/3})$, the rate of gradient descent in the same regime is $O(1/T)$).
Recently, Richtárik et al. (2021) proposed a new error feedback mechanism,
EF21, based on the construction of a Markov compressor induced by a contractive
compressor. EF21 removes the aforementioned theoretical deficiencies of EF and
at the same time works better in practice. In this work we propose six
practical extensions of EF21, all supported by strong convergence theory:
partial participation, stochastic approximation, variance reduction, proximal
setting, momentum and bidirectional compression. Several of these techniques
were never analyzed in conjunction with EF before, and in cases where they were
(e.g., bidirectional compression), our rates are vastly superior.
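To make the mechanism concrete, below is a minimal, self-contained sketch (not the authors' implementation) of the EF21 update with a greedy top-k contractive compressor on a toy quadratic problem; the function names, the objective, and all hyperparameters are illustrative choices, not taken from the paper.

```python
import numpy as np

def top_k(v, k):
    # Greedy top-k: keep the k largest-magnitude entries, zero out the rest.
    # Standard contractive compressor: ||top_k(v) - v||^2 <= (1 - k/d) ||v||^2.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21(grads, x0, gamma=0.1, k=2, T=1000):
    # EF21: each worker i keeps a state g_i (the "Markov compressor" state) and
    # communicates only the compressed correction C(grad_i(x) - g_i);
    # the server steps with the average of the states.
    n = len(grads)
    x = x0.copy()
    g = [grads[i](x) for i in range(n)]          # g_i^0 = grad_i(x^0), sent once uncompressed
    for _ in range(T):
        x = x - gamma * sum(g) / n               # server step using aggregated states
        for i in range(n):
            c = top_k(grads[i](x) - g[i], k)     # only c is communicated
            g[i] = g[i] + c                      # worker and server update the state identically
    return x

# Toy usage: n quadratics f_i(x) = 0.5 * ||x - b_i||^2, whose average is minimized at mean(b_i).
rng = np.random.default_rng(0)
bs = [rng.normal(size=10) for _ in range(4)]
grads = [lambda x, b=b: x - b for b in bs]
print(np.linalg.norm(ef21(grads, np.zeros(10)) - np.mean(bs, axis=0)))  # small residual expected
```

Roughly speaking, the extensions listed in the abstract change which quantity is compressed and how the state g_i is formed, while keeping this state-plus-compressed-correction template.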
Related papers
- MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes [57.24311218570012]
We extend the theory of EF21-P (Anonymous 2024) and MARINA-P (arXiv:2402.06412) to the non-smooth convex setting.
We provide theoretical guarantees under constant, decreasing, and adaptive (Polyak-type) stepsizes.
arXiv Detail & Related papers (2024-12-22T16:18:34Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness Constants [4.2177789094825515]
We study a modern form of error feedback called EF21 (Richtarik et al., 2021).
In particular, while the theoretical communication complexity of EF21 depends on the quadratic mean of certain smoothness parameters, we improve this dependence to their arithmetic mean.
We continue with the discovery of a new weighted version of EF21 which can (fortunately) be executed without any cloning, and finally circle back to an improved analysis of the original EF21 method.
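For intuition on the improvement claimed above: for smoothness constants $L_1, \dots, L_n$, the quadratic mean $\left(\frac{1}{n}\sum_{i=1}^n L_i^2\right)^{1/2}$ is always at least the arithmetic mean $\frac{1}{n}\sum_{i=1}^n L_i$, with equality only when all $L_i$ coincide, so a complexity bound depending on the arithmetic mean is never worse and can be much tighter when the workers' smoothness constants are heterogeneous; for example, with $(L_1, L_2) = (1, 10)$ the quadratic mean is $\approx 7.1$ while the arithmetic mean is $5.5$.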
arXiv Detail & Related papers (2024-02-16T15:55:59Z)
- Momentum Provably Improves Error Feedback! [54.93799845077906]
When untreated, errors caused by compression propagate and can lead to exponentially unstable training behavior.
EF21-SGDM improves the communication and sample complexities of previous error feedback algorithms (a sketch of the underlying momentum-plus-feedback idea appears after this list).
arXiv Detail & Related papers (2023-05-24T13:52:02Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, taking the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)
- EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback [0.0]
Error feedback (EF) is an immensely popular stabilization mechanism in the context of distributed training of supervised machine learning.
We propose and analyze a new EF mechanism, which we call EF21, which consistently and substantially outperforms EF in practice.
In particular, we prove that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems.
arXiv Detail & Related papers (2021-06-09T16:45:53Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
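As a companion to the EF21 sketch above, here is a rough illustration of the momentum idea behind EF21-SGDM mentioned in the "Momentum Provably Improves Error Feedback!" entry: apply the same compressed-correction state update to a moving average of stochastic gradients rather than to the gradients themselves. This is a sketch under that assumption, not the paper's exact algorithm; initialization, stepsizes, and other details may differ.

```python
import numpy as np

def top_k(v, k):
    # Same contractive top-k compressor as in the EF21 sketch above.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_momentum(stoch_grads, x0, gamma=0.05, eta=0.1, k=2, T=2000):
    # Each worker keeps a momentum buffer v_i (moving average of its stochastic
    # gradients) and an EF21-style state g_i; only the compressed correction
    # C(v_i - g_i) is communicated to the server.
    n = len(stoch_grads)
    x = x0.copy()
    v = [np.zeros_like(x0) for _ in range(n)]
    g = [np.zeros_like(x0) for _ in range(n)]
    for _ in range(T):
        x = x - gamma * sum(g) / n                               # server step
        for i in range(n):
            v[i] = (1 - eta) * v[i] + eta * stoch_grads[i](x)    # momentum / moving average
            g[i] = g[i] + top_k(v[i] - g[i], k)                  # EF21-style state update
    return x

# Toy usage: noisy gradients of f_i(x) = 0.5 * ||x - b_i||^2.
rng = np.random.default_rng(1)
bs = [rng.normal(size=10) for _ in range(4)]
stoch_grads = [lambda x, b=b: x - b + 0.1 * rng.normal(size=x.shape) for b in bs]
x_out = ef21_momentum(stoch_grads, np.zeros(10))   # x_out should land near mean(bs)
```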