EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback
- URL: http://arxiv.org/abs/2110.03294v2
- Date: Fri, 20 Jun 2025 17:11:24 GMT
- Title: EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback
- Authors: Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik
- Abstract summary: We propose a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. Several of these techniques have not been previously analyzed in combination with EF21.
- Score: 65.60461294185853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients), and provides pessimistic convergence rates (e.g., while the best known rate for EF in the smooth nonconvex regime, and when full gradients are compressed, is $O(1/T^{2/3})$, the rate of gradient descent in the same regime is $O(1/T)$). Recently, Richtárik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum, and bidirectional compression. To the best of our knowledge, several of these techniques have not been previously analyzed in combination with EF, and in cases where prior analysis exists -- such as for bidirectional compression -- our theoretical convergence guarantees significantly improve upon existing results.
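To make the EF21 mechanism described above concrete, here is a minimal single-process sketch of the EF21 iteration with a Top-K contractive compressor. The quadratic test objectives, the stepsize, the choice of K, and the helper names (`top_k`, `ef21`) are illustrative assumptions, not the paper's experimental setup; the six extensions listed in the abstract (e.g., momentum or bidirectional compression) would further modify this loop.

```python
# Hedged sketch of the EF21 update rule (Richtarik et al., 2021), simulated in a
# single process, with a Top-K contractive compressor. Problem data and
# hyperparameters are illustrative, not taken from the paper.
import numpy as np

def top_k(v, k):
    """Top-K compressor: keep the k largest-magnitude entries (biased, contractive)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21(grads, x0, stepsize=0.01, k=2, iters=1000):
    """grads: list of per-node gradient oracles grad_i(x). Returns the last iterate."""
    n, x = len(grads), x0.astype(float)
    g = [top_k(gi(x), k) for gi in grads]   # g_i^0: compressed initial gradients
    g_avg = sum(g) / n                      # server keeps the running average of the g_i
    for _ in range(iters):
        x = x - stepsize * g_avg            # gradient-type step with the aggregate estimator
        for i, gi in enumerate(grads):
            c = top_k(gi(x) - g[i], k)      # compress the *difference* (the induced Markov compressor)
            g[i] = g[i] + c                 # node i updates its local state
            g_avg = g_avg + c / n           # server applies the same compressed message
    return x

# Toy usage: quadratic objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 on 4 simulated nodes.
rng = np.random.default_rng(0)
data = [(rng.standard_normal((8, 5)), rng.standard_normal(8)) for _ in range(4)]
grads = [lambda x, A=A, b=b: A.T @ (A @ x - b) for A, b in data]
x_hat = ef21(grads, x0=np.zeros(5))
```

The design point the abstract emphasizes is visible in the inner loop: each node compresses the difference between its fresh gradient and its local state $g_i$, so the aggregate estimator tracks the true gradient without the bounded-gradient assumption that the classical EF analysis requires.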
Related papers
- Convergence Analysis of Asynchronous Federated Learning with Gradient Compression for Non-Convex Optimization [3.6093008648437013]
Gradient compression is an effective technique for reducing communication costs in federated learning. In this paper, we analyze the convergence behaviors of FL under different frameworks. We prove that EF can effectively reduce the variance of gradient estimation despite asynchronous delay.
arXiv Detail & Related papers (2025-04-28T15:35:34Z) - MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes [57.24311218570012]
We extend the theory of EF21-P (Anonymous 2024) and MARINA-P (arXiv:2402.06412) to the non-smooth convex setting.
We provide theoretical guarantees under constant, decreasing, and adaptive (Polyak-type) stepsizes.
arXiv Detail & Related papers (2024-12-22T16:18:34Z) - Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Communication-Efficient Distributed Learning with Local Immediate Error
Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z) - Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness
Constants [4.2177789094825515]
We study a modern form of error feedback called EF21 (Richtarik et al., 2021)
In particular, while the theoretical communication complexity of EF21 depends on the quadratic mean of certain smoothness parameters, we improve this dependence to their arithmetic mean.
We continue with the discovery of a new weighted version of EF21 which can (fortunately) be executed without any cloning, and finally circle back to an improved analysis of the original EF21 method.
arXiv Detail & Related papers (2024-02-16T15:55:59Z) - Momentum Provably Improves Error Feedback! [54.93799845077906]
When left untreated, errors caused by compression propagate and can lead to exponentially divergent training behavior.
EF21-SGDM improves the communication and sample complexities of previous error feedback algorithms.
arXiv Detail & Related papers (2023-05-24T13:52:02Z) - EF-BV: A Unified Theory of Error Feedback and Variance Reduction
Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, taking the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - EF21: A New, Simpler, Theoretically Better, and Practically Faster Error
Feedback [0.0]
Error feedback (EF) is an immensely popular stabilization mechanism in the context of distributed training of supervised machine learning.
We propose and analyze a new EF mechanism, which we call EF21, which consistently and substantially outperforms EF in practice.
In particular, we prove that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems.
arXiv Detail & Related papers (2021-06-09T16:45:53Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z) - A Better Alternative to Error Feedback for Communication-Efficient
Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z) - On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
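Several of the entries above (EF21, EF-BV, On Biased Compression for Distributed Learning) rest on the notion of a biased, contractive compressor satisfying $\|C(x) - x\|^2 \le (1 - \alpha)\|x\|^2$. As a hedged illustration, not code from any of the listed papers, the snippet below numerically checks this contraction inequality for two standard biased compressors: Top-K (with $\alpha = K/d$) and the $\ell_1$-scaled sign compressor (with $\alpha \ge 1/d$).

```python
# Hedged illustration: empirically verify the contraction property
# ||C(x) - x||^2 <= (1 - alpha) * ||x||^2 for two standard biased compressors.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries; contractive with alpha = k/d."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def scaled_sign(v):
    """l1-scaled sign compressor (||v||_1 / d) * sign(v); contractive with
    alpha = ||v||_1^2 / (d * ||v||_2^2) >= 1/d."""
    return (np.abs(v).sum() / v.size) * np.sign(v)

rng = np.random.default_rng(1)
d, k = 100, 10
for _ in range(5):
    x = rng.standard_normal(d)
    sq = np.linalg.norm(x) ** 2
    assert np.linalg.norm(top_k(x, k) - x) ** 2 <= (1 - k / d) * sq       # alpha = k/d
    assert np.linalg.norm(scaled_sign(x) - x) ** 2 <= (1 - 1 / d) * sq    # alpha >= 1/d
print("contraction bounds hold for both compressors")
```

Both compressors are biased (their expectation is not $x$), which is exactly why plain compressed gradient methods need an error-feedback correction such as EF or EF21.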