EF21: A New, Simpler, Theoretically Better, and Practically Faster Error
Feedback
- URL: http://arxiv.org/abs/2106.05203v1
- Date: Wed, 9 Jun 2021 16:45:53 GMT
- Title: EF21: A New, Simpler, Theoretically Better, and Practically Faster Error
Feedback
- Authors: Peter Richtárik and Igor Sokolov and Ilyas Fatkhullin
- Abstract summary: Error feedback (EF) is an immensely popular stabilization mechanism in the context of distributed training of supervised machine learning.
We propose and analyze a new EF mechanism, which we call EF21, which consistently and substantially outperforms EF in practice.
In particular, we prove that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Error feedback (EF), also known as error compensation, is an immensely
popular convergence stabilization mechanism in the context of distributed
training of supervised machine learning models enhanced by the use of
contractive communication compression mechanisms, such as Top-$k$. First
proposed by Seide et al. (2014) as a heuristic, EF resisted any theoretical
understanding until recently [Stich et al., 2018, Alistarh et al., 2018].
However, all existing analyses either i) apply to the single node setting only,
ii) rely on very strong and often unreasonable assumptions, such as global
boundedness of the gradients, or iterate-dependent assumptions that cannot be
checked a-priori and may not hold in practice, or iii) circumvent these issues
via the introduction of additional unbiased compressors, which increase the
communication cost. In this work we fix all these deficiencies by proposing and
analyzing a new EF mechanism, which we call EF21, which consistently and
substantially outperforms EF in practice. Our theoretical analysis relies on
standard assumptions only, works in the distributed heterogeneous data setting,
and leads to better and more meaningful rates. In particular, we prove that
EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems,
beating the previous bound of $O(1/T^{2/3})$, which was shown under a bounded
gradients assumption. We further improve this to a fast linear rate for PL
functions, which is the first linear convergence result for an EF-type method
not relying on unbiased compressors. Since EF has a large number of
applications where it reigns supreme, we believe that our 2021 variant, EF21,
can have a large impact on the practice of communication-efficient distributed
learning.
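To make the mechanism concrete, below is a minimal sketch of the EF21 update rule paired with a Top-$k$ compressor on a toy distributed least-squares problem. The objective, dimensions, step size, and iteration count are illustrative assumptions, not settings from the paper; recall that a contractive compressor $\mathcal{C}$ such as Top-$k$ satisfies $\|\mathcal{C}(x)-x\|^2 \le (1-\alpha)\|x\|^2$ with $\alpha = k/d$.

    # Minimal sketch (not from the paper) of the EF21 update with a Top-k compressor
    # on a toy distributed least-squares problem; all sizes and the step size are illustrative.
    import numpy as np

    def top_k(v, k):
        # Contractive Top-k compressor: keep the k largest-magnitude entries, zero the rest.
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(0)
    n, d, k, gamma, T = 10, 50, 5, 1e-3, 2000   # nodes, dimension, Top-k, step size, iterations

    # Heterogeneous local objectives: f_i(x) = 0.5 * ||A_i x - b_i||^2
    A = [rng.standard_normal((20, d)) for _ in range(n)]
    b = [rng.standard_normal(20) for _ in range(n)]
    grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

    x = np.zeros(d)
    g_local = [top_k(grad(i, x), k) for i in range(n)]  # g_i^0 = C(grad f_i(x^0))
    g = np.mean(g_local, axis=0)                        # server estimate g^0

    for t in range(T):
        x = x - gamma * g                               # x^{t+1} = x^t - gamma * g^t
        deltas = []
        for i in range(n):
            c_i = top_k(grad(i, x) - g_local[i], k)     # compress the *change*, not the raw gradient
            g_local[i] = g_local[i] + c_i               # g_i^{t+1} = g_i^t + c_i
            deltas.append(c_i)                          # only c_i is communicated to the server
        g = g + np.mean(deltas, axis=0)                 # g^{t+1} = g^t + (1/n) * sum_i c_i

    full_grad = np.mean([grad(i, x) for i in range(n)], axis=0)
    print("||grad f(x_T)|| =", np.linalg.norm(full_grad))

The key design choice, per the abstract, is that each node compresses the difference between its fresh local gradient and its running estimate $g_i^t$, rather than an accumulated error term as in classical EF, so the estimate tracks $\nabla f_i(x^t)$ increasingly well without any bounded-gradients assumption.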
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning.
We show that the widely used beam search method suffers from unacceptable over-optimism.
We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes [57.24311218570012]
We extend the theory of EF21-P (Anonymous 2024) and MARINA-P (arXiv:2402.06412) to the non-smooth convex setting.
We provide theoretical guarantees under constant, decreasing, and adaptive (Polyak-type) stepsizes.
arXiv Detail & Related papers (2024-12-22T16:18:34Z) - A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging the iterates of Stochastic Gradient Descent (SGD), as in Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and Latest Weight Averaging (LAWA), has achieved empirical success in training deep learning models.
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness
Constants [4.2177789094825515]
We study a modern form of error feedback called EF21 (Richtarik et al., 2021).
In particular, while the theoretical communication complexity of EF21 depends on the quadratic mean of certain smoothness parameters, we improve this dependence to their arithmetic mean.
We then discover a new weighted version of EF21 which can (fortunately) be executed without any cloning, and finally circle back to an improved analysis of the original EF21 method.
arXiv Detail & Related papers (2024-02-16T15:55:59Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - Momentum Provably Improves Error Feedback! [54.93799845077906]
When left untreated, errors caused by compression propagate and can lead to exponentially divergent training behavior.
EF21-SGDM improves the communication and sample complexities of previous error feedback algorithms.
arXiv Detail & Related papers (2023-05-24T13:52:02Z) - Analysis of Error Feedback in Federated Non-Convex Optimization with
Biased Compression [37.6593006747285]
In federated learning (FL) systems, the communication cost between the clients and the central server can be a bottleneck.
In this paper, we propose a technique to remedy the downsides of biased compression.
Under partial participation, our analysis reveals an extra slow-down factor due to a so-called "stale error accumulation" effect.
arXiv Detail & Related papers (2022-11-25T18:49:53Z) - Optimizing Two-way Partial AUC with an End-to-end Framework [154.47590401735323]
Area Under the ROC Curve (AUC) is a crucial metric for machine learning.
Recent work shows that the TPAUC is essentially inconsistent with the existing Partial AUC metrics.
We present the first trial in this paper to optimize this new metric.
arXiv Detail & Related papers (2022-06-23T12:21:30Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, taking the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern
Error Feedback [11.899559337707112]
Existing theory of error feedback (EF) relies on very strong assumptions and provides pessimistic convergence rates.
Richtarik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a compressor induced by a contractive approximation.
In this paper we propose six practical extensions of EF21, all supported by strong convergence theory.
arXiv Detail & Related papers (2021-10-07T09:29:14Z) - A Better Alternative to Error Feedback for Communication-Efficient
Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z) - Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new distributed error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)