Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness Constants
- URL: http://arxiv.org/abs/2402.10774v1
- Date: Fri, 16 Feb 2024 15:55:59 GMT
- Title: Error Feedback Reloaded: From Quadratic to Arithmetic Mean of Smoothness Constants
- Authors: Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko
- Abstract summary: We study a modern form of error feedback called EF21 (Richtarik et al., 2021).
In particular, while the theoretical communication complexity of EF21 depends on the quadratic mean of certain smoothness parameters, we improve this dependence to their arithmetic mean.
Starting from a reformulation of the problem that requires (often impractical) machine cloning, we arrive at a new weighted version of EF21 which can (fortunately) be executed without any cloning, and finally circle back to an improved analysis of the original EF21 method.
- Score: 4.2177789094825515
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Error Feedback (EF) is a highly popular and immensely effective mechanism for
fixing convergence issues which arise in distributed training methods (such as
distributed GD or SGD) when these are enhanced with greedy communication
compression techniques such as TopK. While EF was proposed almost a decade ago
(Seide et al., 2014), and despite concentrated effort by the community to
advance the theoretical understanding of this mechanism, there is still a lot
to explore. In this work we study a modern form of error feedback called EF21
(Richtarik et al., 2021) which offers the currently best-known theoretical
guarantees, under the weakest assumptions, and also works well in practice. In
particular, while the theoretical communication complexity of EF21 depends on
the quadratic mean of certain smoothness parameters, we improve this dependence
to their arithmetic mean, which is always smaller, and can be substantially
smaller, especially in heterogeneous data regimes. We take the reader on a
journey of our discovery process. Starting with the idea of applying EF21 to an
equivalent reformulation of the underlying problem which (unfortunately)
requires (often impractical) machine cloning, we continue to the discovery of a
new weighted version of EF21 which can (fortunately) be executed without any
cloning, and finally circle back to an improved analysis of the original EF21
method. While this development applies to the simplest form of EF21, our
approach naturally extends to more elaborate variants involving stochastic
gradients and partial participation. Further, our technique improves the
best-known theory of EF21 in the rare features regime (Richtarik et al., 2023).
Finally, we validate our theoretical findings with suitable experiments.
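
To make the mechanism concrete, below is a minimal NumPy sketch of the EF21 update with a greedy TopK compressor on a toy problem with heterogeneous smoothness constants; the toy objective, constants, and helper names are illustrative assumptions, not taken from the paper's experiments. It also prints the quadratic and arithmetic means of the smoothness constants L_i, the two quantities whose gap drives the paper's improvement.

```python
import numpy as np

# Toy setting (illustrative): n quadratics f_i(x) = (L_i / 2) * ||x||^2 with
# heterogeneous smoothness constants L_i, so grad f_i(x) = L_i * x.
rng = np.random.default_rng(0)
n, d, k, lr = 10, 100, 5, 1e-3
L = np.array([1.0] * (n - 1) + [100.0])            # one "hard" node -> heterogeneous regime
grads = [(lambda x, Li=Li: Li * x) for Li in L]    # per-node gradient oracles

def topk(v, k):
    """Greedy TopK compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21_round(x, g, lr, k):
    """One EF21 round: node i sends only the compressed difference
    c_i = TopK(grad_i(x) - g_i); node and server both update g_i <- g_i + c_i,
    and the server takes a step along the average of the g_i."""
    c = np.stack([topk(grad(x) - g_i, k) for grad, g_i in zip(grads, g)])
    g = g + c
    return x - lr * g.mean(axis=0), g

x, g = rng.standard_normal(d), np.zeros((n, d))
for _ in range(2000):
    x, g = ef21_round(x, g, lr, k)

# The paper's headline improvement: communication complexity depends on the
# arithmetic mean of the L_i rather than their quadratic mean, and the latter
# can be much larger when the L_i are heterogeneous (as here).
print("quadratic mean of L_i :", np.sqrt(np.mean(L**2)))   # about 31.6
print("arithmetic mean of L_i:", np.mean(L))               # 10.9
print("||x|| after EF21      :", np.linalg.norm(x))
```

Note that each node communicates only the sparse difference c_i per round, which is exactly what makes the method attractive under greedy compression such as TopK.
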
Related papers
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Momentum Provably Improves Error Feedback! [54.93799845077906]
When left untreated, errors caused by compression propagate and can lead to exponentially diverging training behavior.
EF21-SGDM improves the communication and sample complexities of previous error feedback algorithms.
arXiv Detail & Related papers (2023-05-24T13:52:02Z) - Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification
in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z) - Counterfactual Supervision-based Information Bottleneck for
Out-of-Distribution Generalization [40.94431121318241]
We show that the information bottleneck-based invariant risk minimization algorithm (IB-IRM) is not sufficient for learning invariant features in linear classification problems.
We propose a Counterfactual Supervision-based Information Bottleneck (CSIB) learning algorithm that provably recovers the invariant features.
arXiv Detail & Related papers (2022-08-16T15:26:00Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find both methods to yield win-win: substantially shrinking the robust generalization gap and alleviating the robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern
Error Feedback [11.899559337707112]
Existing theory of error feedback (EF) relies on very strong assumptions and provides pessimistic convergence rates.
Richtarik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a compressor induced by a contractive approximation.
In this paper we propose six practical extensions of EF21, all supported by strong convergence theory.
arXiv Detail & Related papers (2021-10-07T09:29:14Z) - EF21: A New, Simpler, Theoretically Better, and Practically Faster Error
Feedback [0.0]
Error feedback (EF) is an immensely popular stabilization mechanism in the context of distributed training of supervised machine learning.
We propose and analyze a new EF mechanism, which we call EF21, which consistently and substantially outperforms EF in practice.
In particular, we prove that EF21 enjoys a fast $O(1/T)$ convergence rate for smooth nonconvex problems (a minimal sketch of the classic EF update, for contrast, appears after this list).
arXiv Detail & Related papers (2021-06-09T16:45:53Z) - A Better Alternative to Error Feedback for Communication-Efficient
Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z) - Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for nonconvex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z) - Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL).
No existing methods enjoy all the aforementioned properties.
We propose an intuitively simple yet theoretically sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)