A Better Alternative to Error Feedback for Communication-Efficient
Distributed Learning
- URL: http://arxiv.org/abs/2006.11077v2
- Date: Sun, 14 Mar 2021 10:27:02 GMT
- Title: A Better Alternative to Error Feedback for Communication-Efficient
Distributed Learning
- Authors: Samuel Horváth and Peter Richtárik
- Abstract summary: We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern large-scale machine learning applications require stochastic
optimization algorithms to be implemented on distributed compute systems. A key
bottleneck of such systems is the communication overhead for exchanging
information across the workers, such as stochastic gradients. Among the many
techniques proposed to remedy this issue, one of the most successful is the
framework of compressed communication with error feedback (EF). EF remains the
only known technique that can deal with the error induced by contractive
compressors which are not unbiased, such as Top-$K$. In this paper, we propose
a new and theoretically and practically better alternative to EF for dealing
with contractive compressors. In particular, we propose a construction which
can transform any contractive compressor into an induced unbiased compressor.
Following this transformation, existing methods able to work with unbiased
compressors can be applied. We show that our approach leads to vast
improvements over EF, including reduced memory requirements, better
communication complexity guarantees and fewer assumptions. We further extend
our results to federated learning with partial participation following an
arbitrary distribution over the nodes, and demonstrate the benefits thereof. We
perform several numerical experiments which validate our theoretical findings.
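To make the construction described in the abstract concrete, here is a minimal NumPy sketch of one way a contractive compressor such as Top-$K$ can be turned into an induced unbiased compressor: compress with Top-$K$, then compensate the leftover residual with an unbiased compressor (Rand-$K$ here). The choice of Rand-$K$, the sparsity levels, and the empirical check are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def top_k(x, k):
    """Contractive (biased) compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def rand_k(x, k, rng):
    """Unbiased compressor: keep k random entries, rescaled by d/k so E[rand_k(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def induced_compressor(x, k1, k2, rng):
    """Induced unbiased compressor (sketch): contractive part plus an
    unbiased compression of the residual, so the sum is unbiased."""
    c1 = top_k(x, k1)             # biased, contractive part
    c2 = rand_k(x - c1, k2, rng)  # unbiased correction of the residual
    return c1 + c2                # E[output] = c1 + (x - c1) = x

# Quick empirical unbiasedness check (illustrative, not a proof).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
est = np.mean([induced_compressor(x, k1=50, k2=50, rng=rng) for _ in range(5000)], axis=0)
print("max |E[C(x)] - x| ~", np.abs(est - x).max())
```

Because the expectation of the output equals the input, any method designed for unbiased compressors (for example, DIANA-type methods mentioned in the related work below) can then be applied on top of this construction.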
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback (a minimal error-feedback sketch follows this list).
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors can achieve a server-to-worker communication complexity improving with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z)
- EControl: Fast Distributed Optimization with Compression and Error Control [8.624830915051021]
We propose EControl, a novel mechanism that can regulate the strength of the feedback signal.
We show that EControl mitigates the issues of the naive implementation, and we support our findings with numerical experiments.
arXiv Detail & Related papers (2023-11-06T10:00:13Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation [12.013162443721312]
We propose a new class of gradient communication mechanisms for communication-efficient training.
We show that our approach can recover the recently proposed state-of-the-art error feedback mechanism EF21.
We provide a new fundamental link between the lazy aggregation and error feedback literature.
arXiv Detail & Related papers (2022-02-02T12:34:18Z)
- Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z)
- Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, going beyond the classical setting of independent compressors.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)
- Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence [23.849813231750932]
This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems.
By compressing innovation vectors, COLD is able to achieve linear convergence for a class of $\delta$-contracted compressors.
Numerical experiments demonstrate the advantages of both algorithms under different compressors.
arXiv Detail & Related papers (2021-05-14T08:15:18Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
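Several of the related papers above (differential error feedback, LIEC-SGD, EControl, EF-BV, 3PC/EF21) build on or refine the classical error-feedback loop that the main paper proposes to replace. The sketch below shows that generic loop on a single worker, assuming Top-K as the contractive compressor and a toy quadratic objective; it is an illustration of the generic EF mechanism only, not any specific paper's algorithm.

```python
import numpy as np

def top_k(x, k):
    """Contractive compressor used for illustration."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def ef_sgd_step(w, e, grad, lr, k):
    """One step of classical error feedback (EF) on a single worker:
    compress the step plus the accumulated error, apply only the
    compressed part, and carry the remainder forward as error memory."""
    p = e + lr * grad     # add the error memory to the new update
    delta = top_k(p, k)   # what would actually be communicated
    e_new = p - delta     # leftover kept locally for future steps
    w_new = w - delta     # model update uses only the compressed part
    return w_new, e_new

# Toy usage on the quadratic f(w) = 0.5 * ||w||^2, whose gradient is w.
rng = np.random.default_rng(0)
w = rng.standard_normal(100)
e = np.zeros_like(w)
for _ in range(200):
    w, e = ef_sgd_step(w, e, grad=w, lr=0.1, k=5)
print("||w|| after EF-SGD:", np.linalg.norm(w))
```

The error memory `e` stores whatever Top-K discarded at each step; this is exactly the extra per-worker state that the induced-compressor approach sketched after the abstract avoids.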