3PC: Three Point Compressors for Communication-Efficient Distributed
Training and a Better Theory for Lazy Aggregation
- URL: http://arxiv.org/abs/2202.00998v1
- Date: Wed, 2 Feb 2022 12:34:18 GMT
- Title: 3PC: Three Point Compressors for Communication-Efficient Distributed
Training and a Better Theory for Lazy Aggregation
- Authors: Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov,
Zhize Li, Eduard Gorbunov
- Abstract summary: We propose a new class of gradient communication mechanisms for communication-efficient training.
We show that our approach can recover the recently proposed state-of-the-art error feedback mechanism EF21.
We provide a new fundamental link between the lazy aggregation and error feedback literature.
- Score: 12.013162443721312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and study a new class of gradient communication mechanisms for
communication-efficient training -- three point compressors (3PC) -- as well as
efficient distributed nonconvex optimization algorithms that can take advantage
of them. Unlike most established approaches, which rely on a static compressor
choice (e.g., Top-K), our class allows the compressors to *evolve*
throughout the training process, with the aim of improving the theoretical
communication complexity and practical efficiency of the underlying methods. We
show that our general approach can recover the recently proposed
state-of-the-art error feedback mechanism EF21 (Richtárik et al., 2021) and
its theoretical properties as a special case, but also leads to a number of new
efficient methods. Notably, our approach allows us to improve upon the state of
the art in the algorithmic and theoretical foundations of the *lazy aggregation*
literature (Chen et al., 2018). As a by-product that may be of
independent interest, we provide a new and fundamental link between the lazy
aggregation and error feedback literature. A special feature of our work is
that we do not require the compressors to be unbiased.
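To make the 3PC idea concrete, the following is a minimal, illustrative Python sketch (not the authors' implementation) of plugging such mechanisms into distributed gradient descent. It assumes the EF21-style update g_i <- g_i + C(grad_i(x_new) - g_i) with a greedy Top-K compressor, plus a CLAG-style lazy-aggregation variant; the helper names, the trigger constant zeta, the initialization, and the toy quadratic objective are assumptions made for illustration rather than details taken from the paper.

```python
import numpy as np


def top_k(v, k):
    """Keep the k largest-magnitude entries of v (a biased, contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out


def ef21_3pc(h, y, x, k=1):
    """EF21 viewed as a 3PC mechanism: move the previously communicated state h
    towards the fresh gradient x by a compressed correction (y is unused here)."""
    return h + top_k(x - h, k)


def clag_3pc(h, y, x, k=1, zeta=1.0):
    """CLAG-style lazy aggregation as a 3PC: communicate a compressed correction only
    when the fresh gradient x has drifted far from the state h, measured relative to
    how much the local gradient itself moved (x vs. y); otherwise keep h unchanged
    and skip communication for this round."""
    if np.linalg.norm(x - h) ** 2 > zeta * np.linalg.norm(x - y) ** 2:
        return h + top_k(x - h, k)
    return h


def run_3pc_gd(grads, x0, step=0.1, iters=200, mech=ef21_3pc):
    """Single-process simulation of n workers running distributed GD with a 3PC
    gradient estimator; grads is a list of per-worker gradient functions."""
    x = x0.astype(float)
    g = [gi(x) for gi in grads]  # communicated states, initialized to the local gradients
    y = [gi(x) for gi in grads]  # local gradients at the previous iterate
    for _ in range(iters):
        x = x - step * sum(g) / len(grads)  # server step with the aggregated states
        for i, gi in enumerate(grads):
            fresh = gi(x)                    # gradient at the new iterate
            g[i] = mech(g[i], y[i], fresh)   # the three points: (h, y, x)
            y[i] = fresh
    return x


# Toy run: three workers holding quadratics f_i(x) = 0.5 * x^T A_i x; the iterate
# should approach the minimizer 0. Passing mech=clag_3pc gives the lazy variant.
A = [np.diag([1.0, 2.0]), np.diag([3.0, 1.0]), np.diag([2.0, 2.0])]
grads = [lambda x, Ai=Ai: Ai @ x for Ai in A]
print(run_3pc_gd(grads, np.array([5.0, -3.0]), mech=ef21_3pc))
```

Rounds in which the CLAG-style trigger does not fire cost no uplink communication, which is the lazy-aggregation behaviour the abstract links to error feedback; with ef21_3pc every round communicates a compressed correction instead.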
Related papers
- Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors can achieve a server-to-worker communication complexity improving with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z)
- Adaptive Compression for Communication-Efficient Distributed Training [3.1148846501645084]
We propose a novel algorithm for communication-efficient training of supervised machine learning models with an adaptive compression level.
Our approach is inspired by the recently proposed three point compressors (3PC) framework of Richtárik et al.
We extend the 3PC framework to bidirectional compression, i.e., we allow the server to compress as well.
arXiv Detail & Related papers (2022-10-31T23:09:01Z)
- Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation [11.870393751095083]
We study communication compression and aggregation mechanisms for curvature information.
New 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, require reduced communication and occasional Hessian computations.
For all our methods, we derive fast condition-number-independent local linear and/or superlinear convergence rates.
arXiv Detail & Related papers (2022-06-07T21:12:21Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators (unbiased and biased/contractive) and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees [115.08148491584997]
We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be modified for the setting with batches and for federated learning with partial participation of clients.
arXiv Detail & Related papers (2021-10-07T10:04:32Z)
- Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially *correlated* compressors, taking the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)
- Remote Multilinear Compressive Learning with Adaptive Compression [107.87219371697063]
Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals.
We propose a novel optimization scheme that enables such a feature for MCL models.
arXiv Detail & Related papers (2021-09-02T19:24:03Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z)
- Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)