3PC: Three Point Compressors for Communication-Efficient Distributed
Training and a Better Theory for Lazy Aggregation
- URL: http://arxiv.org/abs/2202.00998v1
- Date: Wed, 2 Feb 2022 12:34:18 GMT
- Title: 3PC: Three Point Compressors for Communication-Efficient Distributed
Training and a Better Theory for Lazy Aggregation
- Authors: Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov,
Zhize Li, Eduard Gorbunov
- Abstract summary: We propose a new class of gradient communication mechanisms for communication-efficient training.
We show that our approach can recover the recently proposed state-of-the-art error feedback mechanism EF21.
We provide a new fundamental link between the lazy aggregation and error feedback literature.
- Score: 12.013162443721312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and study a new class of gradient communication mechanisms for
communication-efficient training -- three point compressors (3PC) -- as well as
efficient distributed nonconvex optimization algorithms that can take advantage
of them. Unlike most established approaches, which rely on a static compressor
choice (e.g., Top-K), our class allows the compressors to *evolve*
throughout the training process, with the aim of improving the theoretical
communication complexity and practical efficiency of the underlying methods. We
show that our general approach can recover the recently proposed
state-of-the-art error feedback mechanism EF21 (Richtárik et al., 2021) and
its theoretical properties as a special case, but also leads to a number of new
efficient methods. Notably, our approach allows us to improve upon the state of
the art in the algorithmic and theoretical foundations of the *lazy aggregation*
literature (Chen et al., 2018). As a by-product that may be of
independent interest, we provide a new and fundamental link between the lazy
aggregation and error feedback literature. A special feature of our work is
that we do not require the compressors to be unbiased.
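To make the 3PC idea concrete, the following is a minimal, illustrative Python sketch (not the authors' implementation) of plugging such mechanisms into distributed gradient descent. It assumes the EF21-style update g_i <- g_i + C(grad_i(x_new) - g_i) with a greedy Top-K compressor, plus a CLAG-style lazy-aggregation variant; the helper names, the trigger constant zeta, the initialization, and the toy quadratic objective are assumptions made for illustration rather than details taken from the paper.

```python
import numpy as np


def top_k(v, k):
    """Keep the k largest-magnitude entries of v (a biased, contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out


def ef21_3pc(h, y, x, k=1):
    """EF21 viewed as a 3PC mechanism: move the previously communicated state h
    towards the fresh gradient x by a compressed correction (y is unused here)."""
    return h + top_k(x - h, k)


def clag_3pc(h, y, x, k=1, zeta=1.0):
    """CLAG-style lazy aggregation as a 3PC: communicate a compressed correction only
    when the fresh gradient x has drifted far from the state h, measured relative to
    how much the local gradient itself moved (x vs. y); otherwise keep h unchanged
    and skip communication for this round."""
    if np.linalg.norm(x - h) ** 2 > zeta * np.linalg.norm(x - y) ** 2:
        return h + top_k(x - h, k)
    return h


def run_3pc_gd(grads, x0, step=0.1, iters=200, mech=ef21_3pc):
    """Single-process simulation of n workers running distributed GD with a 3PC
    gradient estimator; grads is a list of per-worker gradient functions."""
    x = x0.astype(float)
    g = [gi(x) for gi in grads]  # communicated states, initialized to the local gradients
    y = [gi(x) for gi in grads]  # local gradients at the previous iterate
    for _ in range(iters):
        x = x - step * sum(g) / len(grads)  # server step with the aggregated states
        for i, gi in enumerate(grads):
            fresh = gi(x)                    # gradient at the new iterate
            g[i] = mech(g[i], y[i], fresh)   # the three points: (h, y, x)
            y[i] = fresh
    return x


# Toy run: three workers holding quadratics f_i(x) = 0.5 * x^T A_i x; the iterate
# should approach the minimizer 0. Passing mech=clag_3pc gives the lazy variant.
A = [np.diag([1.0, 2.0]), np.diag([3.0, 1.0]), np.diag([2.0, 2.0])]
grads = [lambda x, Ai=Ai: Ai @ x for Ai in A]
print(run_3pc_gd(grads, np.array([5.0, -3.0]), mech=ef21_3pc))
```

Rounds in which the CLAG-style trigger does not fire cost no uplink communication, which is the lazy-aggregation behaviour the abstract links to error feedback; with ef21_3pc every round communicates a compressed correction instead.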
Related papers
- Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors can achieve a server-to-worker communication complexity improving with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z)
- Adaptive Compression for Communication-Efficient Distributed Training [3.1148846501645084]
We propose a novel algorithm for communication-efficient training of supervised machine learning models with an adaptive compression level.
Our approach is inspired by the recently proposed three point compressors (3PC) framework of Richtárik et al.
We extend the 3PC framework to bidirectional compression, i.e., we allow the server to compress as well.
arXiv Detail & Related papers (2022-10-31T23:09:01Z)
- Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation [11.870393751095083]
We study communication compression and aggregation mechanisms for curvature information.
New 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, require reduced communication and occasional Hessian computations.
For all our methods, we derive fast condition-number-independent local linear and/or superlinear convergence rates.
arXiv Detail & Related papers (2022-06-07T21:12:21Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators (unbiased and biased/contractive) and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees [115.08148491584997]
We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be modified for the setting with batches and for federated learning with partial participation of clients.
arXiv Detail & Related papers (2021-10-07T10:04:32Z)
- Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially *correlated* compressors, taking the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)
- Remote Multilinear Compressive Learning with Adaptive Compression [107.87219371697063]
Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals.
We propose a novel optimization scheme that enables such a feature for MCL models.
arXiv Detail & Related papers (2021-09-02T19:24:03Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z)
- Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)