Adaptive Compression for Communication-Efficient Distributed Training
- URL: http://arxiv.org/abs/2211.00188v1
- Date: Mon, 31 Oct 2022 23:09:01 GMT
- Title: Adaptive Compression for Communication-Efficient Distributed Training
- Authors: Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev,
Peter Richtarik
- Abstract summary: We propose a novel algorithm for communication-efficient training of supervised machine learning models with an adaptive compression level.
Our approach is inspired by the recently proposed three point compressor (3PC) of Richtarik et al.
We extend the 3PC framework to bidirectional compression, i.e., we allow the server to compress as well.
- Score: 3.1148846501645084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Adaptive Compressed Gradient Descent (AdaCGD) - a novel
optimization algorithm for communication-efficient training of supervised
machine learning models with adaptive compression level. Our approach is
inspired by the recently proposed three point compressor (3PC) framework of
Richtarik et al. (2022), which includes error feedback (EF21), lazily
aggregated gradient (LAG), and their combination as special cases, and offers
the current state-of-the-art rates for these methods under weak assumptions.
While the above mechanisms offer a fixed compression level, or adapt between
two extremes only, our proposal is to perform a much finer adaptation. In
particular, we allow the user to choose any number of arbitrarily chosen
contractive compression mechanisms, such as Top-K sparsification with a
user-defined selection of sparsification levels K, or quantization with a
user-defined selection of quantization levels, or their combination. AdaCGD
chooses the appropriate compressor and compression level adaptively during the
optimization process. Besides i) proposing a theoretically-grounded
multi-adaptive communication compression mechanism, we further ii) extend the
3PC framework to bidirectional compression, i.e., we allow the server to
compress as well, and iii) provide sharp convergence bounds in the strongly
convex, convex and nonconvex settings. The convex regime results are new even
for several key special cases of our general mechanism, including 3PC and EF21.
In all regimes, our rates are superior compared to all existing adaptive
compression methods.
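
To make the ingredients concrete, here is a minimal Python sketch of one building block the abstract describes: Top-K sparsification as a contractive compressor, together with a small pool of sparsification levels from which a selection rule picks one per round. The greedy error-threshold rule below is a hypothetical illustration, not the AdaCGD selection rule or its theory.

```python
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Top-K sparsification: keep the k largest-magnitude entries, zero out the rest.
    This is a contractive compressor: ||C(x) - x||^2 <= (1 - k/d) * ||x||^2."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def choose_compressor(g: np.ndarray, ks, tol: float = 0.1):
    """Hypothetical greedy selection (NOT the AdaCGD rule): among Top-K compressors
    with candidate levels `ks`, pick the smallest k whose relative squared error
    stays below `tol`; otherwise fall back to the densest option."""
    g_norm2 = float(np.dot(g, g)) + 1e-12
    for k in sorted(ks):
        c = top_k(g, k)
        if float(np.dot(g - c, g - c)) <= tol * g_norm2:
            return k, c
    k_max = max(ks)
    return k_max, top_k(g, k_max)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 1_000
    g = rng.standard_normal(d) * (rng.random(d) < 0.05)  # a sparse-ish "gradient"
    k, compressed = choose_compressor(g, ks=(10, 50, 100, 500))
    print(f"selected Top-{k}: sending {k}/{d} coordinates")
```

In a distributed run each worker would apply such a rule to the vector it is about to send, and, per point ii) of the abstract, the server could compress its broadcast in the same spirit.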
Related papers
- Fast Feedforward 3D Gaussian Splatting Compression [55.149325473447384]
FCGS is an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass.
FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
arXiv Detail & Related papers (2024-10-10T15:13:08Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors achieves a server-to-worker communication complexity that improves with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z)
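
The "collection of correlated compressors" and "permutation compressors" referenced in the MARINA-P entry above are typically instantiated as Perm-K-style operators, where a permutation partitions the coordinates into disjoint blocks across workers so that the individual compression errors cancel on aggregation. The sketch below shows a generic Perm-K construction; it is illustrative only and may differ in details (block sizes, scaling) from the compressors used in MARINA-P.

```python
import numpy as np

def perm_k(x: np.ndarray, n_workers: int, seed: int = 0) -> list:
    """Perm-K-style correlated compressors (illustrative sketch, not necessarily the
    exact MARINA-P construction): a random permutation partitions the d coordinates
    into n disjoint blocks; message i keeps only block i, scaled by n so that the
    compression errors cancel and the messages average back to x exactly."""
    d = x.size
    perm = np.random.default_rng(seed).permutation(d)
    messages = []
    for block in np.array_split(perm, n_workers):
        m = np.zeros_like(x)
        m[block] = n_workers * x[block]
        messages.append(m)
    return messages

if __name__ == "__main__":
    x = np.arange(8, dtype=float)
    parts = perm_k(x, n_workers=4)
    print(np.allclose(np.mean(parts, axis=0), x))  # True: exact recovery on average
```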
- Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression [31.107056382542417]
Communication compression is an essential strategy for alleviating communication overhead.
We propose NEOLITHIC, a nearly optimal algorithm for compression under mild conditions.
arXiv Detail & Related papers (2023-05-12T17:02:43Z)
- Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
- Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning.
Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z)
- Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation [11.870393751095083]
We study communication compression and aggregation mechanisms for curvature information.
New 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, require reduced communication and occasional Hessian computations.
For all our methods, we derive fast condition-number-independent local linear and/or superlinear convergence rates.
arXiv Detail & Related papers (2022-06-07T21:12:21Z)
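
Bernoulli aggregation, one of the mechanisms named in the Newton-type entry above, amounts to letting each worker communicate in a given round only according to an independent coin flip, with the server otherwise reusing that worker's last transmitted vector. The sketch below is a generic version of that idea for intuition; the exact rule and scaling used in the paper's Newton-type methods may differ.

```python
import numpy as np

def bernoulli_aggregate(fresh, stored, p, rng):
    """Bernoulli aggregation (generic illustration): each worker transmits its fresh
    local vector with probability p; otherwise the server keeps that worker's last
    transmitted copy.  Returns the updated copies, their average, and the number of
    workers that actually communicated this round."""
    sends = rng.random(len(fresh)) < p
    updated = [f if s else old for f, s, old in zip(fresh, sends, stored)]
    return updated, np.mean(updated, axis=0), int(sends.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 8, 5
    fresh = [rng.standard_normal(d) for _ in range(n)]
    stored = [np.zeros(d) for _ in range(n)]
    stored, estimate, sent = bernoulli_aggregate(fresh, stored, p=0.25, rng=rng)
    print(f"{sent}/{n} workers communicated this round")
```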
- 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation [12.013162443721312]
We propose a new class of gradient communication mechanisms for communication-efficient training.
We show that our approach can recover the recently proposed state-of-the-art error feedback mechanism EF21.
We provide a new fundamental link between the lazy aggregation and error feedback literature.
arXiv Detail & Related papers (2022-02-02T12:34:18Z)
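
EF21 and lazy aggregation (LAG), which the 3PC entry above unifies and which also appear as special cases in the AdaCGD abstract, can both be seen as rules for updating a communicated state g from the previous state and fresh gradient information. The sketch below gives standard simplified forms of both updates, with Top-K and a squared-norm trigger as example choices; it is not pseudocode from either paper.

```python
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Top-K sparsification, used here as the contractive compressor inside EF21."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def ef21_step(g: np.ndarray, grad_new: np.ndarray, k: int) -> np.ndarray:
    """EF21 (simplified, single worker): communicate a compressed *difference* and
    fold it into the state, g_new = g + C(grad_new - g)."""
    return g + top_k(grad_new - g, k)

def lag_step(g: np.ndarray, grad_new: np.ndarray, zeta: float = 1.0) -> np.ndarray:
    """Lazy aggregation (simplified trigger): transmit the fresh gradient only if it
    has drifted far enough from the stored state; otherwise send nothing and keep g.
    Real LAG variants use different thresholds; this is only for intuition."""
    drift = grad_new - g
    if float(np.dot(drift, drift)) > zeta * float(np.dot(g, g)):
        return grad_new  # trigger fired: communicate the fresh gradient
    return g             # stay lazy: reuse the previously communicated vector
```

Both rules are instances of a three-point mapping g_new = C(g, grad_old, grad_new) (EF21 and this simplified LAG trigger happen not to need grad_old); per the abstract above, AdaCGD builds on this framework and additionally adapts which compressor and which level K is applied.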
- Remote Multilinear Compressive Learning with Adaptive Compression [107.87219371697063]
Multilinear Compressive Learning (MCL) is an efficient signal acquisition and learning paradigm for multidimensional signals.
We propose a novel optimization scheme that enables such a feature for MCL models.
arXiv Detail & Related papers (2021-09-02T19:24:03Z)
- Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence [23.849813231750932]
This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems.
By compressing innovation vectors, COLD is able to achieve linear convergence for a class of $\delta$-contracted compressors.
Numerical experiments demonstrate the advantages of both algorithms under different compressors.
arXiv Detail & Related papers (2021-05-14T08:15:18Z)
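
The $\delta$-contracted compressors referenced in the COLD entry above are typically defined by the contraction inequality $\|C(x)-x\|^2 \le (1-\delta)\|x\|^2$ (in expectation for randomized compressors), the same contractive-compressor condition the AdaCGD abstract relies on. The short check below verifies this numerically for Top-K, where $\delta = K/d$; it is an illustration independent of COLD's algorithmic details.

```python
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Top-K sparsification: a deterministic contractive (delta-contracted) compressor."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Numerically check the contraction inequality ||C(x) - x||^2 <= (1 - delta) * ||x||^2,
# which Top-K satisfies deterministically with delta = k/d.
rng = np.random.default_rng(0)
d, k = 200, 20
violations = 0
for _ in range(1_000):
    x = rng.standard_normal(d)
    lhs = float(np.sum((top_k(x, k) - x) ** 2))
    rhs = (1 - k / d) * float(np.sum(x ** 2))
    violations += lhs > rhs + 1e-12
print("violations:", violations)  # expected: 0
```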
- GAN Slimming: All-in-One GAN Compression by A Unified Optimization Framework [94.26938614206689]
We propose the first unified optimization framework combining multiple compression means for GAN compression, dubbed GAN Slimming (GS).
We apply GS to compress CartoonGAN, a state-of-the-art style transfer network, by up to 47 times, with minimal visual quality degradation.
arXiv Detail & Related papers (2020-08-25T14:39:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.