Unbiased Compression Saves Communication in Distributed Optimization:
When and How Much?
- URL: http://arxiv.org/abs/2305.16297v3
- Date: Wed, 10 Jan 2024 18:55:27 GMT
- Title: Unbiased Compression Saves Communication in Distributed Optimization:
When and How Much?
- Authors: Yutong He, Xinmeng Huang, Kun Yuan
- Abstract summary: Communication compression can alleviate communication overhead by transmitting compressed gradients and model parameters.
It is unclear whether communication compression reduces the total communication cost.
We show that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n, \kappa\}})$ when all local smoothness constants are constrained by a common upper bound.
- Score: 22.701976655513043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication compression is a common technique in distributed optimization
that can alleviate communication overhead by transmitting compressed gradients
and model parameters. However, compression can introduce information
distortion, which slows down convergence and incurs more communication rounds
to achieve desired solutions. Given the trade-off between lower per-round
communication costs and additional rounds of communication, it is unclear
whether communication compression reduces the total communication cost.
This paper explores the conditions under which unbiased compression, a widely
used form of compression, can reduce the total communication cost, as well as
the extent to which it can do so. To this end, we present the first theoretical
formulation for characterizing the total communication cost in distributed
optimization with communication compression. We demonstrate that unbiased
compression alone does not necessarily save the total communication cost, but
this outcome can be achieved if the compressors used by all workers are further
assumed independent. We establish lower bounds on the communication rounds
required by algorithms using independent unbiased compressors to minimize
smooth convex functions and show that these lower bounds are tight by refining
the analysis for ADIANA. Our results reveal that using independent unbiased
compression can reduce the total communication cost by a factor of up to
$\Theta(\sqrt{\min\{n, \kappa\}})$ when all local smoothness constants are
constrained by a common upper bound, where $n$ is the number of workers and
$\kappa$ is the condition number of the functions being minimized. These
theoretical findings are supported by experimental results.
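As a concrete illustration of the objects the abstract studies, below is a minimal sketch (not the paper's ADIANA-based algorithm) of a rand-k sparsifier, a standard example of the unbiased, worker-independent compressors analyzed here; the values of `d`, `k`, and the Monte Carlo loop are illustrative choices, not taken from the paper.
```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased rand-k compressor: keep k random coordinates, rescale by d/k
    so that E[C(x)] = x."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = (d / k) * x[idx]
    return out

rng = np.random.default_rng(0)
d, k, n_samples = 100, 10, 20000
x = rng.standard_normal(d)

# Averaging many independent draws of C(x) approaches x (unbiasedness); in a
# distributed run, each worker supplies one such independent draw per round.
est = np.mean([rand_k(x, k, rng) for _ in range(n_samples)], axis=0)
print("max |mean(C(x)) - x|:", np.max(np.abs(est - x)))  # small, shrinks with n_samples
```
Independence across workers is the property the bounds exploit: averaging $n$ independent unbiased messages reduces the compression variance by a factor of $n$, which is the mechanism behind the $\Theta(\sqrt{\min\{n, \kappa\}})$ saving stated above.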
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
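A minimal, generic sketch of pairing a quantizer with error feedback (not the paper's exact differential error-feedback scheme): each agent quantizes its update plus an accumulated error memory, transmits the quantized message, and stores what the quantizer lost so the distortion is re-injected in later rounds rather than discarded. The grid spacing `step` is an illustrative parameter, not taken from the paper.
```python
import numpy as np

def stochastic_round(x: np.ndarray, step: float, rng: np.random.Generator) -> np.ndarray:
    """Unbiased stochastic rounding of each entry onto a grid with spacing `step`."""
    low = np.floor(x / step)
    p_up = x / step - low                         # probability of rounding up
    return step * (low + (rng.random(x.shape) < p_up))

def ef_step(update: np.ndarray, error: np.ndarray, step: float,
            rng: np.random.Generator):
    """One error-feedback step: quantize update + memory, keep the residual."""
    corrected = update + error
    msg = stochastic_round(corrected, step, rng)  # what is actually transmitted
    return msg, corrected - msg                   # residual carried to the next round
```
Keeping the residual is what allows such schemes to approach uncompressed performance in the small step-size regime: the quantization error is eventually applied rather than lost.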
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors achieves a server-to-worker communication complexity that improves with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z) - Stochastic Controlled Averaging for Federated Learning with Communication Compression [36.09135695005069]
We propose two compressed FL algorithms, SCALLION and SCAFCOM, to support unbiased and biased compression.
Both proposed methods outperform existing compressed FL methods in communication and computation complexity.
arXiv Detail & Related papers (2023-08-16T06:35:36Z) - Lower Bounds and Accelerated Algorithms in Distributed Stochastic
Optimization with Communication Compression [31.107056382542417]
Communication compression is an essential strategy for alleviating communication overhead.
We propose NEOLITHIC, a nearly optimal algorithm for compression under mild conditions.
arXiv Detail & Related papers (2023-05-12T17:02:43Z) - Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with
Communication Compression [33.217552987061474]
Recent advances in distributed optimization and learning have shown that communication compression is one of the most effective means of reducing communication.
arXiv Detail & Related papers (2022-06-08T03:36:34Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, taking the method beyond the classical independent compressors setting.
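A minimal sketch of a permutation ("Perm-K") compressor of the correlated kind referred to above, assuming the dimension d is divisible by the number of workers n (an assumption of this sketch, not of the paper's general construction): a shared random permutation splits the coordinates into n disjoint blocks, and worker i transmits only its block scaled by n. The compressors are deliberately correlated; if every worker compressed the same vector, the average of their messages would recover it exactly.
```python
import numpy as np

def perm_k_messages(xs: list[np.ndarray], rng: np.random.Generator) -> list[np.ndarray]:
    """One sparse message per worker, using a shared random coordinate partition."""
    n, d = len(xs), xs[0].size
    assert d % n == 0, "this sketch assumes d divisible by n"
    perm = rng.permutation(d)
    blocks = np.split(perm, n)           # disjoint coordinate blocks, one per worker
    msgs = []
    for x, block in zip(xs, blocks):
        m = np.zeros_like(x)
        m[block] = n * x[block]          # scale by n so the worker-average is unbiased
        msgs.append(m)
    return msgs

rng = np.random.default_rng(0)
x = rng.standard_normal(12)
msgs = perm_k_messages([x.copy() for _ in range(4)], rng)   # 4 workers, same input
print(np.allclose(np.mean(msgs, axis=0), x))                # True: exact recovery
```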
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which joints channel pruning and tensor decomposition to compress CNN models.
We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters of ResNet-50, with only a 0.56% Top-1 accuracy drop on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
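A minimal sketch of the canonical biased compressor in this line of work, Top-k (an illustration, not one of the new compressors the paper proposes): it keeps only the k largest-magnitude entries, so $\mathbb{E}[C(x)] \neq x$ in general, but it satisfies the contraction property $\|C(x) - x\|^2 \le (1 - k/d)\|x\|^2$ that analyses of biased compression typically rely on.
```python
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Biased Top-k compressor: zero out all but the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

x = np.array([0.1, -2.0, 0.3, 1.5, -0.2])
print(top_k(x, 2))   # keeps -2.0 and 1.5, zeros the rest
```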
arXiv Detail & Related papers (2020-02-27T19:52:24Z) - Uncertainty Principle for Communication Compression in Distributed and
Federated Learning and the Search for an Optimal Compressor [5.09755285351264]
We consider an unbiased compression method inspired by the Kashin representation of vectors, which we call Kashin compression (KC).
KC enjoys a dimension-independent variance bound, for which we derive an explicit formula even in the regime when only a few bits need to be communicated per vector entry.
arXiv Detail & Related papers (2020-02-20T17:20:51Z)