Uncertainty Principle for Communication Compression in Distributed and
Federated Learning and the Search for an Optimal Compressor
- URL: http://arxiv.org/abs/2002.08958v3
- Date: Tue, 26 Jan 2021 11:13:35 GMT
- Title: Uncertainty Principle for Communication Compression in Distributed and
Federated Learning and the Search for an Optimal Compressor
- Authors: Mher Safaryan and Egor Shulgin and Peter Richtárik
- Abstract summary: We consider an unbiased compression method inspired by the Kashin representation of vectors, which we call Kashin compression (KC).
KC enjoys a dimension-independent variance bound for which we derive an explicit formula even in the regime when only a few bits need to be communicated per vector entry.
- Score: 5.09755285351264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to mitigate the high communication cost in distributed and federated
learning, various vector compression schemes, such as quantization,
sparsification and dithering, have become very popular. In designing a
compression method, one aims to communicate as few bits as possible, which
minimizes the cost per communication round, while at the same time attempting
to impart as little distortion (variance) to the communicated messages as
possible, which minimizes the adverse effect of the compression on the overall
number of communication rounds. However, intuitively, these two goals are
fundamentally in conflict: the more compression we allow, the more distorted
the messages become. We formalize this intuition and prove an uncertainty
principle for randomized compression operators, thus quantifying this
limitation mathematically, and effectively providing asymptotically tight
lower bounds on what might be achievable with communication compression.
Motivated by these developments, we call for the search for the optimal
compression operator. In an attempt to take a first step in this direction, we
consider an unbiased compression method inspired by the Kashin representation
of vectors, which we call Kashin compression (KC). In contrast to all
previously proposed compression mechanisms, KC enjoys a dimension-independent
variance bound for which we derive an explicit formula even in the
regime when only a few bits need to be communicated per vector entry.
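To make the bits-versus-variance tension concrete, the sketch below implements a standard unbiased compressor, Rand-k sparsification, which transmits only k of the d coordinates (scaled by d/k) and satisfies E[C(x)] = x with E||C(x) - x||^2 = (d/k - 1)||x||^2. This is only an illustration of the trade-off described in the abstract; it is not the Kashin compressor (KC) studied in the paper, and the function names are our own.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased Rand-k sparsification: keep k random coordinates, scale by d/k.

    E[C(x)] = x and E||C(x) - x||^2 = (d/k - 1)||x||^2, so the distortion
    (variance) grows as k -- and hence the number of transmitted bits --
    shrinks. Illustrative only; this is not the paper's Kashin compressor (KC).
    """
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)  # k coordinates chosen uniformly
    out[idx] = (d / k) * x[idx]                 # rescale to keep the estimate unbiased
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, trials = 1024, 5000
    x = rng.standard_normal(d)
    for k in (256, 64, 16):  # fewer kept coordinates => fewer bits sent
        err = np.mean([np.linalg.norm(rand_k(x, k, rng) - x) ** 2
                       for _ in range(trials)])
        bound = (d / k - 1) * np.linalg.norm(x) ** 2
        print(f"k={k:4d}  empirical E||C(x)-x||^2 ~ {err:9.1f}  theory = {bound:9.1f}")
```

Shrinking k reduces the payload roughly in proportion, but the distortion grows as d/k - 1; this is exactly the tension that the paper's uncertainty principle formalizes and lower-bounds.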
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback; a generic error-feedback sketch appears after this list.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - Communication-Efficient Distributed Learning with Local Immediate Error
Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z) - Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors can achieve a server-to-worker communication complexity that improves with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z) - Optimal Compression of Unit Norm Vectors in the High Distortion Regime [30.6205706348233]
We investigate methods for compressing a unit norm vector into the minimum number of bits, while still allowing for an acceptable level of distortion in recovery.
Our study considers both biased and unbiased compression methods and determines the optimal compression rates.
While the results are a mix of new and known, they are compiled in this paper for completeness.
arXiv Detail & Related papers (2023-07-16T04:23:57Z) - Unbiased Compression Saves Communication in Distributed Optimization:
When and How Much? [22.701976655513043]
Communication compression can alleviate communication overhead by transmitting compressed gradients and model parameters.
It is unclear whether communication compression reduces the total communication cost.
We show that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n, \kappa\}})$ when all local smoothness constants are constrained.
arXiv Detail & Related papers (2023-05-25T17:51:23Z) - Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with
Communication Compression [33.217552987061474]
Recent advances in distributed optimization and learning have shown that communication compression is one of the most effective means of reducing communication.
arXiv Detail & Related papers (2022-06-08T03:36:34Z) - EF-BV: A Unified Theory of Error Feedback and Variance Reduction
Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - Optimal Gradient Compression for Distributed and Federated Learning [9.711326718689492]
Communication between computing nodes in distributed learning is typically an unavoidable burden.
Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques.
In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error.
arXiv Detail & Related papers (2020-10-07T07:58:59Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
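Several of the related papers above (the differential error feedback, LIEC-SGD, and EF-BV entries) revolve around the error-feedback mechanism, in which each worker stores the part of the message that compression discarded and adds it back before the next round. The sketch below is a minimal, generic single-worker template under our own naming (top_k, ef_step are illustrative); it is not the specific algorithm of any one paper in the list.

```python
import numpy as np

def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Biased Top-k compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_step(grad: np.ndarray, memory: np.ndarray, k: int, lr: float):
    """One generic error-feedback communication step (single worker).

    The worker re-injects the error left over from previous rounds, compresses
    the corrected gradient, transmits the compressed message, and stores the
    newly discarded residual for the next round.
    """
    corrected = grad + memory      # add back previously dropped signal
    msg = top_k(corrected, k)      # what is actually communicated
    memory = corrected - msg       # residual kept locally (the "error")
    update = -lr * msg             # server-side model update from the message
    return update, memory

if __name__ == "__main__":
    # Toy run on f(w) = 0.5 * ||w||^2 (gradient = w); a step size that is
    # small relative to d/k keeps this sketch stable.
    w = np.ones(100)
    mem = np.zeros_like(w)
    for _ in range(300):
        update, mem = ef_step(grad=w, memory=mem, k=10, lr=0.05)
        w = w + update
    print("final ||w|| =", np.linalg.norm(w))
```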