Communication-Efficient Distributed Learning with Local Immediate Error
Compensation
- URL: http://arxiv.org/abs/2402.11857v1
- Date: Mon, 19 Feb 2024 05:59:09 GMT
- Title: Communication-Efficient Distributed Learning with Local Immediate Error
Compensation
- Authors: Yifei Cheng, Li Shen, Linli Xu, Xun Qian, Shiwei Wu, Yiming Zhou, Tie
Zhang, Dacheng Tao, Enhong Chen
- Abstract summary: We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
- Score: 95.6828475028581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient compression with error compensation has attracted significant
attention with the target of reducing the heavy communication overhead in
distributed learning. However, existing compression methods either perform only
unidirectional compression in one iteration with higher communication cost, or
bidirectional compression with slower convergence rate. In this work, we
propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization
algorithm to break the above bottlenecks based on bidirectional compression and
carefully designed compensation approaches. Specifically, the bidirectional
compression technique is to reduce the communication cost, and the compensation
technique compensates the local compression error to the model update
immediately while only maintaining the global error variable on the server
throughout the iterations to boost its efficacy. Theoretically, we prove that
LIEC-SGD is superior to previous works in either the convergence rate or the
communication cost, which indicates that LIEC-SGD could inherit the dual
advantages from unidirectional compression and bidirectional compression.
Finally, experiments of training deep neural networks validate the
effectiveness of the proposed LIEC-SGD algorithm.
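The abstract's two ingredients, bidirectional compression and an error variable kept only on the server, can be illustrated with a small sketch. This is a generic error-feedback round with top-k sparsification on both the uplink and the downlink, not the authors' LIEC-SGD update: the abstract does not give the rule for the worker-side immediate compensation, so that step is omitted here. The names top_k and bidirectional_ef_step, the choice of top-k as the compressor, and the toy inputs are all assumptions made for illustration.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def bidirectional_ef_step(x, grads, server_err, lr, k):
    """One round of SGD with compression in both directions.

    Assumed (hypothetical) scheme, not taken from the paper:
      - each worker sends top_k(gradient) to the server (uplink);
      - the server averages the messages, adds its stored error,
        compresses the result, and broadcasts it (downlink);
      - whatever the downlink compression dropped becomes the new
        server error, so it is re-applied in later rounds.
    """
    n, d = len(grads), len(x)
    uplink = [top_k(g, k) for g in grads]
    avg = [sum(u[j] for u in uplink) / n for j in range(d)]
    corrected = [avg[j] + server_err[j] for j in range(d)]
    downlink = top_k(corrected, k)
    new_err = [corrected[j] - downlink[j] for j in range(d)]
    new_x = [x[j] - lr * downlink[j] for j in range(d)]
    return new_x, new_err


if __name__ == "__main__":
    # Toy problem: worker i minimizes 0.5 * ||x - t_i||^2, so its gradient is x - t_i.
    targets = [[1.0, 4.0, -2.0], [3.0, 2.0, 0.0]]
    x, err = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
    for _ in range(50):
        grads = [[x[j] - t[j] for j in range(3)] for t in targets]
        x, err = bidirectional_ef_step(x, grads, err, lr=0.2, k=1)
    print(x, err)
```

In this sketch only the server carries state between rounds (server_err); workers are stateless, which is the memory layout the abstract describes for the global error variable. The uplink residual is simply discarded here, whereas the abstract says LIEC-SGD applies it to the model update immediately.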
Related papers
- EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation [79.56709262189953]
EoRA consistently outperforms previous methods in compensating errors for compressed LLaMA2/3 models on various tasks.
EoRA offers a scalable, training-free solution to compensate for compression errors.
arXiv Detail & Related papers (2024-10-28T17:59:03Z)
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- EControl: Fast Distributed Optimization with Compression and Error Control [8.624830915051021]
We propose EControl, a novel mechanism that can regulate the strength of the feedback signal.
We show that EControl mitigates the naive implementation of our method and support our findings.
arXiv Detail & Related papers (2023-11-06T10:00:13Z)
- Adaptive Top-K in SGD for Communication-Efficient Distributed Learning [14.867068493072885]
This paper proposes a novel adaptive Top-K in SGD framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance.
Numerical results on the MNIST and CIFAR-10 datasets demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-10-24T18:33:35Z)
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compression for model updates.
The tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework to maximize the final model accuracy by strategically adjusting the compression each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
- Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence [23.849813231750932]
This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems.
By compressing innovation vectors, COLD is able to achieve linear convergence for a class of $\delta$-contracted compressors.
Numerical experiments demonstrate the advantages of both algorithms under different compressors.
arXiv Detail & Related papers (2021-05-14T08:15:18Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.