Communication-Efficient Distributed Learning with Local Immediate Error
Compensation
- URL: http://arxiv.org/abs/2402.11857v1
- Date: Mon, 19 Feb 2024 05:59:09 GMT
- Title: Communication-Efficient Distributed Learning with Local Immediate Error
Compensation
- Authors: Yifei Cheng, Li Shen, Linli Xu, Xun Qian, Shiwei Wu, Yiming Zhou, Tie
Zhang, Dacheng Tao, Enhong Chen
- Abstract summary: We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
- Score: 95.6828475028581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient compression with error compensation has attracted significant
attention as a way of reducing the heavy communication overhead in distributed
learning. However, existing compression methods either perform unidirectional
compression within each iteration, which still incurs a high communication cost,
or bidirectional compression, which suffers from a slower convergence rate. In
this work, we
propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization
algorithm to break the above bottlenecks based on bidirectional compression and
carefully designed compensation approaches. Specifically, the bidirectional
compression technique reduces the communication cost, while the compensation
technique immediately folds the local compression error into the model update
and keeps only a global error variable on the server throughout the iterations
to boost its efficacy. Theoretically, we prove that
LIEC-SGD is superior to previous works in either the convergence rate or the
communication cost, indicating that it inherits the advantages of both
unidirectional and bidirectional compression. Finally, experiments on training
deep neural networks validate the effectiveness of the proposed LIEC-SGD
algorithm.
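The abstract states the two ingredients only in words, so the following single-process simulation is a minimal sketch of that idea, assuming a Top-k compressor on both the worker-to-server and server-to-worker links: each worker folds its own compression error into the model update immediately, and the server keeps a single global error variable across iterations. The function names, the toy quadratic objective, and the shortcut of letting all simulated workers share one parameter vector are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; return (kept, residual)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out, v - out                      # residual = compression error

def liec_sgd_like(grad_fn, x0, workers=4, steps=200, lr=0.1, k=2):
    """Single-process simulation; all simulated workers share one vector x."""
    x = x0.astype(float)
    server_error = np.zeros_like(x)          # the only error state kept across iterations
    for _ in range(steps):
        uplink = []
        for w in range(workers):
            g = grad_fn(x, w)
            msg, local_err = top_k(g, k)     # compress the worker-to-server gradient
            uplink.append(msg)
            x -= lr * local_err / workers    # local immediate compensation (averaged,
                                             # since workers share x in this simulation)
        agg = np.mean(uplink, axis=0) + server_error
        msg, server_error = top_k(agg, k)    # compress the server-to-worker update;
        x -= lr * msg                        # its error carries over to the next round
    return x

if __name__ == "__main__":
    dim, workers = 10, 4
    rng = np.random.default_rng(0)
    curv = [np.diag(rng.uniform(0.5, 2.0, dim)) for _ in range(workers)]

    def grad_fn(x, w):
        return curv[w] @ x                   # gradient of the toy quadratic 0.5 * x @ curv[w] @ x

    x_out = liec_sgd_like(grad_fn, np.ones(dim), workers=workers)
    print(np.linalg.norm(x_out))             # should end up near the optimum at 0
```

Note that the only persistent error state in this sketch is `server_error`; each worker's residual is consumed within the same iteration, which corresponds to the "immediate" part of the compensation described in the abstract.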
Related papers
- EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation [76.72669153805018]
EoRA consistently outperforms previous methods in compensating errors for compressed LLaMA2/3 models on various tasks.
EoRA offers a scalable, training-free solution to compensate for compression errors.
arXiv Detail & Related papers (2024-10-28T17:59:03Z)
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- EControl: Fast Distributed Optimization with Compression and Error Control [8.624830915051021]
We propose EControl, a novel mechanism that can regulate the strength of the feedback signal.
We show that EControl mitigates the issues of the naive implementation and support our findings with numerical experiments.
arXiv Detail & Related papers (2023-11-06T10:00:13Z)
- Adaptive Top-K in SGD for Communication-Efficient Distributed Learning [14.867068493072885]
This paper proposes a novel adaptive Top-K in SGD framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance.
Numerical results on the MNIST and CIFAR-10 datasets demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-10-24T18:33:35Z)
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compression for model updates.
However, the tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework to maximize the final model accuracy by strategically adjusting the compression in each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
- Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence [23.849813231750932]
This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems.
By compressing innovation vectors, COLD is able to achieve linear convergence for a class of $\delta$-contracted compressors.
Numerical experiments demonstrate the advantages of both algorithms under different compressors.
arXiv Detail & Related papers (2021-05-14T08:15:18Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC); a toy threshold-based sparsifier in this spirit is sketched after this list.
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit; a rank-1 power-step sketch in this spirit appears after this list.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
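The SIDCo entry above describes threshold-based sparsification whose threshold is estimated statistically rather than by sorting. The toy sketch below assumes the gradient magnitudes are roughly exponentially distributed and reads a Top-k-style threshold off that fitted tail; the single-stage exponential fit, the Laplace-distributed fake gradient, and all names are simplifying assumptions for illustration, not the published multi-stage scheme.

```python
import numpy as np

def threshold_sparsify(grad, target_ratio=0.01):
    """Keep roughly a target_ratio fraction of entries without a full sort."""
    mags = np.abs(grad)
    lam = 1.0 / (mags.mean() + 1e-12)        # MLE rate of an exponential fit to |grad|
    thr = -np.log(target_ratio) / lam        # tail quantile: P(|g| > thr) ~= target_ratio
    mask = mags >= thr
    out = np.zeros_like(grad)
    out[mask] = grad[mask]
    return out, mask.mean()                  # sparsified gradient, achieved density

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g = rng.laplace(scale=0.1, size=1_000_000)     # stand-in for a stochastic gradient
    sparse_g, density = threshold_sparsify(g, target_ratio=0.01)
    print(f"kept {density:.4%} of entries")        # lands close to the 1% target
```

Skipping the exact Top-k selection is where schemes of this kind save compute on very large gradients, provided the fitted distribution matches the magnitudes reasonably well.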
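The PowerGossip entry compresses the model differences between neighboring workers with power iteration steps. The sketch below is a rough rank-1 illustration of that idea, assuming one power step per round and a warm-started right factor reused across rounds; the function names, shapes, and the nearly rank-1 test matrix are assumptions for the sketch, not the published algorithm.

```python
import numpy as np

def power_compress(delta, q_prev=None):
    """One power step on a matrix-shaped difference: delta ~= np.outer(p, q)."""
    _, n = delta.shape
    q = np.ones(n) if q_prev is None else q_prev   # any non-degenerate start works
    p = delta @ q
    p /= np.linalg.norm(p) + 1e-12                 # unit-norm left factor
    q = delta.T @ p                                # right factor carries the magnitude
    return p, q                                    # send m + n numbers instead of m * n

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    u, v = rng.standard_normal(64), rng.standard_normal(32)
    diff = np.outer(u, v) + 0.1 * rng.standard_normal((64, 32))  # nearly rank-1 difference
    q = None
    for _ in range(10):
        p, q = power_compress(diff, q)             # reusing q warm-starts the power iteration
    approx = np.outer(p, q)
    print(np.linalg.norm(diff - approx) / np.linalg.norm(diff))  # small relative residual
```

Reusing `q` between rounds is the warm start assumed here; it is what lets a single cheap power step per round gradually track the dominant direction of the difference.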