Compressing gradients by exploiting temporal correlation in momentum-SGD
- URL: http://arxiv.org/abs/2108.07827v1
- Date: Tue, 17 Aug 2021 18:04:06 GMT
- Title: Compressing gradients by exploiting temporal correlation in momentum-SGD
- Authors: Tharindu B. Adikari, Stark C. Draper
- Abstract summary: We analyze compression methods that exploit temporal correlation in systems with and without error-feedback.
Experiments with the ImageNet dataset demonstrate that our proposed methods offer significant reduction in the rate of communication.
We prove the convergence of SGD under an expected error assumption by establishing a bound for the minimum gradient norm.
- Score: 17.995905582226463
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: An increasing bottleneck in decentralized optimization is communication.
Bigger models and growing datasets mean that decentralization of computation is
important and that the amount of information exchanged is quickly growing.
While compression techniques have been introduced to cope with the latter, none
has considered leveraging the temporal correlations that exist in consecutive
vector updates. An important example is distributed momentum-SGD where temporal
correlation is enhanced by the low-pass-filtering effect of applying momentum.
In this paper we design and analyze compression methods that exploit temporal
correlation in systems both with and without error-feedback. Experiments with
the ImageNet dataset demonstrate that our proposed methods offer significant
reduction in the rate of communication at only a negligible increase in
computation complexity. We further analyze the convergence of SGD when
compression is applied with error-feedback. In the literature, convergence
guarantees are developed only for compressors that provide error-bounds
point-wise, i.e., for each input to the compressor. In contrast, many important
codes (e.g. rate-distortion codes) provide error-bounds only in expectation and
thus provide a more general guarantee. In this paper we prove the convergence
of SGD under an expected error assumption by establishing a bound for the
minimum gradient norm.
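The abstract describes compressing the change between consecutive momentum vectors (which are strongly correlated because momentum acts as a low-pass filter) and pairing the compressor with error feedback. The sketch below is a minimal, single-worker illustration of that general idea only, assuming a simple top-k compressor as a stand-in for the rate-distortion-style codes the paper considers; the function names and parameters (momentum_sgd_temporal_ef, top_k, beta, k) are illustrative and not the authors' implementation.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest.
    (Illustrative stand-in for the paper's compressor; any lossy compressor fits here.)"""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def momentum_sgd_temporal_ef(grad_fn, x0, lr=0.1, beta=0.9, k=10, steps=100):
    """Compress the *temporal difference* of the momentum vector with error
    feedback, then apply the reconstructed momentum as the model update.
    A sketch under assumed hyperparameters, not the paper's exact algorithm."""
    x = x0.copy()
    m = np.zeros_like(x)        # local momentum
    m_hat = np.zeros_like(x)    # receiver-side reconstruction of the momentum
    e = np.zeros_like(x)        # error-feedback memory
    for _ in range(steps):
        g = grad_fn(x)
        m = beta * m + g                  # momentum update (low-pass filtering)
        d = (m - m_hat) + e               # temporal difference plus accumulated error
        c = top_k(d, k)                   # only this compressed difference is transmitted
        e = d - c                         # remember what the compressor discarded
        m_hat = m_hat + c                 # receiver updates its momentum estimate
        x = x - lr * m_hat                # step with the reconstructed momentum
    return x

# Example: minimize the quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x_final = momentum_sgd_temporal_ef(lambda x: x, x0=np.random.randn(100), k=5)
```

Because consecutive momentum vectors differ only slightly, the transmitted difference d is small and compresses well, while the error-feedback term e ensures that whatever the compressor drops is reintroduced in later rounds.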
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- EControl: Fast Distributed Optimization with Compression and Error Control [8.624830915051021]
We propose EControl, a novel mechanism that can regulate the strength of the feedback signal.
We show that EControl mitigates the shortcomings of a naive implementation of our method, and we present experiments supporting our findings.
arXiv Detail & Related papers (2023-11-06T10:00:13Z)
- Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning [47.904127007515925]
We study a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction.
We prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic approximation guarantees as their uncompressed counterparts.
Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling.
arXiv Detail & Related papers (2023-01-03T04:09:38Z)
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when observing latent variables from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed-up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Innovation Compression for Communication-efficient Distributed Optimization with Linear Convergence [23.849813231750932]
This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems.
By compressing innovation vectors, COLD is able to achieve linear convergence for a class of $\delta$-contracted compressors.
Numerical experiments demonstrate the advantages of both algorithms under different compressors.
arXiv Detail & Related papers (2021-05-14T08:15:18Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows a better convergence bound than error feedback for non-convex problems.
We also propose DEFA, which accelerates the generalization of DEF and achieves better generalization bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)