Distributed Methods with Absolute Compression and Error Compensation
- URL: http://arxiv.org/abs/2203.02383v2
- Date: Sun, 29 May 2022 14:33:09 GMT
- Title: Distributed Methods with Absolute Compression and Error Compensation
- Authors: Marina Danilova, Eduard Gorbunov
- Abstract summary: Communication compression is a powerful approach to alleviating the communication bottleneck in distributed optimization.
In this paper, we generalize the analysis of EC-SGD with absolute compression to the arbitrary sampling strategy.
Our rates improve upon the previously known ones in this setting.
- Score: 1.52292571922932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed optimization methods are often applied to solving huge-scale
problems like training neural networks with millions and even billions of
parameters. In such applications, communicating full vectors, e.g.,
(stochastic) gradients, iterates, is prohibitively expensive, especially when
the number of workers is large. Communication compression is a powerful
approach to alleviating this issue, and, in particular, methods with biased
compression and error compensation are extremely popular due to their practical
efficiency. Sahu et al. (2021) propose a new analysis of Error Compensated SGD
(EC-SGD) for the class of absolute compression operators showing that in a
certain sense, this class contains optimal compressors for EC-SGD. However, the
analysis was conducted only under the so-called $(M,\sigma^2)$-bounded noise
assumption. In this paper, we generalize the analysis of EC-SGD with absolute
compression to the arbitrary sampling strategy and propose the first analysis
of Error Compensated Loopless Stochastic Variance Reduced Gradient method
(EC-LSVRG) with absolute compression for (strongly) convex problems. Our rates
improve upon the previously known ones in this setting. Numerical experiments
corroborate our theoretical findings.
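For readers unfamiliar with the error-compensation template, the sketch below shows one way EC-SGD with an absolute compressor can look in code, using a hard-threshold sparsifier (whose compression error is bounded by a constant independent of the input norm) as the absolute compression operator. This is a minimal illustration under our own naming and a toy quadratic problem, not the authors' implementation, and it does not cover the arbitrary sampling setting or EC-LSVRG.

```python
# Minimal sketch of error-compensated SGD (EC-SGD) with an absolute
# compressor (hard-threshold sparsifier). Illustrative only; names and
# the toy problem are not taken from the paper's code.
import numpy as np

def hard_threshold(v, lam):
    """Absolute compressor: keep entries with |v_j| >= lam, zero the rest,
    so the per-coordinate compression error never exceeds lam."""
    return np.where(np.abs(v) >= lam, v, 0.0)

def ec_sgd(grad_oracles, x0, gamma=0.05, lam=1e-3, iters=2000):
    """grad_oracles: one stochastic-gradient function per worker."""
    x = x0.astype(float)
    errors = [np.zeros_like(x) for _ in grad_oracles]  # per-worker error vectors
    for _ in range(iters):
        msgs = []
        for i, grad in enumerate(grad_oracles):
            v = errors[i] + gamma * grad(x)   # add back what was not sent before
            c = hard_threshold(v, lam)        # compress the corrected step
            errors[i] = v - c                 # remember the part that was dropped
            msgs.append(c)
        x -= np.mean(msgs, axis=0)            # server applies the averaged message
    return x

# Toy usage: two workers with noisy gradients of f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
oracles = [lambda x: x + 0.01 * rng.standard_normal(x.shape) for _ in range(2)]
x_out = ec_sgd(oracles, x0=np.ones(10))
print(np.linalg.norm(x_out))  # expected to be small after enough iterations
```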
Related papers
- Optimal Compression of Unit Norm Vectors in the High Distortion Regime [30.6205706348233]
We investigate methods for compressing a unit norm vector into the minimum number of bits, while still allowing an acceptable level of distortion in recovery.
Our study considers both biased and unbiased compression methods and determines the optimal compression rates.
While the results are a mix of new and known, they are compiled in this paper for completeness.
arXiv Detail & Related papers (2023-07-16T04:23:57Z)
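To make the biased/unbiased distinction in the entry above concrete, here is a small hypothetical sketch that quantizes each coordinate of a unit-norm vector to a b-bit grid on [-1, 1]: deterministic rounding gives a biased compressor, while stochastic rounding is unbiased in expectation. It is only an illustration, not the rate-optimal schemes studied in the paper.

```python
# Biased vs. unbiased b-bit coordinate quantization of a unit-norm vector.
# Deterministic rounding is biased; stochastic rounding is unbiased.
import numpy as np

def quantize(x, bits=2, stochastic=False, rng=None):
    levels = 2 ** bits - 1                 # number of grid cells on [-1, 1]
    scaled = (x + 1.0) / 2.0 * levels      # map [-1, 1] -> [0, levels]
    if stochastic:
        rng = rng or np.random.default_rng()
        low = np.floor(scaled)
        prob_up = scaled - low             # round up with this probability
        q = low + (rng.random(x.shape) < prob_up)
    else:
        q = np.round(scaled)               # deterministic (biased) rounding
    return q / levels * 2.0 - 1.0          # map grid indices back to [-1, 1]

rng = np.random.default_rng(1)
x = rng.standard_normal(8); x /= np.linalg.norm(x)    # a unit-norm vector
print(quantize(x, bits=2))                            # biased reconstruction
print(quantize(x, bits=2, stochastic=True, rng=rng))  # unbiased on average
```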
- Shifted Compression Framework: Generalizations and Improvements [2.2147691173934967]
Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models.
Lossy compression of exchanged information, such as gradients or models, is one of the most effective instruments to alleviate this issue.
arXiv Detail & Related papers (2022-06-21T15:00:04Z)
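The shifted-compression idea summarized above compresses the difference between a vector and a maintained shift rather than the vector itself; DIANA-style methods are one well-known instance. The sketch below illustrates such a step with a rand-k compressor; all names and step sizes are illustrative assumptions, and the paper's framework is more general.

```python
# DIANA-style shifted compression: compress g - h instead of g, and let the
# shift h track the (stochastic) gradient over time. Illustrative sketch only.
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * d / k
    return out

def diana_step(x, h, grad, gamma=0.05, alpha=0.1, k=2, rng=None):
    rng = rng or np.random.default_rng()
    g = grad(x)
    m = rand_k(g - h, k, rng)      # only the compressed shift-difference is transmitted
    g_hat = h + m                  # receiver reconstructs a gradient estimate
    h_new = h + alpha * m          # shift slowly follows the gradient
    return x - gamma * g_hat, h_new
```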
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators, unbiased and biased (contractive), with separate algorithms making use of each.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
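Since the algorithm above is said to recover DIANA and EF21 as particular cases, a simplified EF21-style update may help fix ideas: each worker keeps a gradient estimate and transmits only a compressed correction toward its current gradient. The code is our own hypothetical sketch, not EF-BV itself.

```python
# Simplified EF21-style update: workers send C(grad - g_i) and the server
# averages the maintained estimates g_i. Hypothetical, single-machine sketch.
import numpy as np

def top_k(v, k):
    """Biased (contractive) top-k compressor."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]     # indices of the k largest magnitudes
    out[idx] = v[idx]
    return out

def ef21_step(x, g_list, grad_fns, gamma=0.05, k=2):
    g_avg = np.mean(g_list, axis=0)
    x_new = x - gamma * g_avg             # step with the averaged estimates
    new_g_list = []
    for g_i, grad in zip(g_list, grad_fns):
        c = top_k(grad(x_new) - g_i, k)   # compressed correction only
        new_g_list.append(g_i + c)        # local estimate moves toward the true gradient
    return x_new, new_g_list
```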
- Federated Random Reshuffling with Compression and Variance Reduction [0.0]
Random Reshuffling (RR) is an immensely popular method for training supervised machine learning models via empirical risk minimization.
It is embedded and often set as the default in standard machine learning software.
We introduce three new algorithms that improve FedRR further by taming the variance coming from shuffling and the variance due to compression.
arXiv Detail & Related papers (2022-05-08T16:46:11Z)
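For context on the entry above, plain Random Reshuffling draws a fresh permutation of the data each epoch and takes sequential gradient steps over it. The sketch below is a generic single-machine version; the federated variants with compression and variance reduction proposed in the paper are not reproduced here.

```python
# Generic Random Reshuffling (RR): one pass over a fresh permutation per epoch.
import numpy as np

def random_reshuffling(grad_i, x0, n_samples, gamma=0.01, epochs=10, seed=0):
    """grad_i(x, i) returns the gradient of the i-th loss term at x."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        perm = rng.permutation(n_samples)   # reshuffle the data once per epoch
        for i in perm:
            x -= gamma * grad_i(x, i)       # sequential step on each sample
    return x
```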
- Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [151.3826781154146]
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
We find that inter-correlations and intra-correlations exist when observing latent variables from a vectorized perspective.
Our model has better rate-distortion performance and an impressive $3.18\times$ compression speed-up.
arXiv Detail & Related papers (2022-03-21T11:44:17Z)
- Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, going beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)
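One way to picture correlated compressors, as we read the entry above, is a permutation-based sparsifier: a permutation shared by all workers splits the coordinates into disjoint blocks, and each worker transmits only its block, rescaled to stay unbiased. The sketch below is an illustrative reading under that assumption (it also assumes the dimension is divisible by the number of workers), not the exact construction from the paper.

```python
# Permutation-style correlated sparsification: a shared permutation assigns each
# worker a disjoint block of coordinates; scaling by n keeps each message unbiased.
# Illustrative sketch; assumes d is divisible by the number of workers n.
import numpy as np

def perm_compress(vectors, rng):
    n = len(vectors)
    d = vectors[0].size
    perm = rng.permutation(d)                    # shared randomness across workers
    blocks = np.array_split(perm, n)             # disjoint coordinate blocks
    msgs = []
    for v, block in zip(vectors, blocks):
        out = np.zeros_like(v)
        out[block] = n * v[block]                # worker sends only its block, rescaled
        msgs.append(out)
    return msgs                                  # together the messages cover all coordinates

rng = np.random.default_rng(2)
vs = [rng.standard_normal(8) for _ in range(4)]
avg = np.mean(perm_compress(vs, rng), axis=0)    # compare with np.mean(vs, axis=0)
```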
- On Communication Compression for Distributed Optimization on Heterogeneous Data [28.197694894254305]
Lossy gradient compression has become a key tool to avoid the communication bottleneck in distributed training of machine learning models.
We analyze the performance of two standard and general types of methods: (i) distributed quantized SGD with arbitrary unbiased quantizers and (ii) distributed SGD with error-feedback and biased compressors.
Our results indicate that D-EF-SGD is much less affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if data-skewness is high.
arXiv Detail & Related papers (2020-09-04T20:48:08Z)
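To make item (i) above concrete, the snippet below implements a standard unbiased random-dithering quantizer with s levels per coordinate, in the spirit of QSGD; it is a textbook construction rather than the exact operator analyzed in the paper.

```python
# Unbiased random-dithering quantizer with s levels per coordinate (QSGD-style):
# E[dithered_quantize(v)] = v, at the cost of higher variance for small s.
import numpy as np

def dithered_quantize(v, s=4, rng=None):
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * s          # each coordinate mapped into [0, s]
    low = np.floor(scaled)
    level = low + (rng.random(v.shape) < scaled - low)   # randomized rounding
    return norm * np.sign(v) * level / s   # reconstruct; unbiased in expectation
```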
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
- Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives [50.12802772165797]
The averaging technique combines all iterative solutions into a single solution.
Experiments demonstrate the trade-off and the effectiveness of increasingly weighted averaging compared with other averaging schemes.
arXiv Detail & Related papers (2020-03-09T18:14:00Z)
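The entry above concerns averaging schemes that weight later iterates more heavily. A common choice, sketched below as a running update, weights iterate $t$ proportionally to $t$; this generic example is not necessarily the exact scheme analyzed in the paper.

```python
# Increasingly weighted averaging of SGD iterates: the running update below
# assigns weight proportional to t to iterate x_t.
import numpy as np

def sgd_with_weighted_averaging(grad, x0, gamma=0.01, iters=1000):
    x = x0.copy()
    avg = x0.copy()
    for t in range(1, iters + 1):
        x = x - gamma * grad(x)
        w = 2.0 / (t + 1)              # yields weights 1, 2, ..., t up to normalization
        avg = (1.0 - w) * avg + w * x  # running weighted average
    return avg                         # averaged solution, emphasizing recent iterates
```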
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
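Biased compressors such as the ones discussed above are usually analyzed through a contraction property of the form $\|C(x) - x\|^2 \le (1-\delta)\|x\|^2$ for some $\delta \in (0, 1]$. The snippet below checks this numerically for the classical top-k compressor, where $\delta = k/d$; it is a generic illustration, not one of the new compressors proposed in the paper.

```python
# Numerical check of the contraction property of top-k:
# ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2, i.e. delta = k/d.
import numpy as np

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]   # indices of the k largest magnitudes
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(3)
d, k = 20, 5
for _ in range(5):
    x = rng.standard_normal(d)
    err = np.linalg.norm(top_k(x, k) - x) ** 2
    bound = (1 - k / d) * np.linalg.norm(x) ** 2
    print(err <= bound + 1e-12)        # True: top-k is a (k/d)-contractive compressor
```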