On Communication Compression for Distributed Optimization on
Heterogeneous Data
- URL: http://arxiv.org/abs/2009.02388v2
- Date: Tue, 22 Dec 2020 09:41:09 GMT
- Title: On Communication Compression for Distributed Optimization on
Heterogeneous Data
- Authors: Sebastian U. Stich
- Abstract summary: Lossy gradient compression has become a key tool to avoid the communication bottleneck in distributed training of machine learning models.
We analyze the performance of two standard and general types of methods: (i) distributed quantized SGD with arbitrary unbiased quantizers and (ii) distributed SGD with error-feedback and biased compressors.
Our results indicate that D-EF-SGD is much less affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if data-skewness is high.
- Score: 28.197694894254305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lossy gradient compression, with either unbiased or biased compressors, has
become a key tool to avoid the communication bottleneck in centrally
coordinated distributed training of machine learning models. We analyze the
performance of two standard and general types of methods: (i) distributed
quantized SGD (D-QSGD) with arbitrary unbiased quantizers and (ii) distributed
SGD with error-feedback and biased compressors (D-EF-SGD) in the heterogeneous
(non-iid) data setting. Our results indicate that D-EF-SGD is much less
affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if
data-skewness is high. We further study two alternatives that are not (or much
less) affected by heterogeneous data distributions: first, a recently proposed
method that is effective on strongly convex problems, and second, we point
out a more general approach that is applicable to linear compressors only but
effective in all considered scenarios.
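To make the two analyzed methods concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of one training loop for D-QSGD with an unbiased rand-k sparsifier and for D-EF-SGD with a biased top-k compressor and error feedback. The specific operators, the toy quadratic losses, and the placement of the step size inside the compressed message are illustrative assumptions; the paper's results cover arbitrary unbiased quantizers and general biased compressors.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(g, k):
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k
    so that E[Q(g)] = g (one example of an unbiased quantizer)."""
    d = g.size
    mask = np.zeros(d)
    mask[rng.choice(d, size=k, replace=False)] = 1.0
    return (d / k) * mask * g

def top_k(g, k):
    """Biased top-k compressor: keep only the k largest-magnitude coordinates."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def d_qsgd_step(x, grads, lr, k):
    """D-QSGD: every worker sends an unbiased compression of its gradient;
    the coordinator averages the messages and takes a step."""
    return x - lr * np.mean([rand_k(g, k) for g in grads], axis=0)

def d_ef_sgd_step(x, grads, errors, lr, k):
    """D-EF-SGD: every worker compresses (step + local error memory) with a
    biased compressor and keeps the untransmitted residual for the next round."""
    msgs = []
    for i, g in enumerate(grads):
        p = lr * g + errors[i]   # add back what was not sent previously
        m = top_k(p, k)          # biased compression of the corrected signal
        errors[i] = p - m        # error feedback: remember the residual
        msgs.append(m)
    return x - np.mean(msgs, axis=0), errors

# Toy heterogeneous (non-iid) setting: worker i minimizes f_i(x) = ||x - b_i||^2 / 2
# with its own target b_i, so local gradients disagree even at the global optimum.
d, k, n_workers, lr = 20, 4, 5, 0.1
targets = [(i + 1) * rng.normal(size=d) for i in range(n_workers)]  # data skew
x_q, x_ef = np.zeros(d), np.zeros(d)
errors = [np.zeros(d) for _ in range(n_workers)]

for _ in range(300):
    x_q = d_qsgd_step(x_q, [x_q - b for b in targets], lr, k)
    x_ef, errors = d_ef_sgd_step(x_ef, [x_ef - b for b in targets], errors, lr, k)

x_star = np.mean(targets, axis=0)  # minimizer of the average loss
print("D-QSGD   distance to optimum:", np.linalg.norm(x_q - x_star))
print("D-EF-SGD distance to optimum:", np.linalg.norm(x_ef - x_star))
```

On such a skewed toy problem the unbiased quantizer's noise scales with the (large) local gradients rather than the (small) average gradient, which is, roughly, the mechanism behind the slowdown of D-QSGD under heterogeneous data discussed in the abstract.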
Related papers
- ODDN: Addressing Unpaired Data Challenges in Open-World Deepfake Detection on Online Social Networks [51.03118447290247]
We propose the open-world deepfake detection network (ODDN), which comprises open-world data aggregation (ODA) and compression-discard gradient correction (CGC).
ODA effectively aggregates correlations between compressed and raw samples through both fine-grained and coarse-grained analyses.
CGC incorporates a compression-discard gradient correction to further enhance performance across diverse compression methods in online social networks (OSNs).
arXiv Detail & Related papers (2024-10-24T12:32:22Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
- Data-Aware Gradient Compression for DML in Communication-Constrained Mobile Computing [20.70238092277094]
This work derives the convergence rate of distributed machine learning with non-uniform compression.
We propose DAGC-R, which assigns conservative compression to workers handling larger data volumes.
Our experiments confirm that DAGC-A and DAGC-R can speed up training by up to 16.65% and 25.43%, respectively.
arXiv Detail & Related papers (2023-11-13T13:24:09Z)
- EControl: Fast Distributed Optimization with Compression and Error Control [8.624830915051021]
We propose EControl, a novel mechanism that can regulate the strength of the feedback signal.
We show that EControl mitigates the issues of a naive implementation of our method, and our experiments support our findings.
arXiv Detail & Related papers (2023-11-06T10:00:13Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem (a sketch of this objective appears after this list).
We show that our proposed algorithm can improve the worst distribution test accuracy by up to 10%.
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- Federated Random Reshuffling with Compression and Variance Reduction [0.0]
Random Reshuffling (RR) is an immensely popular method for training supervised machine learning models via empirical risk minimization.
It is embedded and often set as default in standard machine learning software.
We introduce three new algorithms to improve FedRR further: one taming the variance coming from shuffling and another taming the variance due to compression (a minimal sketch of plain reshuffling follows this list).
arXiv Detail & Related papers (2022-05-08T16:46:11Z)
- Distributed Methods with Absolute Compression and Error Compensation [1.52292571922932]
Communication compression is a powerful approach to alleviating the communication bottleneck in distributed training.
In this paper, we generalize the analysis of EC-SGD with absolute compression to the arbitrary sampling strategy.
Our rates improve upon the previously known ones in this setting.
arXiv Detail & Related papers (2022-03-04T15:41:14Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first linear convergent decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
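For the DR-DSGD entry above, the Kullback-Leibler-regularized robust objective can be sketched as follows. This is the standard KL-regularized distributionally robust formulation over n local distributions and its well-known smooth reformulation, given here as a generic illustration under these assumptions rather than the exact objective of that paper.

```latex
% Robust min-max problem over the simplex \Delta_n, with KL regularization
% toward the uniform weights 1/n (\lambda > 0, f_i = loss on worker i's data):
\min_{\theta} \; \max_{p \in \Delta_n} \;
  \sum_{i=1}^{n} p_i f_i(\theta)
  \;-\; \lambda\, \mathrm{KL}\!\left(p \,\middle\|\, \tfrac{1}{n}\mathbf{1}\right)

% Solving the inner maximization in closed form yields the "modified robust"
% problem, a smooth log-sum-exp of the local losses that SGD-type methods can handle:
\min_{\theta} \; \lambda \log\!\left( \frac{1}{n} \sum_{i=1}^{n}
  \exp\!\big( f_i(\theta)/\lambda \big) \right)
```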
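For the Federated Random Reshuffling entry above, here is a minimal sketch of what Random Reshuffling itself does, compared to with-replacement SGD sampling, on a placeholder least-squares problem; the compression and variance-reduction components of that paper are not modeled, and all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lr, epochs = 100, 5, 0.02, 50
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)  # toy least-squares data

def grad(x, i):
    """Gradient of the i-th sample loss 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

# Plain SGD: draw indices uniformly at random, with replacement.
x_sgd = np.zeros(d)
for _ in range(epochs * n):
    x_sgd -= lr * grad(x_sgd, rng.integers(n))

# Random Reshuffling: permute the data once per epoch, then do a full
# pass without replacement (the default behavior of most training loops).
x_rr = np.zeros(d)
for _ in range(epochs):
    for i in rng.permutation(n):
        x_rr -= lr * grad(x_rr, i)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print("with-replacement SGD error:", np.linalg.norm(x_sgd - x_star))
print("random reshuffling   error:", np.linalg.norm(x_rr - x_star))
```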
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.