On Communication Compression for Distributed Optimization on
Heterogeneous Data
- URL: http://arxiv.org/abs/2009.02388v2
- Date: Tue, 22 Dec 2020 09:41:09 GMT
- Title: On Communication Compression for Distributed Optimization on
Heterogeneous Data
- Authors: Sebastian U. Stich
- Abstract summary: Lossy gradient compression has become a key tool to avoid the communication bottleneck in distributed training of machine learning models.
We analyze the performance of two standard and general types of methods: (i) distributed quantized SGD with arbitrary unbiased quantizers and (ii) distributed SGD with error-feedback and biased compressors.
Our results indicate that D-EF-SGD is much less affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if data-skewness is high.
- Score: 28.197694894254305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lossy gradient compression, with either unbiased or biased compressors, has
become a key tool to avoid the communication bottleneck in centrally
coordinated distributed training of machine learning models. We analyze the
performance of two standard and general types of methods: (i) distributed
quantized SGD (D-QSGD) with arbitrary unbiased quantizers and (ii) distributed
SGD with error-feedback and biased compressors (D-EF-SGD) in the heterogeneous
(non-iid) data setting. Our results indicate that D-EF-SGD is much less
affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if
data-skewness is high. We further study two alternatives that are not (or much
less) affected by heterogeneous data distributions: first, a recently proposed
method that is effective on strongly convex problems, and second, we point
out a more general approach that is applicable to linear compressors only but
effective in all considered scenarios.
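To make the two analyzed methods concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of one training loop for D-QSGD with an unbiased rand-k sparsifier and for D-EF-SGD with a biased top-k compressor and error feedback. The specific operators, the toy quadratic losses, and the placement of the step size inside the compressed message are illustrative assumptions; the paper's results cover arbitrary unbiased quantizers and general biased compressors.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(g, k):
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k
    so that E[Q(g)] = g (one example of an unbiased quantizer)."""
    d = g.size
    mask = np.zeros(d)
    mask[rng.choice(d, size=k, replace=False)] = 1.0
    return (d / k) * mask * g

def top_k(g, k):
    """Biased top-k compressor: keep only the k largest-magnitude coordinates."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def d_qsgd_step(x, grads, lr, k):
    """D-QSGD: every worker sends an unbiased compression of its gradient;
    the coordinator averages the messages and takes a step."""
    return x - lr * np.mean([rand_k(g, k) for g in grads], axis=0)

def d_ef_sgd_step(x, grads, errors, lr, k):
    """D-EF-SGD: every worker compresses (step + local error memory) with a
    biased compressor and keeps the untransmitted residual for the next round."""
    msgs = []
    for i, g in enumerate(grads):
        p = lr * g + errors[i]   # add back what was not sent previously
        m = top_k(p, k)          # biased compression of the corrected signal
        errors[i] = p - m        # error feedback: remember the residual
        msgs.append(m)
    return x - np.mean(msgs, axis=0), errors

# Toy heterogeneous (non-iid) setting: worker i minimizes f_i(x) = ||x - b_i||^2 / 2
# with its own target b_i, so local gradients disagree even at the global optimum.
d, k, n_workers, lr = 20, 4, 5, 0.1
targets = [(i + 1) * rng.normal(size=d) for i in range(n_workers)]  # data skew
x_q, x_ef = np.zeros(d), np.zeros(d)
errors = [np.zeros(d) for _ in range(n_workers)]

for _ in range(300):
    x_q = d_qsgd_step(x_q, [x_q - b for b in targets], lr, k)
    x_ef, errors = d_ef_sgd_step(x_ef, [x_ef - b for b in targets], errors, lr, k)

x_star = np.mean(targets, axis=0)  # minimizer of the average loss
print("D-QSGD   distance to optimum:", np.linalg.norm(x_q - x_star))
print("D-EF-SGD distance to optimum:", np.linalg.norm(x_ef - x_star))
```

On such a skewed toy problem the unbiased quantizer's noise scales with the (large) local gradients rather than the (small) average gradient, which is, roughly, the mechanism behind the slowdown of D-QSGD under heterogeneous data discussed in the abstract.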
Related papers
- ODDN: Addressing Unpaired Data Challenges in Open-World Deepfake Detection on Online Social Networks [51.03118447290247]
We propose the open-world deepfake detection network (ODDN), which comprises open-world data aggregation (ODA) and compression-discard gradient correction (CGC).
ODA effectively aggregates correlations between compressed and raw samples through both fine-grained and coarse-grained analyses.
CGC incorporates a compression-discard gradient correction to further enhance performance across diverse compression methods in online social networks (OSNs).
arXiv Detail & Related papers (2024-10-24T12:32:22Z)
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z)
- Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
- Data-Aware Gradient Compression for DML in Communication-Constrained Mobile Computing [20.70238092277094]
This work derives the convergence rate of distributed machine learning with non-uniform compression.
We propose DAGC-R, which assigns conservative compression to workers handling larger data volumes.
Our experiments confirm that DAGC-A and DAGC-R can speed up training by up to 16.65% and 25.43%, respectively.
arXiv Detail & Related papers (2023-11-13T13:24:09Z)
- EControl: Fast Distributed Optimization with Compression and Error Control [8.624830915051021]
We propose EControl, a novel mechanism that can regulate the strength of the feedback signal.
We show that EControl mitigates the issues of a naive implementation of our method, and our experiments support our findings.
arXiv Detail & Related papers (2023-11-06T10:00:13Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem (a sketch of this objective appears after this list).
We show that our proposed algorithm can improve the worst distribution test accuracy by up to 10%.
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- Federated Random Reshuffling with Compression and Variance Reduction [0.0]
Random Reshuffling (RR) is an immensely popular method for training supervised machine learning models via empirical risk minimization.
It is embedded and often set as default in standard machine learning software.
We introduce three new algorithms to improve FedRR further: one taming the variance coming from shuffling and another taming the variance due to compression (a minimal sketch of plain reshuffling follows this list).
arXiv Detail & Related papers (2022-05-08T16:46:11Z)
- Distributed Methods with Absolute Compression and Error Compensation [1.52292571922932]
Communication compression is a powerful approach to alleviating the communication bottleneck in distributed training.
In this paper, we generalize the analysis of EC-SGD with absolute compression to the arbitrary sampling strategy.
Our rates improve upon the previously known ones in this setting.
arXiv Detail & Related papers (2022-03-04T15:41:14Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first linear convergent decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
arXiv Detail & Related papers (2020-02-27T19:52:24Z)
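For the DR-DSGD entry above, the Kullback-Leibler-regularized robust objective can be sketched as follows. This is the standard KL-regularized distributionally robust formulation over n local distributions and its well-known smooth reformulation, given here as a generic illustration under these assumptions rather than the exact objective of that paper.

```latex
% Robust min-max problem over the simplex \Delta_n, with KL regularization
% toward the uniform weights 1/n (\lambda > 0, f_i = loss on worker i's data):
\min_{\theta} \; \max_{p \in \Delta_n} \;
  \sum_{i=1}^{n} p_i f_i(\theta)
  \;-\; \lambda\, \mathrm{KL}\!\left(p \,\middle\|\, \tfrac{1}{n}\mathbf{1}\right)

% Solving the inner maximization in closed form yields the "modified robust"
% problem, a smooth log-sum-exp of the local losses that SGD-type methods can handle:
\min_{\theta} \; \lambda \log\!\left( \frac{1}{n} \sum_{i=1}^{n}
  \exp\!\big( f_i(\theta)/\lambda \big) \right)
```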
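For the Federated Random Reshuffling entry above, here is a minimal sketch of what Random Reshuffling itself does, compared to with-replacement SGD sampling, on a placeholder least-squares problem; the compression and variance-reduction components of that paper are not modeled, and all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lr, epochs = 100, 5, 0.02, 50
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)  # toy least-squares data

def grad(x, i):
    """Gradient of the i-th sample loss 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

# Plain SGD: draw indices uniformly at random, with replacement.
x_sgd = np.zeros(d)
for _ in range(epochs * n):
    x_sgd -= lr * grad(x_sgd, rng.integers(n))

# Random Reshuffling: permute the data once per epoch, then do a full
# pass without replacement (the default behavior of most training loops).
x_rr = np.zeros(d)
for _ in range(epochs):
    for i in rng.permutation(n):
        x_rr -= lr * grad(x_rr, i)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print("with-replacement SGD error:", np.linalg.norm(x_sgd - x_star))
print("random reshuffling   error:", np.linalg.norm(x_rr - x_star))
```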
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.