EControl: Fast Distributed Optimization with Compression and Error
Control
- URL: http://arxiv.org/abs/2311.05645v1
- Date: Mon, 6 Nov 2023 10:00:13 GMT
- Title: EControl: Fast Distributed Optimization with Compression and Error
Control
- Authors: Yuan Gao and Rustem Islamov and Sebastian Stich
- Abstract summary: We propose EControl, a novel mechanism that can regulate the strength of the feedback signal.
We prove fast convergence for EControl without additional assumptions on the problem or data heterogeneity, and support our theory with extensive numerical evaluations.
- Score: 8.624830915051021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern distributed training relies heavily on communication compression to
reduce the communication overhead. In this work, we study algorithms employing
a popular class of contractive compressors. However, their naive implementation
often leads to unstable convergence
or even exponential divergence due to the compression bias. Error Compensation
(EC) is an extremely popular mechanism to mitigate the aforementioned issues
during the training of models enhanced by contractive compression operators.
While EC is effective in the data-homogeneous regime, the understanding of its
practicality and theoretical foundations in the data-heterogeneous regime is
limited. Existing convergence analyses typically rely
on strong assumptions such as bounded gradients, bounded data heterogeneity, or
large batch accesses, which are often infeasible in modern machine learning
applications. We resolve the majority of current issues by proposing EControl,
a novel mechanism that can regulate error compensation by controlling the
strength of the feedback signal. We prove fast convergence for EControl in
standard strongly convex, general convex, and nonconvex settings without any
additional assumptions on the problem or data heterogeneity. We conduct
extensive numerical evaluations to illustrate the efficacy of our method and
support our theoretical findings.
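To make the mechanism concrete, below is a minimal NumPy sketch of compressed SGD with error compensation on a single node. The top-k operator stands in for a generic contractive compressor, and eta is a hypothetical feedback-strength parameter used only to illustrate the idea of regulating the feedback signal; it is not the authors' exact EControl update rule.

```python
import numpy as np

def top_k(x, k):
    """A contractive (biased) compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def ec_sgd_step(params, grad, error, lr=0.1, k=2, eta=1.0):
    """One step of compressed SGD with error compensation.

    `error` is the locally accumulated compression error; `eta` is an
    illustrative knob scaling how strongly the stored error is fed back
    into the next transmitted message (eta = 1.0 recovers classical EC).
    """
    message = grad + eta * error           # add the (scaled) feedback signal
    compressed = top_k(message, k)         # what would actually be communicated
    new_error = message - compressed       # residual kept for the next round
    new_params = params - lr * compressed  # model update using the compressed message
    return new_params, new_error

# toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x
params = np.array([1.0, -2.0, 3.0, -4.0, 0.5])
error = np.zeros_like(params)
for _ in range(200):
    params, error = ec_sgd_step(params, grad=params, error=error)
print(params)  # entries shrink toward zero despite the biased compressor
```

With eta = 1.0 the sketch reduces to classical error compensation; the contribution of EControl, per the abstract, is a mechanism for controlling this feedback strength so that fast convergence holds without bounded-gradient or bounded-heterogeneity assumptions.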
Related papers
- Causal Context Adjustment Loss for Learned Image Compression [72.7300229848778] (2024-10-07)
In recent years, learned image compression (LIC) technologies have surpassed conventional methods notably in terms of rate-distortion (RD) performance.
Most current techniques are VAE-based with an autoregressive entropy model, which improves RD performance by utilizing the decoded causal context.
In this paper, we make a first attempt at explicitly adjusting the causal context via our proposed Causal Context Adjustment loss.
- Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning [15.78336840511033] (2024-08-25)
This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios.
Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates.
We employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity.
arXiv Detail & Related papers (2024-08-25T09:30:34Z) - Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression. A generic sketch of combining differential quantization with error feedback appears after this list.
- Communication-Efficient Distributed Learning with Local Immediate Error Compensation [95.6828475028581] (2024-02-19)
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
- Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees [115.08148491584997] (2021-10-07)
We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can be adapted to settings with stochastic batches and to federated learning with partial client participation.
- Compressing gradients by exploiting temporal correlation in momentum-SGD [17.995905582226463] (2021-08-17)
We analyze compression methods that exploit temporal correlation in systems with and without error-feedback.
Experiments with the ImageNet dataset demonstrate that our proposed methods offer a significant reduction in the communication rate.
We prove the convergence of SGD under an expected error assumption by establishing a bound for the minimum gradient norm.
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524] (2020-11-03)
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
- On Communication Compression for Distributed Optimization on Heterogeneous Data [28.197694894254305] (2020-09-04)
Lossy gradient compression has become a key tool to avoid the communication bottleneck in distributed training of machine learning models.
We analyze the performance of two standard and general types of methods: (i) distributed quantized SGD with arbitrary unbiased quantizers and (ii) distributed SGD with error-feedback and biased compressors.
Our results indicate that D-EF-SGD is much less affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if data skewness is high. A sketch contrasting the two compressor classes (unbiased quantizers versus biased compressors) appears after this list.
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387] (2020-07-01)
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning [0.0] (2020-06-19)
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z)
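As referenced in the entry on communication compression for heterogeneous data, the sketch below contrasts the two standard compressor classes mentioned there: an unbiased random-dithering quantizer (the kind used in D-QSGD) and a biased top-k sparsifier (the kind paired with error feedback in D-EF-SGD). The implementations are generic textbook versions, not code from that paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_dithering(x, levels=4):
    """Unbiased stochastic quantizer (QSGD-style): E[Q(x)] = x."""
    norm = np.linalg.norm(x)
    if norm == 0:
        return np.zeros_like(x)
    scaled = levels * np.abs(x) / norm
    lower = np.floor(scaled)
    prob = scaled - lower                      # round up with this probability
    q = lower + (rng.random(x.shape) < prob)
    return np.sign(x) * norm * q / levels

def top_k(x, k=2):
    """Biased contractive compressor: keeps only the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

x = np.array([0.1, -2.0, 0.3, 1.5, -0.2])
# Averaging many draws of the unbiased quantizer recovers x; top-k never does,
# which is why the biased variant is typically paired with error feedback.
print(np.mean([random_dithering(x) for _ in range(10000)], axis=0))
print(top_k(x))
```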