Step-Ahead Error Feedback for Distributed Training with Compressed
Gradient
- URL: http://arxiv.org/abs/2008.05823v3
- Date: Mon, 24 Jan 2022 15:49:53 GMT
- Title: Step-Ahead Error Feedback for Distributed Training with Compressed
Gradient
- Authors: An Xu, Zhouyuan Huo, Heng Huang
- Abstract summary: We show that a new "gradient mismatch" problem is raised by the local error feedback in centralized distributed training.
We propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis.
- Score: 99.42912552638168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although distributed machine learning methods can speed up the training
of large deep neural networks, communication cost has become a non-negligible
bottleneck that constrains performance. To address this challenge,
gradient-compression-based communication-efficient distributed learning methods
were designed to reduce the communication cost, and more recently local error
feedback was incorporated to compensate for the resulting performance loss.
In this paper, however, we show that local error feedback raises a new
"gradient mismatch" problem in centralized distributed training, which can lead
to degraded performance compared with full-precision training. To solve this
critical problem, we propose two novel techniques, 1) step ahead and 2) error
averaging, with rigorous theoretical analysis. Both our theoretical and
empirical results show that the new methods handle the "gradient mismatch"
problem. The experimental results further show that, with common gradient
compression schemes, we can even train faster (in training epochs) than both
full-precision training and local error feedback, without performance loss.
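To make the baseline concrete, below is a minimal sketch of centralized distributed SGD with top-k gradient compression and local error feedback, i.e. the setting whose "gradient mismatch" problem the paper analyzes. The toy quadratic objective, worker count, learning rate, and top-k ratio are illustrative assumptions, and the sketch does not implement the paper's step-ahead or error-averaging techniques.

```python
import numpy as np

def top_k(v, ratio=0.1):
    """Keep the largest-magnitude entries of v and zero out the rest."""
    k = max(1, int(ratio * v.size))
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
dim, n_workers, lr, steps = 50, 4, 0.05, 200

# Each worker i holds a toy quadratic loss 0.5 * ||x - t_i||^2 (an assumption for illustration).
targets = [rng.normal(size=dim) for _ in range(n_workers)]
x = np.zeros(dim)                                    # shared model
errors = [np.zeros(dim) for _ in range(n_workers)]   # local error-feedback memories

for step in range(steps):
    updates = []
    for i in range(n_workers):
        g = x - targets[i]          # local gradient of worker i
        p = g + errors[i]           # compensate with the locally stored residual
        c = top_k(p)                # only the compressed vector is communicated
        errors[i] = p - c           # the dropped part stays local for the next round
        updates.append(c)
    x -= lr * np.mean(updates, axis=0)   # server averages the compressed gradients

print("average worker loss:", np.mean([0.5 * np.sum((x - t) ** 2) for t in targets]))
```

Because each worker re-injects its own stale residual into the next compensated gradient, the averaged update can drift away from the gradient of the current global model; informally, this is the kind of mismatch that the proposed step-ahead and error-averaging corrections are meant to remove.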
Related papers
- Simplicity bias and optimization threshold in two-layer ReLU networks [24.43739371803548]
We show that despite overparametrization, networks converge toward simpler solutions rather than interpolating the training data.
Our analysis relies on the so-called early alignment phase, during which neurons align towards specific directions.
arXiv Detail & Related papers (2024-10-03T09:58:57Z) - Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning [15.78336840511033]
This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios.
Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates.
We employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity.
arXiv Detail & Related papers (2024-08-25T09:30:34Z) - Communication-Efficient Distributed Learning with Local Immediate Error
Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
arXiv Detail & Related papers (2024-02-19T05:59:09Z) - Balance is Essence: Accelerating Sparse Training via Adaptive Gradient
Correction [29.61757744974324]
Deep neural networks incur significant memory and computation costs.
Sparse training is one of the most common techniques to reduce these costs.
In this work, we aim to overcome this problem and achieve space-time co-efficiency.
arXiv Detail & Related papers (2023-01-09T18:50:03Z) - Adversarial Coreset Selection for Efficient Robust Training [11.510009152620666]
We show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training.
We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times.
arXiv Detail & Related papers (2022-09-13T07:37:53Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural
Networks [86.42889611784855]
Normalization methods increase vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross-entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization goal helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
arXiv Detail & Related papers (2021-09-12T23:14:06Z) - Balanced Softmax Cross-Entropy for Incremental Learning [6.5423218639215275]
Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks.
Recent methods have proven effective at mitigating catastrophic forgetting.
We propose the use of the Balanced Softmax Cross-Entropy loss and show that it can be combined with existing methods for incremental learning to improve their performance.
arXiv Detail & Related papers (2021-03-23T13:30:26Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - How Important is the Train-Validation Split in Meta-Learning? [155.5088631672781]
A common practice in meta-learning is to perform a train-validation split (the train-val method) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.
Despite its prevalence, the importance of the train-validation split is not well understood either in theory or in practice.
We show that the train-train method can indeed outperform the train-val method, on both simulations and real meta-learning tasks.
arXiv Detail & Related papers (2020-10-12T16:48:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.