AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning
- URL: http://arxiv.org/abs/2405.01067v2
- Date: Sun, 30 Jun 2024 08:06:33 GMT
- Title: AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning
- Authors: Daniel Coquelin, Katherina Flügel, Marie Weiel, Nicholas Kiefer, Muhammed Öz, Charlotte Debus, Achim Streit, Markus Götz,
- Abstract summary: AB-training is a novel data-parallel method that leverages low-rank representations and independent training groups to reduce communication overhead.
Our experiments demonstrate an average reduction in network traffic of approximately 70.31% across various scaling scenarios.
- Score: 0.07227323884094951
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Communication bottlenecks severely hinder the scalability of distributed neural network training, particularly in high-performance computing (HPC) environments. We introduce AB-training, a novel data-parallel method that leverages low-rank representations and independent training groups to significantly reduce communication overhead. Our experiments demonstrate an average reduction in network traffic of approximately 70.31\% across various scaling scenarios, increasing the training potential of communication-constrained systems and accelerating convergence at scale. AB-training also exhibits a pronounced regularization effect at smaller scales, leading to improved generalization while maintaining or even reducing training time. We achieve a remarkable 44.14 : 1 compression ratio on VGG16 trained on CIFAR-10 with minimal accuracy loss, and outperform traditional data parallel training by 1.55\% on ResNet-50 trained on ImageNet-2012. While AB-training is promising, our findings also reveal that large batch effects persist even in low-rank regimes, underscoring the need for further research into optimized update mechanisms for massively distributed training.
Related papers
- On Using Large-Batches in Federated Learning [1.5229257192293202]
Federated learning is crucial for training deep networks over devices with limited compute resources and bounded networks.<n>This work proposes our vision of exploiting the trade-offs between small and large-batch training.<n>For the same number of iterations, we observe that our proposed large-batch training technique attains about 32.33% and 3.74% higher test accuracy than small-batch training.
arXiv Detail & Related papers (2025-09-05T17:31:50Z) - Accelerating Privacy-Preserving Federated Learning in Large-Scale LEO Satellite Systems [57.692181589325116]
Large-scale low-Earth-orbit (LEO) satellite systems are increasingly valued for their ability to enable rapid and wide-area data exchange.<n>Due to privacy concerns and regulatory constraints, raw data collected at remote clients cannot be centrally aggregated.<n>Federated learning offers a privacy-preserving alternative by training local models on distributed devices and exchanging only model parameters.<n>We propose a discrete temporal graph-based on-demand scheduling framework that dynamically allocates communication resources to accelerate federated learning.
arXiv Detail & Related papers (2025-09-05T03:33:42Z) - Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward [54.708851958671794]
We propose a Data-Efficient Policy Optimization pipeline that combines optimized strategies for both offline and online data selection.<n>In offline phase, we curate a high-quality subset of training samples based on diversity, influence, and appropriate difficulty.<n>During online RLVR training, we introduce a sample-level explorability metric to dynamically filter samples with low exploration potential.
arXiv Detail & Related papers (2025-09-01T10:04:20Z) - Learn2Mix: Training Neural Networks Using Adaptive Data Integration [24.082008483056462]
learn2mix is a novel training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates.
Empirical evaluations conducted on benchmark datasets show that neural networks trained with learn2mix converge faster than those trained with existing approaches.
arXiv Detail & Related papers (2024-12-21T04:40:07Z) - Balanced Training for Sparse GANs [16.045866864231417]
We propose a novel metric called the balance ratio (BR) to study the balance between the sparse generator and discriminator.
We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the BR during GAN training to achieve a good trade-off between performance and computational cost.
arXiv Detail & Related papers (2023-02-28T15:34:01Z) - Efficient and Effective Augmentation Strategy for Adversarial Training [48.735220353660324]
Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training.
We propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training.
arXiv Detail & Related papers (2022-10-27T10:59:55Z) - Efficient Augmentation for Imbalanced Deep Learning [8.38844520504124]
We study a convolutional neural network's internal representation of imbalanced image data.
We measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes.
This insight enables us to design an efficient three-phase CNN training framework for imbalanced data.
arXiv Detail & Related papers (2022-07-13T09:43:17Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to mitigate robust training.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find both methods to yield win-win: substantially shrinking the robust generalization gap and alleviating the robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - Communication-Efficient Federated Learning with Dual-Side Low-Rank
Compression [8.353152693578151]
Federated learning (FL) is a promising and powerful approach for training deep learning models without sharing the raw data of clients.
We propose a new training method, referred to as federated learning with dual-side low-rank compression (FedDLR)
We show that FedDLR outperforms the state-of-the-art solutions in terms of both the communication and efficiency.
arXiv Detail & Related papers (2021-04-26T09:13:31Z) - FracTrain: Fractionally Squeezing Bit Savings Both Temporally and
Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%+1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z) - Data optimization for large batch distributed training of deep neural
networks [0.19336815376402716]
Current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale.
We propose a data optimization approach that utilize machine learning to implicitly smooth out the loss landscape resulting in fewer local minima.
Our approach filters out data points which are less important to feature learning, enabling us to speed up the training of models on larger batch sizes to improved accuracy.
arXiv Detail & Related papers (2020-12-16T21:22:02Z) - CosSGD: Nonlinear Quantization for Communication-efficient Federated
Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.