Improving Efficiency in Large-Scale Decentralized Distributed Training
- URL: http://arxiv.org/abs/2002.01119v1
- Date: Tue, 4 Feb 2020 04:29:09 GMT
- Title: Improving Efficiency in Large-Scale Decentralized Distributed Training
- Authors: Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler,
Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel
Das, David Kung, Michael Picheny
- Abstract summary: We propose techniques to accelerate (A)D-PSGD based training by improving the spectral gap while minimizing the communication cost.
We demonstrate the effectiveness of our proposed techniques by running experiments on the 2000-hour Switchboard speech recognition task and the ImageNet computer vision task.
- Score: 58.80224380923698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized Parallel SGD (D-PSGD) and its asynchronous variant,
Asynchronous Decentralized Parallel SGD (AD-PSGD), form a family of distributed
learning algorithms that have been demonstrated to perform well for large-scale
deep learning tasks. One
drawback of (A)D-PSGD is that the spectral gap of the mixing matrix decreases
when the number of learners in the system increases, which hampers convergence.
In this paper, we investigate techniques to accelerate (A)D-PSGD based training
by improving the spectral gap while minimizing the communication cost. We
demonstrate the effectiveness of our proposed techniques by running experiments
on the 2000-hour Switchboard speech recognition task and the ImageNet computer
vision task. On an IBM P9 supercomputer, our system is able to train an LSTM
acoustic model in 2.28 hours with 7.5% WER on the Hub5-2000 Switchboard (SWB)
test set and 13.3% WER on the CallHome (CH) test set using 64 V100 GPUs and in
1.98 hours with 7.7% WER on SWB and 13.3% WER on CH using 128 V100 GPUs, the
fastest training time reported to date.
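
The spectral-gap bottleneck described in the abstract can be made concrete with a quick numerical check. The following is a minimal sketch, not code from the paper: it assumes a simple symmetric ring topology (each learner mixes equally with itself and its two neighbours, an illustrative assumption since the abstract does not specify the topology) and computes the spectral gap of the resulting mixing matrix, which shrinks as the number of learners grows.

```python
# Minimal sketch (not from the paper): spectral gap of a hypothetical symmetric
# ring mixing matrix, showing why the gap shrinks as the number of learners grows.
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix where each learner averages equally
    with itself and its two ring neighbours (weight 1/3 each)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude; a larger gap means faster mixing."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
    return 1.0 - eigvals[1]

for n in (8, 16, 32, 64, 128):
    print(f"learners={n:4d}  spectral gap={spectral_gap(ring_mixing_matrix(n)):.4f}")
```

Topologies or mixing schedules with a larger spectral gap propagate information faster but typically cost more communication per step, which is the trade-off the paper targets.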
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Low-Latency Cooperative Spectrum Sensing via Truncated Vertical Federated Learning [51.51440623636274]
We propose a vertical federated learning (VFL) framework to exploit the distributed features across multiple secondary users (SUs) without compromising data privacy.
To accelerate the training process, we propose a truncated vertical federated learning (T-VFL) algorithm.
The convergence performance of T-VFL is established via mathematical analysis and validated by simulation results.
arXiv Detail & Related papers (2022-08-07T10:39:27Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent [37.52828820578212]
Distributed Deep Learning (DDL) is essential for large-scale Deep Learning (DL) training.
In a large batch setting, the learning rate must be increased to compensate for the reduced number of parameter updates.
Recently, Decentralized Parallel SGD (DPSGD) has been proposed to improve training speed.
arXiv Detail & Related papers (2021-12-02T17:23:25Z)
- Asynchronous Decentralized Distributed Training of Acoustic Models [43.34839658423581]
We study three variants of asynchronous decentralized parallel SGD (ADPSGD).
We show that ADPSGD with fixed and randomized communication patterns copes well with slow learners.
In particular, using the delay-by-one strategy, we can train the acoustic model in less than 2 hours.
arXiv Detail & Related papers (2021-10-21T15:14:58Z)
- Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks [13.552262050816616]
Kronecker-Factored Approximate Curvature (KFAC) is one of the most efficient approximation algorithms for training deep models.
Yet, when leveraging GPU clusters to train models with KFAC, it incurs extensive computation and introduces extra communication during each iteration.
We propose D-KFAC with smart parallelism of computing and communication tasks to reduce the iteration time.
arXiv Detail & Related papers (2021-07-14T08:01:07Z)
- Learning to Efficiently Sample from Diffusion Probabilistic Models [49.58748345998702]
Denoising Diffusion Probabilistic Models (DDPMs) can yield high-fidelity samples and competitive log-likelihoods across a range of domains.
We introduce an exact dynamic programming algorithm that finds the optimal discrete time schedules for any pre-trained DDPM.
arXiv Detail & Related papers (2021-06-07T17:15:07Z)
- Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning [6.370766463380455]
We show that the optimal averaging period in terms of convergence and communication cost is not a constant, but instead varies over the course of the execution.
We propose a practical algorithm, named ADaptive Periodic parameter averaging SGD (ADPSGD), to achieve a smaller overall variance of model parameters.
arXiv Detail & Related papers (2020-07-13T00:04:55Z)
- DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging [4.652668321425679]
The minibatch gradient descent (SGD) algorithm requires workers to halt forward/backward propagation during gradient synchronization.
DaSGD parallelizes SGD with forward/backward propagation to hide 100% of the communication overhead (see the sketch after this list).
arXiv Detail & Related papers (2020-05-31T05:43:50Z)
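
Several of the entries above (the delay-by-one ADPSGD variant, adaptive periodic averaging, and DaSGD's delayed averaging) revolve around the same idea: apply a slightly stale average so that communication can overlap with computation. The toy simulation below is a minimal, hypothetical sketch of that one-step-delay idea on a quadratic objective; it is not code from any of the listed papers, and the worker count, learning rate, and noise model are illustrative assumptions.

```python
# Hypothetical sketch of one-step-delayed gradient averaging (toy quadratic loss).
import numpy as np

rng = np.random.default_rng(0)
K, dim, lr, steps = 4, 10, 0.1, 200    # workers, model size, step size, iterations (assumed)
target = rng.normal(size=dim)          # optimum of the toy quadratic loss
w = np.zeros(dim)                      # model (kept identical across workers in this toy)
delayed_avg = np.zeros(dim)            # averaged gradient from the previous step

def local_gradient(w):
    """Gradient of 0.5 * ||w - target||^2 plus worker-specific noise."""
    return (w - target) + 0.01 * rng.normal(size=dim)

for t in range(steps):
    # Compute fresh local gradients; in a real system this forward/backward pass
    # would overlap with the all-reduce of the previous step's gradients.
    fresh_avg = np.mean([local_gradient(w) for _ in range(K)], axis=0)
    w -= lr * delayed_avg              # update with the one-step-stale average
    delayed_avg = fresh_avg            # this average "arrives" in time for the next step

print("distance to optimum after delayed-averaging SGD:", np.linalg.norm(w - target))
```

Because the update at step t uses the average computed at step t-1, synchronization can be kept off the critical path, which is the mechanism the delay-by-one and DaSGD summaries refer to.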
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.