Related papers: Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)

Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)

URL: http://arxiv.org/abs/2104.05588v2
Date: Thu, 15 Apr 2021 09:37:04 GMT
Title: Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)
Authors: Daniel Coquelin, Charlotte Debus, Markus G\"otz, Fabrice von der Lehr, James Kahn, Martin Siggel, and Achim Streit
Abstract summary: We introduce the Distributed Asynchronous and Selective Optimization (DASO) method to accelerate network training. DASO uses a hierarchical and asynchronous communication scheme comprised of node-local and global networks. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) to utilize large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations. This synchronization is the central algorithmic bottleneck. To combat this, we introduce the Distributed Asynchronous and Selective Optimization (DASO) method which leverages multi-GPU compute node architectures to accelerate network training. DASO uses a hierarchical and asynchronous communication scheme comprised of node-local and global networks while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, as compared to other existing data parallel training methods.

Related papers

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
From promise to practice: realizing high-performance decentralized training [8.955918346078935]
Decentralized training of deep neural networks has attracted significant attention for its theoretically superior scalability over synchronous data-parallel methods like All-Reduce. This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when, how, and to what degree decentralization can yield shorter per-it runtimes.
arXiv Detail & Related papers (2024-10-15T19:04:56Z)
Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse [56.384390765357004]
We propose an integrated federated split learning and hyperdimensional computing framework for emerging foundation models. This novel approach reduces communication costs, computation load, and privacy risks, making it suitable for resource-constrained edge devices in the Metaverse.
arXiv Detail & Related papers (2024-08-26T17:03:14Z)
Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging [1.4748100900619232]
Federated Dynamic Averaging (FDA) is a communication-efficient DDL strategy. FDA reduces communication cost by orders of magnitude, compared to both traditional and cutting-edge algorithms.
arXiv Detail & Related papers (2024-05-31T16:34:11Z)
Going Forward-Forward in Distributed Deep Learning [0.0]
We introduce a new approach in distributed deep learning, utilizing Geoffrey Hinton's Forward-Forward (FF) algorithm. Unlike traditional methods that rely on forward and backward passes, the FF algorithm employs a dual forward pass strategy. Our evaluation shows a 3.75 times speed up on MNIST dataset without compromising accuracy when training a four-layer network with four compute nodes.
arXiv Detail & Related papers (2024-03-30T16:02:53Z)
Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices [0.0]
Ravnest facilitates decentralized training by efficiently organizing compute nodes into clusters. We have framed our asynchronous SGD loss function as a block structured optimization problem with delayed updates.
arXiv Detail & Related papers (2024-01-03T13:07:07Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning [0.0]
Local Asynchronous SGD (LASGD) is an asynchronous decentralized algorithm that relies on All Reduce for model synchronization. We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset.
arXiv Detail & Related papers (2022-03-24T14:25:15Z)
Unsupervised Learning for Asynchronous Resource Allocation in Ad-hoc Wireless Networks [122.42812336946756]
We design an unsupervised learning method based on Aggregation Graph Neural Networks (Agg-GNNs) We capture the asynchrony by modeling the activation pattern as a characteristic of each node and train a policy-based resource allocation method.
arXiv Detail & Related papers (2020-11-05T03:38:36Z)
A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers) In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns. We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity. Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.