Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning
- URL: http://arxiv.org/abs/2203.13085v1
- Date: Thu, 24 Mar 2022 14:25:15 GMT
- Title: Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning
- Authors: Tomer Avidor, Nadav Tal Israel
- Abstract summary: Local Asynchronous SGD (LASGD) is an asynchronous decentralized algorithm that relies on All Reduce for model synchronization.
We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed training algorithms for deep neural networks show impressive convergence speedup on very large problems. However, they inherently suffer from communication-related slowdowns, and the communication topology becomes a crucial design choice. Common approaches supported by most machine learning frameworks are: 1) synchronous decentralized algorithms relying on a peer-to-peer All Reduce topology, which are sensitive to stragglers and communication delays, and 2) asynchronous centralized algorithms with a server-based topology, which are prone to a communication bottleneck. Researchers have also suggested asynchronous decentralized algorithms designed to avoid the bottleneck and speed up training; however, these commonly use inexact sparse averaging, which may degrade accuracy. In this paper, we propose Local Asynchronous SGD (LASGD), an asynchronous decentralized algorithm that relies on All Reduce for model synchronization.
We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset. Our experiments demonstrate that LASGD accelerates training compared to SGD and state-of-the-art gossip-based approaches.
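As a rough illustration of the setup the abstract describes, the sketch below simulates several workers that each take a varying number of local SGD steps (standing in for asynchronous progress and stragglers) and then synchronize their model replicas through exact averaging, the operation an All Reduce would perform in a real multi-process run. This is a minimal single-process sketch on a synthetic least-squares problem, not the authors' LASGD algorithm; the worker count, local-step schedule, and learning rate are illustrative assumptions.

```python
# Minimal single-process sketch of local SGD with exact all-reduce-style
# averaging. NOT the authors' LASGD algorithm: the synthetic problem, worker
# count, local-step schedule, and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimise the mean squared error of X w against y.
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

num_workers = 8
shards = np.array_split(np.arange(n), num_workers)  # each worker holds one data shard
models = [np.zeros(d) for _ in range(num_workers)]  # per-worker model replicas
lr = 0.05

def local_sgd(w, idx, steps):
    """Run `steps` minibatch SGD steps on a single worker's shard."""
    for _ in range(steps):
        batch = rng.choice(idx, size=32)
        grad = 2.0 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
        w = w - lr * grad
    return w

for sync_round in range(20):
    # Each worker takes a different number of local steps this round,
    # standing in for asynchronous progress and stragglers.
    for k in range(num_workers):
        models[k] = local_sgd(models[k], shards[k], steps=int(rng.integers(1, 10)))
    # Exact model averaging; in a real multi-process run this would be an
    # All Reduce collective over the workers' parameters.
    averaged = np.mean(models, axis=0)
    models = [averaged.copy() for _ in range(num_workers)]

print("distance to ground truth after training:", np.linalg.norm(models[0] - w_true))
```

In an actual distributed run, the averaging step would be an All Reduce collective over worker processes, and workers would reach the synchronization point at different times rather than in a fixed loop.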
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
- Queuing dynamics of asynchronous Federated Learning [15.26212962081762]
We study asynchronous federated learning mechanisms with nodes having potentially different computational speeds.
We propose a non-uniform sampling scheme for the central server that allows for lower delays with better complexity.
Our experiments clearly show a significant improvement of our method over current state-of-the-art asynchronous algorithms on an image classification problem.
arXiv Detail & Related papers (2024-02-12T18:32:35Z)
- Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity [85.92481138826949]
We develop a new method-Shadowheart SGD-that provably improves the time complexities of all previous centralized methods.
We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
arXiv Detail & Related papers (2024-02-07T12:15:56Z)
- Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization [13.119144971868632]
We introduce Asynchronous SGD on Graphs (AGRAF SGD), a general algorithmic framework that covers asynchronous versions of many popular algorithms.
We provide rates of convergence under much milder assumptions than previous decentralized asynchronous computation works.
arXiv Detail & Related papers (2023-11-01T11:58:16Z)
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates [4.3707341422218215]
Two widely considered decentralized learning approaches are gossip-based and random-walk-based learning (see the gossip-averaging sketch after this list).
We design a fast and communication-efficient asynchronous decentralized learning mechanism DIGEST.
We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network ResNet20.
arXiv Detail & Related papers (2023-07-14T22:58:20Z)
- $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning [0.0]
We introduce a principled asynchronous, randomized, gossip-based optimization algorithm which works thanks to a continuous local momentum named $\textbf{A}^2\textbf{CiD}^2$.
Our theoretical analysis proves accelerated rates compared to previous asynchronous decentralized baselines.
We show consistent improvement on the ImageNet dataset using up to 64 asynchronous workers.
arXiv Detail & Related papers (2023-06-14T06:52:07Z)
- Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach [6.187780920448871]
We propose a novel continuous-time framework to analyze asynchronous algorithms.
We describe a fully asynchronous decentralized algorithm to minimize the sum of smooth and strongly convex functions.
arXiv Detail & Related papers (2021-06-07T13:09:25Z)
- Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up new possibilities for robust and fast phase retrieval (PR).
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network on distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
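Several of the entries above (e.g. DIGEST and $\textbf{A}^2\textbf{CiD}^2$), as well as the abstract's remark about inexact sparse averaging, contrast gossip-style neighbour averaging with exact All Reduce averaging. The sketch below is a generic, illustrative comparison on a ring topology; the worker count and mixing weights are assumptions and the code is not taken from any of the listed papers.

```python
# Generic, illustrative comparison of exact all-reduce averaging with inexact
# gossip averaging on a ring topology. The topology and mixing weights are
# assumptions for illustration, not taken from any of the papers above.
import numpy as np

rng = np.random.default_rng(1)
num_workers, d = 8, 4
params = rng.normal(size=(num_workers, d))  # one parameter vector per worker

# Exact all-reduce: every worker obtains the global mean in a single collective.
exact_average = np.tile(params.mean(axis=0), (num_workers, 1))

# Doubly stochastic mixing matrix for a ring: each worker averages itself with
# its two neighbours, weight 1/3 each.
mix = np.zeros((num_workers, num_workers))
for i in range(num_workers):
    mix[i, i] = 1.0 / 3.0
    mix[i, (i - 1) % num_workers] = 1.0 / 3.0
    mix[i, (i + 1) % num_workers] = 1.0 / 3.0

gossip = params.copy()
for step in range(1, 11):
    gossip = mix @ gossip  # one sparse (neighbour-only) averaging round
    gap = np.abs(gossip - exact_average).max()
    print(f"gossip round {step:2d}: max deviation from exact average = {gap:.2e}")
```

Each gossip round only mixes a worker with its immediate neighbours, so the printed deviation from the exact average shrinks geometrically instead of vanishing after a single step; this residual disagreement is the kind of inexactness the abstract refers to when contrasting gossip-based averaging with All Reduce.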