Accelerating Distributed ML Training via Selective Synchronization
- URL: http://arxiv.org/abs/2307.07950v2
- Date: Mon, 29 Jan 2024 18:18:56 GMT
- Title: Accelerating Distributed ML Training via Selective Synchronization
- Authors: Sahil Tyagi, Martin Swany
- Abstract summary: \texttt{SelSync} is a practical, low-overhead method for DNN training that dynamically chooses to incur or avoid communication at each step.
Our system converges to the same or better accuracy than BSP while reducing training time by up to 14$\times$.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In distributed training, deep neural networks (DNNs) are launched over
multiple workers concurrently and aggregate their local updates on each step in
bulk-synchronous parallel (BSP) training. However, BSP does not scale out
linearly due to the high communication cost of aggregation. To mitigate this
overhead, alternatives like Federated Averaging (FedAvg) and Stale-Synchronous
Parallel (SSP) either reduce synchronization frequency or eliminate it
altogether, usually at the cost of lower final accuracy. In this paper, we
present \texttt{SelSync}, a practical, low-overhead method for DNN training
that dynamically chooses to incur or avoid communication at each step either by
calling the aggregation op or applying local updates based on their
significance. We propose various optimizations as part of \texttt{SelSync} to
improve convergence in the context of \textit{semi-synchronous} training. Our
system converges to the same or better accuracy than BSP while reducing
training time by up to 14$\times$.
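As a rough illustration of the per-step decision described in the abstract, the sketch below simulates a few workers that either average their updates (as an all-reduce would) or apply them locally. The significance metric (update norm relative to parameter norm against a threshold DELTA) and all constants are illustrative assumptions, not SelSync's exact criterion.

```python
import numpy as np

# Minimal in-process sketch of a SelSync-style step: each worker either
# aggregates its update with the others (BSP-style all-reduce) or applies it
# locally, based on a significance test. The metric below (update norm
# relative to parameter norm vs. a threshold DELTA) is an illustrative
# assumption, not necessarily the paper's exact criterion.

NUM_WORKERS = 4
DELTA = 0.01          # significance threshold (hypothetical value)
LR = 0.1
rng = np.random.default_rng(0)

# Toy model: each worker minimizes ||w - target_k||^2 on its own shard.
targets = rng.normal(size=(NUM_WORKERS, 10))
weights = [np.zeros(10) for _ in range(NUM_WORKERS)]

for step in range(50):
    # Local gradient and proposed update on every worker.
    updates = [LR * 2.0 * (targets[k] - weights[k]) for k in range(NUM_WORKERS)]

    # Significance: how large is the update relative to the current weights?
    signif = max(np.linalg.norm(u) / (np.linalg.norm(w) + 1e-12)
                 for u, w in zip(updates, weights))

    if signif > DELTA:
        # "Significant" step: synchronize, i.e. average updates across workers
        # (what an all-reduce would do) and apply the same update everywhere.
        avg_update = np.mean(updates, axis=0)
        weights = [w + avg_update for w in weights]
    else:
        # "Insignificant" step: skip communication, apply local updates only.
        weights = [w + u for w, u in zip(weights, updates)]
```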
Related papers
- Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
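A minimal sketch of a layer-wise aggregation step in this spirit, assuming each straggler reports gradients only for the deepest layers it finished during backprop; the layer shapes, worker progress, and per-layer averaging rule are placeholders rather than SALF's exact scheme.

```python
import numpy as np

# Sketch of a layer-wise aggregation step: stragglers report gradients only
# for the (deepest) layers they finished during backprop, and the server
# updates each layer from whichever workers reported it.

LAYERS = ["fc3", "fc2", "fc1"]            # backprop order: last layer first
shapes = {"fc1": (8, 4), "fc2": (8, 8), "fc3": (2, 8)}
rng = np.random.default_rng(1)
global_model = {name: rng.normal(size=shapes[name]) for name in LAYERS}

# Each worker completed backprop only up to some depth before the deadline.
progress = {"worker0": 3, "worker1": 1, "worker2": 2}   # layers finished

reports = {}
for worker, depth in progress.items():
    reports[worker] = {
        name: rng.normal(size=shapes[name]) * 0.01      # stand-in gradients
        for name in LAYERS[:depth]
    }

LR = 0.1
for name in LAYERS:
    grads = [g[name] for g in reports.values() if name in g]
    if grads:  # update a layer only if at least one worker reported it
        global_model[name] -= LR * np.mean(grads, axis=0)
```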
arXiv Detail & Related papers (2024-03-27T09:14:36Z)
- A Quadratic Synchronization Rule for Distributed Deep Learning [66.68264684667562]
This work proposes a theory-grounded method, named the Quadratic Synchronization Rule (QSR), for determining the synchronization period $H$.
Experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies.
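A hedged sketch of what a quadratic rule for the synchronization period $H$ could look like, assuming $H$ grows roughly as the inverse square of the learning rate as it decays; the constant, the floor, and the cosine schedule below are placeholders, not values from the paper.

```python
import math

# Hedged sketch of a QSR-style schedule: as the learning rate eta decays,
# workers synchronize less often, with the period H growing roughly as
# 1/eta^2. The constant C and the floor H_MIN are illustrative placeholders.

C = 0.01        # hypothetical scaling constant
H_MIN = 2       # lower bound on the synchronization period

def sync_period(eta: float, c: float = C, h_min: int = H_MIN) -> int:
    """Synchronization period H grows quadratically as eta decays."""
    return max(h_min, math.floor((c / eta) ** 2))

# Example with a cosine learning-rate decay over 100 "rounds".
ETA_MAX = 0.1
for round_idx in [0, 25, 50, 75, 99]:
    eta = 0.5 * ETA_MAX * (1 + math.cos(math.pi * round_idx / 99)) + 1e-4
    print(round_idx, round(eta, 5), sync_period(eta))
```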
arXiv Detail & Related papers (2023-10-22T21:38:57Z)
- $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning [0.0]
We introduce a principled asynchronous, randomized, gossip-based optimization algorithm that works thanks to a continuous local momentum named $\textbf{A}^2\textbf{CiD}^2$.
Our theoretical analysis proves accelerated rates compared to previous asynchronous decentralized baselines.
We show consistent improvement on the ImageNet dataset using up to 64 asynchronous workers.
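The sketch below shows only the basic pairwise gossip-averaging primitive that such decentralized methods build on; the continuous local momentum that defines $\textbf{A}^2\textbf{CiD}^2$ itself is deliberately not modeled.

```python
import numpy as np

# Basic asynchronous pairwise gossip: two randomly paired workers average
# their parameters. This is only the communication primitive; the continuous
# local momentum of the method above is not included.

rng = np.random.default_rng(2)
NUM_WORKERS = 8
params = [rng.normal(size=16) for _ in range(NUM_WORKERS)]

for _ in range(200):
    i, j = rng.choice(NUM_WORKERS, size=2, replace=False)
    mean_ij = 0.5 * (params[i] + params[j])   # one asynchronous gossip exchange
    params[i] = mean_ij.copy()
    params[j] = mean_ij.copy()

# After enough exchanges all workers drift toward the network-wide average.
spread = max(np.linalg.norm(p - np.mean(params, axis=0)) for p in params)
print("max distance to consensus:", round(float(spread), 4))
```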
arXiv Detail & Related papers (2023-06-14T06:52:07Z)
- FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy [84.45004766136663]
Federated learning is an emerging distributed machine learning framework.
It suffers from non-vanishing biases introduced by locally inconsistent optima and from severe client drift caused by local over-fitting.
We propose a novel and practical method, FedSpeed, to alleviate the negative impacts posed by these problems.
arXiv Detail & Related papers (2023-02-21T03:55:29Z)
- TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation [53.84175614198885]
In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server.
We propose TAMUNA, the first algorithm for distributed optimization that jointly leverages the two strategies of local training and compression while allowing for partial participation.
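A generic sketch of one round combining the three ingredients named above; the client sampling, the top-k sparsifier, and the local-step counts are placeholders and do not reproduce TAMUNA's specific compressor or its acceleration guarantees.

```python
import numpy as np

# Generic sketch of one round with partial participation (sample a subset of
# clients), local training (a few local SGD steps), and compression (a simple
# top-k sparsifier). These are illustrative stand-ins, not TAMUNA's design.

rng = np.random.default_rng(3)
NUM_CLIENTS, DIM, K = 10, 20, 4          # K = entries kept by the sparsifier
LOCAL_STEPS, LR = 5, 0.05
targets = rng.normal(size=(NUM_CLIENTS, DIM))   # toy per-client objectives
server_w = np.zeros(DIM)

def top_k(vec, k):
    """Keep the k largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

for round_idx in range(30):
    participants = rng.choice(NUM_CLIENTS, size=NUM_CLIENTS // 2, replace=False)
    deltas = []
    for c in participants:
        w = server_w.copy()
        for _ in range(LOCAL_STEPS):                 # local training
            w -= LR * 2.0 * (w - targets[c])         # grad of ||w - target_c||^2
        deltas.append(top_k(w - server_w, K))        # compressed update
    server_w += np.mean(deltas, axis=0)              # server aggregation
```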
arXiv Detail & Related papers (2023-02-20T08:37:44Z)
- Semi-Synchronous Personalized Federated Learning over Mobile Edge Networks [88.50555581186799]
We propose a semi-synchronous PFL algorithm, termed Semi-Synchronous Personalized Federated Averaging (PerFedS$^2$), over mobile edge networks.
We derive an upper bound on the convergence rate of PerFedS$^2$ in terms of the number of participants per global round and the number of rounds.
Experimental results verify the effectiveness of PerFedS$^2$ in saving training time while guaranteeing convergence of the training loss.
arXiv Detail & Related papers (2022-09-27T02:12:43Z)
- Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning [10.196574441542646]
Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters.
A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization protocol.
In this paper, we design a hybrid synchronization approach that exploits the benefits of both BSP and ASP.
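A toy sketch of a hybrid schedule that trains synchronously first and asynchronously afterwards; the switch point and the staleness model are illustrative placeholders, not Sync-Switch's actual switching policy.

```python
import numpy as np

# Sketch of a hybrid BSP/ASP schedule: synchronous averaged steps for the
# first part of training, then asynchronous updates from one worker at a time
# computed on a possibly stale local copy. Switch point is a placeholder.

rng = np.random.default_rng(4)
NUM_WORKERS, DIM, LR = 4, 10, 0.05
TOTAL_STEPS, SWITCH_STEP = 200, 120          # hypothetical switch point
target = rng.normal(size=DIM)
shared_w = np.zeros(DIM)
stale_copies = [shared_w.copy() for _ in range(NUM_WORKERS)]

def grad(w):
    return 2.0 * (w - target)                # toy objective ||w - target||^2

for step in range(TOTAL_STEPS):
    if step < SWITCH_STEP:
        # BSP phase: every worker sees the same weights; apply the averaged step.
        grads = [grad(shared_w) for _ in range(NUM_WORKERS)]
        shared_w = shared_w - LR * np.mean(grads, axis=0)
    else:
        # ASP phase: a single worker pushes an update computed on its (possibly
        # stale) local copy, then refreshes that copy from the shared weights.
        k = rng.integers(NUM_WORKERS)
        shared_w = shared_w - LR * grad(stale_copies[k])
        stale_copies[k] = shared_w.copy()
```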
arXiv Detail & Related papers (2021-04-16T20:49:28Z)
- Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO) [0.0]
We introduce the Distributed Asynchronous and Selective Optimization (DASO) method to accelerate network training.
DASO uses a hierarchical and asynchronous communication scheme comprised of node-local and global networks.
We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks.
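A sketch of the hierarchical pattern described above, assuming workers on the same node average every step while the cross-node average runs only every few steps; the group layout and the global interval are assumptions.

```python
import numpy as np

# Hierarchical synchronization sketch: frequent node-local averaging over the
# fast intra-node network, infrequent global averaging across nodes.

rng = np.random.default_rng(5)
NODES, GPUS_PER_NODE, DIM, LR = 2, 4, 12, 0.05
GLOBAL_EVERY = 8                              # hypothetical global sync interval
targets = rng.normal(size=(NODES * GPUS_PER_NODE, DIM))
weights = [np.zeros(DIM) for _ in range(NODES * GPUS_PER_NODE)]

for step in range(64):
    # Local gradient step on every worker (toy objective per worker).
    weights = [w - LR * 2.0 * (w - t) for w, t in zip(weights, targets)]

    # Node-local synchronization: average within each node every step.
    for n in range(NODES):
        group = list(range(n * GPUS_PER_NODE, (n + 1) * GPUS_PER_NODE))
        node_avg = np.mean([weights[i] for i in group], axis=0)
        for i in group:
            weights[i] = node_avg.copy()

    # Global synchronization across nodes, only every GLOBAL_EVERY steps.
    if (step + 1) % GLOBAL_EVERY == 0:
        global_avg = np.mean(weights, axis=0)
        weights = [global_avg.copy() for _ in weights]
```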
arXiv Detail & Related papers (2021-04-12T16:02:20Z)
- High-Throughput Synchronous Deep RL [132.43861715707905]
We propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL).
We perform learning and rollouts concurrently and devise a system design that avoids stale policies.
We evaluate our approach on Atari games and the Google Research Football environment.
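A bare-bones sketch of the concurrency pattern (actors producing rollouts while a learner consumes them); HTS-RL's specific mechanisms for avoiding stale policies and its batching strategy are not modeled.

```python
import queue
import threading
import numpy as np

# Producer/consumer structure: actor threads push rollouts into a queue while
# a learner thread consumes them concurrently. Structure only; the method's
# staleness-avoidance and batching are not modeled here.

rollout_queue = queue.Queue(maxsize=8)
NUM_ACTORS, ROLLOUTS_PER_ACTOR = 2, 5

def actor(actor_id: int) -> None:
    rng = np.random.default_rng(actor_id)        # per-thread generator
    for t in range(ROLLOUTS_PER_ACTOR):
        rollout = rng.normal(size=4)             # stand-in for a trajectory
        rollout_queue.put((actor_id, t, rollout))

def learner(total: int) -> None:
    for _ in range(total):
        actor_id, t, rollout = rollout_queue.get()
        _ = float(np.mean(rollout))              # stand-in for a gradient step

threads = [threading.Thread(target=actor, args=(i,)) for i in range(NUM_ACTORS)]
threads.append(threading.Thread(target=learner,
                                args=(NUM_ACTORS * ROLLOUTS_PER_ACTOR,)))
for th in threads:
    th.start()
for th in threads:
    th.join()
```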
arXiv Detail & Related papers (2020-12-17T18:59:01Z)
- PSO-PS: Parameter Synchronization with Particle Swarm Optimization for Distributed Training of Deep Neural Networks [16.35607080388805]
We propose a new algorithm that integrates Particle Swarm Optimization (PSO) into the distributed training process of deep neural networks (DNNs).
In the proposed algorithm, each computing worker is encoded as a particle, and the DNN weights and training loss are modeled as particle attributes.
At each synchronization stage, the weights are updated by PSO from the sub-weights gathered from all workers, instead of by averaging the weights or gradients.
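A sketch of a PSO-style synchronization stage along these lines, treating each worker's weight vector as a particle and its training loss as the fitness; the PSO coefficients and the toy loss are illustrative assumptions.

```python
import numpy as np

# PSO-style synchronization: each worker's weights are a particle, the
# training loss is the fitness, and the update follows a standard PSO rule
# instead of plain averaging. Coefficients and loss are stand-ins.

rng = np.random.default_rng(7)
NUM_WORKERS, DIM = 5, 8
INERTIA, C1, C2 = 0.7, 1.4, 1.4              # hypothetical PSO coefficients
target = rng.normal(size=DIM)

def loss(w):                                  # stand-in for the training loss
    return float(np.sum((w - target) ** 2))

positions = [rng.normal(size=DIM) for _ in range(NUM_WORKERS)]   # worker weights
velocities = [np.zeros(DIM) for _ in range(NUM_WORKERS)]
personal_best = [p.copy() for p in positions]
global_best = min(positions, key=loss).copy()

for stage in range(20):                       # synchronization stages
    for k in range(NUM_WORKERS):
        r1, r2 = rng.random(DIM), rng.random(DIM)
        velocities[k] = (INERTIA * velocities[k]
                         + C1 * r1 * (personal_best[k] - positions[k])
                         + C2 * r2 * (global_best - positions[k]))
        positions[k] = positions[k] + velocities[k]
        if loss(positions[k]) < loss(personal_best[k]):
            personal_best[k] = positions[k].copy()
    global_best = min(personal_best, key=loss).copy()
```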
arXiv Detail & Related papers (2020-09-06T05:18:32Z)
- DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training [15.246142393381488]
We present a novel divide-and-shuffle synchronization (DS-Sync) to realize communication efficiency without sacrificing convergence accuracy for distributed DNN training.
We show that DS-Sync can achieve up to $94\%$ improvement in end-to-end training time over existing solutions while maintaining the same accuracy.
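A sketch of a divide-and-shuffle step, assuming workers are split into small groups that each run their own cheaper all-reduce and that the grouping is reshuffled every iteration; group size and shuffling scheme are assumptions, not DS-Sync's exact design.

```python
import numpy as np

# Divide-and-shuffle sketch: split workers into small groups, average within
# each group (a small all-reduce), and reshuffle the grouping every iteration
# so information mixes across all workers over time.

rng = np.random.default_rng(8)
NUM_WORKERS, GROUP_SIZE, DIM, LR = 8, 4, 16, 0.05
targets = rng.normal(size=(NUM_WORKERS, DIM))
weights = [np.zeros(DIM) for _ in range(NUM_WORKERS)]

for step in range(40):
    # Local gradient step on every worker (toy objective per worker).
    weights = [w - LR * 2.0 * (w - t) for w, t in zip(weights, targets)]

    # Divide: shuffle workers and split them into disjoint groups.
    order = rng.permutation(NUM_WORKERS)
    for start in range(0, NUM_WORKERS, GROUP_SIZE):
        group = order[start:start + GROUP_SIZE]
        # Each group averages only among its own members.
        group_avg = np.mean([weights[i] for i in group], axis=0)
        for i in group:
            weights[i] = group_avg.copy()
```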
arXiv Detail & Related papers (2020-07-07T09:29:01Z)