Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep
Learning
- URL: http://arxiv.org/abs/2104.08364v2
- Date: Tue, 20 Apr 2021 00:25:37 GMT
- Title: Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep
Learning
- Authors: Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo
- Abstract summary: Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters.
A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization protocol.
In this paper, we design a hybrid synchronization approach that exploits the benefits of both BSP and ASP.
- Score: 10.196574441542646
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Stochastic Gradient Descent (SGD) has become the de facto way to train deep
neural networks in distributed clusters. A critical factor in determining the
training throughput and model accuracy is the choice of the parameter
synchronization protocol. For example, while Bulk Synchronous Parallel (BSP)
often achieves better converged accuracy, the corresponding training throughput
can be negatively impacted by stragglers. In contrast, Asynchronous Parallel
(ASP) can have higher throughput, but its convergence and accuracy can be
impacted by stale gradients. To improve the performance of the synchronization
protocol, recent work often focuses on designing new protocols with a heavy
reliance on hard-to-tune hyper-parameters. In this paper, we design a hybrid
synchronization approach that exploits the benefits of both BSP and ASP, i.e.,
reducing training time while simultaneously maintaining the converged accuracy.
Based on extensive empirical profiling, we devise a collection of adaptive
policies that determine how and when to switch between synchronization
protocols. Our policies include both offline ones that target recurring jobs
and online ones for handling transient stragglers. We implement the proposed
policies in a prototype system, called Sync-Switch, on top of TensorFlow, and
evaluate the training performance with popular deep learning models and
datasets. Our experiments show that Sync-Switch achieves up to 5.13X throughput
speedup and similar converged accuracy compared to BSP. Further, we observe
that Sync-Switch achieves 3.8% higher converged accuracy with just 1.23X the
training time compared to training with ASP. Moreover, Sync-Switch can be used
in settings where training with ASP leads to divergence errors.
Sync-Switch achieves all of these benefits with very low overhead, e.g., the
framework overhead can be as low as 1.7% of the total training time.
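The abstract describes the switching policies only at a high level, and the prototype is built on TensorFlow. As a purely illustrative, framework-agnostic sketch (not the authors' implementation), the Python snippet below shows the general shape of such a hybrid policy under two assumptions: an offline timing rule that runs BSP for an initial fraction of training before switching to ASP, and an online rule that reacts to a transient straggler by comparing per-worker step times. The class name `SwitchPolicy` and the parameters `bsp_fraction` and `straggler_factor` are hypothetical; the paper derives its actual switch points from empirical profiling.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SwitchPolicy:
    """Illustrative hybrid-synchronization policy; NOT the paper's implementation.

    Offline rule (recurring jobs): run BSP for the first `bsp_fraction` of epochs,
    then switch to ASP for the remainder.
    Online rule (transient stragglers): while in BSP, fall back to ASP if one
    worker's step time far exceeds the median of its peers.
    """
    total_epochs: int
    bsp_fraction: float = 0.25      # assumed switch point; the paper derives it via profiling
    straggler_factor: float = 2.0   # assumed threshold for flagging a transient straggler

    def protocol(self, epoch: int, step_times: list[float]) -> str:
        # Offline timing policy: BSP early in training, ASP afterwards.
        planned = "BSP" if epoch < self.bsp_fraction * self.total_epochs else "ASP"
        # Online straggler policy: override BSP when one worker is much slower.
        if planned == "BSP" and step_times:
            if max(step_times) > self.straggler_factor * statistics.median(step_times):
                return "ASP"
        return planned


# Example: a 100-epoch job; a transient straggler appears at epoch 10.
policy = SwitchPolicy(total_epochs=100)
print(policy.protocol(epoch=10, step_times=[1.0, 1.1, 0.9, 3.5]))  # ASP (straggler override)
print(policy.protocol(epoch=10, step_times=[1.0, 1.1, 0.9, 1.2]))  # BSP (early phase)
print(policy.protocol(epoch=80, step_times=[1.0, 1.1, 0.9, 1.2]))  # ASP (late phase)
```

In this toy policy the online check only overrides the planned protocol for the current step; deciding when and how to switch back, and doing so with low framework overhead, is what the paper's profiling-based policies address.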
Related papers
- Efficient Asynchronous Federated Learning with Sparsification and
Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z) - A Quadratic Synchronization Rule for Distributed Deep Learning [66.68264684667562]
This work proposes a theory-grounded method for determining the synchronization period $H$ of local gradient methods, named the Quadratic Synchronization Rule (QSR); a minimal sketch of where $H$ enters local SGD appears after this list.
Experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies.
arXiv Detail & Related papers (2023-10-22T21:38:57Z) - Accelerating Distributed ML Training via Selective Synchronization [0.0]
SelSync is a practical, low-overhead method for DNN training that dynamically chooses to incur or avoid communication at each step.
Our system converges to the same or better accuracy than BSP while reducing training time by up to 14x.
arXiv Detail & Related papers (2023-07-16T05:28:59Z) - HFedMS: Heterogeneous Federated Learning with Memorable Data Semantics
in Industrial Metaverse [49.1501082763252]
This paper presents HFEDMS for incorporating practical FL into the emerging Industrial Metaverse.
It reduces data heterogeneity through dynamic grouping and training mode conversion.
Then, it compensates for the forgotten knowledge by fusing compressed historical data semantics.
Experiments have been conducted on the streamed non-i.i.d. FEMNIST dataset using 368 simulated devices.
arXiv Detail & Related papers (2022-11-07T04:33:24Z) - Efficient and Light-Weight Federated Learning via Asynchronous
Distributed Dropout [22.584080337157168]
Asynchronous learning protocols have regained attention lately, especially in the Federated Learning (FL) setup.
We propose AsyncDrop, a novel asynchronous FL framework that utilizes dropout regularization to handle device heterogeneity in distributed settings.
Overall, AsyncDrop achieves better performance compared to state-of-the-art asynchronous methodologies.
arXiv Detail & Related papers (2022-10-28T13:00:29Z) - Semi-Synchronous Personalized Federated Learning over Mobile Edge
Networks [88.50555581186799]
We propose a semi-synchronous PFL algorithm, termed Semi-Synchronous Personalized Federated Averaging (PerFedS$^2$), over mobile edge networks.
We derive an upper bound on the convergence rate of PerFedS$^2$ in terms of the number of participants per global round and the number of rounds.
Experimental results verify the effectiveness of PerFedS2 in saving training time as well as guaranteeing the convergence of training loss.
arXiv Detail & Related papers (2022-09-27T02:12:43Z) - GBA: A Tuning-free Approach to Switch between Synchronous and
Asynchronous Training for Recommendation Model [19.65557684234458]
We propose Global Batch gradients Aggregation (GBA) over a parameter server (PS).
A token-control process is implemented to assemble the gradients and decay the gradients with severe staleness.
Experiments on three industrial-scale recommendation tasks show that GBA is an effective tuning-free approach for switching.
arXiv Detail & Related papers (2022-05-23T05:22:42Z) - High-Throughput Synchronous Deep RL [132.43861715707905]
We propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL)
We perform learning and rollouts concurrently and devise a system design that avoids stale policies.
We evaluate our approach on Atari games and the Google Research Football environment.
arXiv Detail & Related papers (2020-12-17T18:59:01Z) - Training Recommender Systems at Scale: Communication-Efficient Model and
Data Parallelism [56.78673028601739]
We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training.
DCT reduces communication by at least 100x and 20x during DP and MP, respectively.
It improves end-to-end training time for a state-of-the-art industrial recommender model by 37%, without any loss in performance.
arXiv Detail & Related papers (2020-10-18T01:44:42Z) - PSO-PS: Parameter Synchronization with Particle Swarm Optimization for
Distributed Training of Deep Neural Networks [16.35607080388805]
We propose a new algorithm that integrates Particle Swarm Optimization (PSO) into the distributed training process of Deep Neural Networks (DNNs).
In the proposed algorithm, a computing worker is encoded by a particle, and the weights of the DNN and the training loss are modeled by the particle's attributes.
At each synchronization stage, the weights are updated by PSO from the sub-weights gathered from all workers, instead of by averaging the weights or the gradients.
arXiv Detail & Related papers (2020-09-06T05:18:32Z) - ShadowSync: Performing Synchronization in the Background for Highly
Scalable Distributed Training [10.73956838502053]
We present ShadowSync, a distributed framework specifically tailored to modern-scale recommendation system training.
In contrast to previous works where synchronization happens as part of the training process, shadowsync separates the synchronization from training and runs it in the background.
arXiv Detail & Related papers (2020-03-07T00:26:26Z)
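For context on the Quadratic Synchronization Rule entry above (referenced there as the sketch after this list): in local gradient methods, $H$ is the number of local steps each worker takes between synchronizations, at which point workers average their parameters. The snippet below is a minimal, generic local-SGD loop that only shows where $H$ enters; it does not implement QSR's rule for choosing $H$, and the function `local_sgd`, the `grad_fn` interface, and the toy quadratic objective are illustrative assumptions.

```python
import numpy as np


def local_sgd(workers_params, grad_fn, lr, H, rounds):
    """Generic local-SGD loop: each worker takes H local SGD steps, then all
    workers synchronize by averaging their parameters. QSR (not shown here)
    is a rule for adapting H during training."""
    params = [p.copy() for p in workers_params]
    for _ in range(rounds):
        for w in range(len(params)):
            for _ in range(H):                    # H local steps between synchronizations
                params[w] = params[w] - lr * grad_fn(params[w], w)
        avg = np.mean(params, axis=0)             # synchronization: parameter averaging
        params = [avg.copy() for _ in params]
    return params[0]


# Toy usage: 4 workers, each with a quadratic loss around a worker-specific target.
targets = [np.full(3, float(w)) for w in range(4)]
grad = lambda x, w: 2.0 * (x - targets[w])        # gradient of ||x - targets[w]||^2
final = local_sgd([np.zeros(3) for _ in range(4)], grad, lr=0.1, H=8, rounds=20)
print(final)                                      # approaches the mean target, ~[1.5, 1.5, 1.5]
```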