ShadowSync: Performing Synchronization in the Background for Highly
Scalable Distributed Training
- URL: http://arxiv.org/abs/2003.03477v3
- Date: Tue, 23 Feb 2021 18:23:31 GMT
- Authors: Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu,
Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou
- Abstract summary: We present ShadowSync, a distributed framework specifically tailored to modern scale recommendation system training.
In contrast to previous works where synchronization happens as part of the training process, ShadowSync separates the synchronization from training and runs it in the background.
- Score: 10.73956838502053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommendation systems are often trained with a tremendous amount of data,
and distributed training is the workhorse to shorten the training time. While
the training throughput can be increased by simply adding more workers, it is
also increasingly challenging to preserve the model quality. In this paper, we
present ShadowSync, a distributed framework specifically tailored to modern
scale recommendation system training. In contrast to previous works where
synchronization happens as part of the training process, ShadowSync separates
the synchronization from training and runs it in the background. Such isolation
significantly reduces the synchronization overhead and increases the
synchronization frequency, so that we are able to obtain both high throughput
and excellent model quality when training at scale. The superiority of our
procedure is confirmed by experiments on training deep neural networks for
click-through-rate prediction tasks. Our framework can express data
parallelism and/or model parallelism, is generic enough to host various types of
synchronization algorithms, and is readily applicable to large-scale problems in
other areas.
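To make the background-synchronization idea concrete, the following is a minimal sketch of one way it could look in a PyTorch-style data-parallel worker: the training loop never blocks on communication, while a separate thread periodically averages parameters across workers. The thread layout, synchronization interval, and averaging rule are assumptions for illustration, not the paper's actual implementation.
```python
# Illustrative sketch only: a data-parallel worker whose training loop never
# blocks on communication. A background "shadow" thread periodically averages
# parameters across workers. Thread layout, interval, and averaging rule are
# assumptions, not the paper's implementation.
import threading
import time

import torch
import torch.distributed as dist
import torch.nn as nn


def background_sync(model: nn.Module, interval_s: float, stop: threading.Event) -> None:
    """Periodically average parameters across workers, off the training path."""
    world_size = dist.get_world_size()
    while not stop.is_set():
        time.sleep(interval_s)
        with torch.no_grad():
            for p in model.parameters():
                dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
                p.data.div_(world_size)  # simple elementwise averaging


def train(rank: int, world_size: int, steps: int = 1000) -> None:
    # Assumes MASTER_ADDR / MASTER_PORT are set; one process per worker.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = nn.Linear(64, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    stop = threading.Event()
    syncer = threading.Thread(target=background_sync, args=(model, 0.5, stop), daemon=True)
    syncer.start()  # synchronization now runs "in the shadow" of training

    for _ in range(steps):
        x, y = torch.randn(32, 64), torch.randn(32, 1)  # stand-in for real batches
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # no communication on the critical path

    stop.set()
    syncer.join()
    dist.destroy_process_group()
```
The point mirrored from the abstract is that the synchronization interval can be tuned independently of the training loop, trading a small amount of parameter staleness for throughput.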
Related papers
- Synchformer: Efficient Synchronization from Sparse Cues [100.89656994681934]
Our contributions include a novel audio-visual synchronization model, and training that decouples extraction from synchronization modelling.
This approach achieves state-of-the-art performance in both dense and sparse settings.
We also extend synchronization model training to AudioSet, a million-scale 'in-the-wild' dataset, investigate evidence attribution techniques for interpretability, and explore a new capability for synchronization models: audio-visual synchronizability.
arXiv Detail & Related papers (2024-01-29T18:59:55Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data.
FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training.
We propose TEASQ-Fed, which lets edge devices asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- Accelerating Distributed ML Training via Selective Synchronization [0.0]
SelSync is a practical, low-overhead method for DNN training that dynamically chooses to incur or avoid communication at each step.
Our system converges to the same or better accuracy than BSP while reducing training time by up to 14x.
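One plausible reading of this per-step decision is sketched below, assuming PyTorch-style data-parallel training; the significance test and threshold are hypothetical placeholders, not SelSync's actual criterion.
```python
# Hypothetical sketch of selective synchronization: each step either pays for a
# gradient all-reduce or applies the local gradients only. The significance
# test (relative change in gradient norm) is an illustrative placeholder.
import torch
import torch.distributed as dist


def selective_step(model, optimizer, prev_norm: float, threshold: float = 0.1) -> float:
    """Run one optimizer step; returns the gradient norm to feed into the next call."""
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    cur_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()

    if abs(cur_norm - prev_norm) > threshold * (prev_norm + 1e-12):
        # Gradients changed significantly: synchronize this step.
        world_size = dist.get_world_size()
        for g in grads:
            dist.all_reduce(g, op=dist.ReduceOp.SUM)
            g.div_(world_size)
    # Otherwise skip communication and update from local gradients only.
    optimizer.step()
    return cur_norm
```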
arXiv Detail & Related papers (2023-07-16T05:28:59Z)
- Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation [23.018715954992352]
We present a simplified framework for distributed GNN training that avoids the costly partitioning and communication operations of prior approaches.
Specifically, our framework assembles independent trainers, each of which asynchronously learns a local model on locally-available parts of the training graph.
In experiments on social and e-commerce networks with up to 1.3 billion edges, our proposed RandomTMA and SuperTMA approaches achieve state-of-the-art performance and 2.31x speedup compared to the fastest baseline.
arXiv Detail & Related papers (2023-05-17T01:49:44Z)
- Does compressing activations help model parallel training? [64.59298055364336]
We present the first empirical study on the effectiveness of compression methods for model parallelism.
We implement and evaluate three common classes of compression algorithms.
We evaluate these methods across more than 160 settings and 8 popular datasets.
arXiv Detail & Related papers (2023-01-06T18:58:09Z)
- Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout [22.584080337157168]
Asynchronous learning protocols have regained attention lately, especially in the Federated Learning (FL) setup.
We propose AsyncDrop, a novel asynchronous FL framework that utilizes dropout regularization to handle device heterogeneity in distributed settings.
Overall, AsyncDrop achieves better performance compared to state-of-the-art asynchronous methodologies.
arXiv Detail & Related papers (2022-10-28T13:00:29Z)
- How Well Self-Supervised Pre-Training Performs with Streaming Data? [73.5362286533602]
In real-world scenarios where data are collected in a streaming fashion, the joint training scheme is usually storage-heavy and time-consuming.
It is unclear how well sequential self-supervised pre-training performs with streaming data.
We find sequential self-supervised learning exhibits almost the same performance as the joint training when the distribution shifts within streaming data are mild.
arXiv Detail & Related papers (2021-04-25T06:56:48Z)
- Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning [10.196574441542646]
Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters.
A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization protocol.
In this paper, we design a hybrid synchronization approach that exploits the benefits of both BSP and ASP.
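A hypothetical sketch of such a hybrid schedule follows, assuming a simple policy that runs BSP all-reduce steps early and then switches to ASP updates against a parameter server; the switch point and the parameter-server helpers are illustrative assumptions, not the paper's tuned policy.
```python
# Hypothetical sketch of a hybrid schedule: bulk-synchronous (BSP) all-reduce
# steps early in training, then a switch to asynchronous (ASP) updates against
# a parameter server. The switch point and the parameter-server helpers
# (push_grads / pull_params) are illustrative assumptions.
import torch.distributed as dist
import torch.nn.functional as F


def hybrid_train(model, optimizer, data_loader, switch_step, push_grads, pull_params):
    world_size = dist.get_world_size()
    for step, (x, y) in enumerate(data_loader):
        loss = F.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()

        if step < switch_step:
            # BSP phase: every worker blocks on a global all-reduce each step.
            for p in model.parameters():
                if p.grad is not None:
                    dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                    p.grad.div_(world_size)
            optimizer.step()
        else:
            # ASP phase: ship gradients to a parameter server without waiting
            # for peers, then adopt whatever parameters it currently holds.
            push_grads(model)
            pull_params(model)
```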
arXiv Detail & Related papers (2021-04-16T20:49:28Z)
- Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting [48.8617204809538]
We propose Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model.
To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point process models and variational recurrent neural networks.
Our model can be trained effectively using variational inference and generates predictions with Monte-Carlo simulation.
arXiv Detail & Related papers (2021-01-31T11:00:55Z)
- Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.