Analysis of Distributed Deep Learning in the Cloud
- URL: http://arxiv.org/abs/2208.14344v1
- Date: Tue, 30 Aug 2022 15:42:36 GMT
- Title: Analysis of Distributed Deep Learning in the Cloud
- Authors: Aakash Sharma, Vivek M. Bhasi, Sonali Singh, Rishabh Jain, Jashwant
Raj Gunasekaran, Subrata Mitra, Mahmut Taylan Kandemir, George Kesidis, Chita
R. Das
- Abstract summary: We introduce a comprehensive distributed deep learning (DDL) profiler, which can determine the various execution "stalls" that DDL suffers from while running on a public cloud.
We estimate two types of communication stalls - interconnect and network stalls.
We train popular DNN models using the profiler to characterize various AWS GPU instances and list their advantages and shortcomings for users to make an informed decision.
- Score: 17.91202259637393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We aim to resolve this problem by introducing a comprehensive distributed
deep learning (DDL) profiler, which can determine the various execution
"stalls" that DDL suffers from while running on a public cloud. We have
implemented the profiler by extending prior work to additionally estimate two
types of communication stalls - interconnect and network stalls. We train
popular DNN models using the profiler to characterize various AWS GPU instances
and list their advantages and shortcomings for users to make an informed
decision. We observe that the more expensive GPU instances may not be the most
performant for all DNN models and AWS may sub-optimally allocate hardware
interconnect resources. Specifically, the intra-machine interconnect can
introduce communication overheads up to 90% of DNN training time and
network-connected instances can suffer from up to 5x slowdown compared to
training on a single instance. Further, we model the impact of DNN macroscopic
features such as the number of layers and the number of gradients on
communication stalls. Finally, we propose a measurement-based recommendation
model for users to lower their public cloud monetary costs for DDL, given a
time budget.
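To make the stall decomposition and the recommendation model concrete, here is a minimal sketch in Python. Every name and number below (instance types, prices, per-iteration timings) is an illustrative assumption; the paper's profiler measures these quantities rather than assuming them.

```python
from dataclasses import dataclass

@dataclass
class ProfiledRun:
    # One (DNN model, AWS instance) measurement; all numbers are assumed.
    instance: str              # e.g., "g4dn.12xlarge"
    hourly_cost: float         # USD per hour for the configuration
    compute_time: float        # pure GPU compute per iteration (s)
    interconnect_stall: float  # intra-machine interconnect wait per iteration (s)
    network_stall: float       # inter-instance network wait per iteration (s)

def iteration_time(r: ProfiledRun) -> float:
    # Iteration time decomposes into compute plus the two communication stalls.
    return r.compute_time + r.interconnect_stall + r.network_stall

def recommend(runs, iters, budget_hours):
    # Return the cheapest configuration whose estimated training time
    # fits the user's time budget, or None if none does.
    feasible = []
    for r in runs:
        hours = iters * iteration_time(r) / 3600.0
        if hours <= budget_hours:
            feasible.append((r.hourly_cost * hours, r.instance))
    return min(feasible) if feasible else None

runs = [
    ProfiledRun("g4dn.12xlarge", 3.91, 0.30, 0.05, 0.10),
    ProfiledRun("p3.8xlarge", 12.24, 0.12, 0.25, 0.08),  # faster GPU, stall-bound
]
print(recommend(runs, iters=100_000, budget_hours=24))
# (48.875, 'g4dn.12xlarge') -- the pricier instance is not the better buy here
```

A real recommendation model would be fit to profiled measurements across instance types and DNN models; this sketch only shows the budget-constrained cost minimization at its core.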
Related papers
- MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment [54.77943671991863]
MatchNAS is a novel scheme for porting Deep Neural Networks to mobile devices.
We optimise a large network family using both labelled and unlabelled data.
We then automatically search for tailored networks for different hardware platforms.
arXiv Detail & Related papers (2024-02-21T04:43:12Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Enabling Deep Learning on Edge Devices [2.741266294612776]
Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, reinforcement learning, etc.
These high-performing DNNs, however, rely heavily on intensive resource consumption.
Recently, emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, require deploying DNNs on resource-constrained edge devices.
In this dissertation, we studied four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems.
arXiv Detail & Related papers (2022-10-06T20:52:57Z)
- Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge [28.61344039233783]
We propose to facilitate the application of deep neural networks (DNNs) on the edge by allowing multiple users to collaborate during inference to improve their accuracy.
Our mechanism, coined "edge ensembles", is based on having diverse predictors at each device, which form an ensemble of models during inference.
We analyze the latency induced by edge ensembles, showing that its performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network.
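As a hedged sketch of the inference step (the fusion rule and shapes below are assumed for illustration; the paper's contribution is the analysis of when the accuracy gain justifies the added communication delay):

```python
import numpy as np

def edge_ensemble_predict(local_logits, neighbor_logits):
    # Fuse the local model's output with outputs gathered from nearby
    # devices, each running a different (diverse) predictor.
    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()
    probs = [softmax(np.asarray(z, float)) for z in [local_logits, *neighbor_logits]]
    return int(np.mean(probs, axis=0).argmax())

# Each neighbor contacted can improve accuracy but adds round-trip delay.
print(edge_ensemble_predict([2.0, 0.5, 0.1], [[1.8, 0.9, 0.2], [0.4, 2.2, 0.3]]))
```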
arXiv Detail & Related papers (2022-06-07T10:24:20Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
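The selection rule can be sketched as minimizing an estimated end-to-end latency over candidate split points; the cost model and all numbers below are assumptions for illustration, not the paper's exact policy:

```python
def pick_split(dev_ms, srv_ms, out_bytes, input_bytes, rate_bytes_per_ms):
    # dev_ms[i] / srv_ms[i]: layer i's compute time on device / server (ms);
    # out_bytes[i]: size of layer i's output activation.
    # Split index s runs layers [0, s) on-device, ships one tensor, and runs
    # layers [s, n) on the server; s == n means fully on-device (no transfer).
    n = len(dev_ms)
    best = (float("inf"), 0)
    for s in range(n + 1):
        sent = input_bytes if s == 0 else out_bytes[s - 1]
        transfer = 0.0 if s == n else sent / rate_bytes_per_ms
        best = min(best, (sum(dev_ms[:s]) + transfer + sum(srv_ms[s:]), s))
    return best  # (estimated latency in ms, split index)

# Re-run whenever the measured channel rate or server load changes:
print(pick_split([5, 8, 8, 12], [1, 2, 2, 3],
                 [200_000, 80_000, 40_000, 4_000],
                 input_bytes=600_000, rate_bytes_per_ms=50_000))
# (16.0, 1): at this data rate, offloading after the first layer is best
```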
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- Characterizing and Modeling Distributed Training with Transient Cloud GPU Servers [6.56704851092678]
We analyze distributed training performance under diverse cluster configurations using CM-DARE.
Our empirical datasets include measurements from three GPU types, six geographic regions, twenty convolutional neural networks, and thousands of Google Cloud servers.
We also demonstrate the feasibility of predicting training speed and overhead using regression-based models.
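As a rough illustration of such regression-based prediction (the features, numbers, and linear form here are assumed, not taken from the paper):

```python
import numpy as np

# Hypothetical measurements: [num_workers, batch_size, network_Gbps] -> s/epoch
X = np.array([[2, 64, 10], [4, 64, 10], [4, 128, 25], [8, 128, 25]], float)
y = np.array([410.0, 230.0, 150.0, 95.0])

# Ordinary least squares with a bias column is the simplest instance of a
# regression-based training-speed predictor.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = np.array([8.0, 64.0, 10.0, 1.0]) @ coef  # predict an unseen configuration
```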
arXiv Detail & Related papers (2020-04-07T01:49:58Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC, and discuss the open problems of these methods.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)