Dynamic DNN Decomposition for Lossless Synergistic Inference
- URL: http://arxiv.org/abs/2101.05952v1
- Date: Fri, 15 Jan 2021 03:18:53 GMT
- Title: Dynamic DNN Decomposition for Lossless Synergistic Inference
- Authors: Beibei Zhang, Tian Xiang, Hongxuan Zhang, Te Li, Shiqiang Zhu, Jianjun Gu
- Abstract summary: Deep neural networks (DNNs) sustain high performance in today's data processing applications.
We propose D3, a dynamic DNN decomposition system for synergistic inference without precision loss.
D3 outperforms state-of-the-art counterparts by up to 3.4x in end-to-end DNN inference time and reduces backbone network communication overhead by up to 3.68x.
- Score: 0.9549013615433989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) sustain high performance in today's data
processing applications. DNN inference is resource-intensive and thus difficult
to fit on a mobile device. An alternative is to offload the DNN inference to a
cloud server. However, such an approach requires heavy raw data transmission
between the mobile device and the cloud server, which is unsuitable for
mission-critical and privacy-sensitive applications such as autopilot. To solve
this problem, recent advances deliver DNN services using the edge computing
paradigm. Existing approaches split a DNN into two parts and deploy the two
partitions to computation nodes at two edge computing tiers. Nonetheless, these
methods overlook collaborative device-edge-cloud computation resources.
Besides, previous algorithms require re-partitioning the whole DNN to adapt to
computation resource changes and network dynamics. Moreover, for
resource-demanding convolutional layers, prior works offer no parallel
processing strategy at the edge side that preserves accuracy. To tackle these
issues, we propose D3, a dynamic DNN decomposition system for synergistic
inference without precision loss. The proposed system introduces a heuristic
horizontal partition algorithm that splits a DNN into three parts. The
algorithm can partially adjust the partitions at run time according to
processing time and network conditions. At the edge side, a vertical separation
module separates feature maps into tiles that can be run independently on
different edge nodes in parallel. Extensive quantitative evaluation of five
popular DNNs shows that D3 outperforms state-of-the-art counterparts by up to
3.4x in end-to-end DNN inference time and reduces backbone network
communication overhead by up to 3.68x.
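Read concretely, the horizontal partition amounts to choosing two split points over per-layer latency profiles: layers before the first point run on the device, layers between the points on the edge, and the rest in the cloud, with transmission costs charged at each boundary. The sketch below is a minimal illustration under assumed profiling inputs; the names (`pick_split_points`, `readjust_edge_cloud`) and the simplified device-to-edge-to-cloud data flow are hypothetical, not the paper's actual algorithm. The second function mirrors the "partially adjust at run time" idea by re-optimizing only one boundary when backbone bandwidth changes. The lossless vertical separation relies on receptive-field overlap, sketched after the related-papers list.

```python
# Hypothetical sketch of three-tier split-point selection in the spirit of
# D3's horizontal partition; not the paper's actual algorithm.

def pick_split_points(device_lat, edge_lat, cloud_lat, act_bytes,
                      dev_edge_bw, edge_cloud_bw):
    """Choose split points (p, q): layers [0, p) run on the device,
    [p, q) on the edge, [q, n) in the cloud.

    device_lat / edge_lat / cloud_lat: per-layer latency (ms) on each tier.
    act_bytes[i]: bytes of the tensor entering layer i (act_bytes[0] is the
    raw input, act_bytes[n] the final output). Bandwidths are in bytes/ms.
    Simplification: data always flows device -> edge -> cloud, so a tier
    with zero layers just forwards its input.
    """
    n = len(device_lat)
    best = (float("inf"), 0, 0)
    for p in range(n + 1):          # device/edge boundary
        for q in range(p, n + 1):   # edge/cloud boundary
            lat = (sum(device_lat[:p]) + sum(edge_lat[p:q]) + sum(cloud_lat[q:])
                   + act_bytes[p] / dev_edge_bw
                   + act_bytes[q] / edge_cloud_bw)
            best = min(best, (lat, p, q))
    return best  # (estimated end-to-end latency, p, q)


def readjust_edge_cloud(p, edge_lat, cloud_lat, act_bytes, edge_cloud_bw):
    """Re-optimize only the edge/cloud boundary for a fixed device split p,
    e.g. when backbone bandwidth changes; the whole DNN need not be
    re-partitioned."""
    n = len(edge_lat)
    return min(range(p, n + 1),
               key=lambda q: sum(edge_lat[p:q]) + sum(cloud_lat[q:])
                             + act_bytes[q] / edge_cloud_bw)


# Toy 4-layer profile: the device is slow, the cloud is fast but far away.
device_lat = [5.0, 8.0, 12.0, 20.0]            # ms per layer
edge_lat   = [1.0, 1.5, 2.5, 4.0]
cloud_lat  = [0.3, 0.4, 0.7, 1.0]
act_bytes  = [600e3, 300e3, 150e3, 80e3, 4e3]  # activation sizes in bytes
lat, p, q = pick_split_points(device_lat, edge_lat, cloud_lat, act_bytes,
                              dev_edge_bw=1e4, edge_cloud_bw=2e3)
print(f"split at p={p}, q={q}, estimated latency {lat:.1f} ms")
q2 = readjust_edge_cloud(p, edge_lat, cloud_lat, act_bytes, edge_cloud_bw=0.2e3)
print(f"after a bandwidth drop, move the edge/cloud boundary to q={q2}")
```

Exhaustive search over (p, q) is O(n^2) in the number of layers, which is cheap for typical DNN depths; the partial readjustment reduces the run-time work to a single O(n) scan.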
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms state-of-the-art methods with up to a 10% decrease in latency constraint violation rate and nearly a 4x reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference [12.095934624748686]
We propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework.
It automatically co-optimizes the CPU, GPU and memory frequencies of edge devices, and the feature maps to be offloaded to cloud servers.
It reduces energy consumption by 33% on average compared to state-of-the-art schemes.
arXiv Detail & Related papers (2023-06-02T07:00:42Z)
- A Survey on Deep Neural Network Partition over Cloud, Edge and End Devices [6.248548718574856]
Deep neural network (DNN) partition is a research problem that involves splitting a DNN into multiple parts and offloading them to specific locations.
This paper provides a comprehensive survey on the recent advances and challenges in DNN partition approaches over the cloud, edge, and end devices.
arXiv Detail & Related papers (2023-04-20T00:17:27Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementing DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers; the receptive-field arithmetic behind such lossless tiling is sketched after this list.
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, the scheme adapts well to complex environments such as dynamic wireless channels and arbitrary processing loads, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- A Case For Adaptive Deep Neural Networks in Edge Computing [1.683310745678261]
This paper investigates whether there is a case for adaptive Deep Neural Networks (DNNs) in edge computing.
The results show that network conditions affect DNN performance more than CPU- or memory-related operational conditions.
arXiv Detail & Related papers (2020-08-04T20:23:50Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the key insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
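Both D3's vertical separation module and the receptive-field-based segmentation paper above rest on the same fact: every output tile of a convolutional stack depends only on a bounded input region, so tiles cut with enough overlap reproduce the monolithic result exactly, which is why the tiling loses no accuracy. Below is a minimal, hypothetical sketch of that backward range computation for 1-D convolutions; the function name and layer encoding are assumptions, and the real systems generalize this to 2-D feature maps.

```python
# Hypothetical sketch: map an output tile back to the input region it needs,
# through a stack of conv layers given as (kernel, stride, padding) tuples.

def input_range(out_lo, out_hi, layers):
    """Return the half-open input interval [lo, hi) that a conv stack needs
    to produce output positions [out_lo, out_hi). Walking backwards, a
    single layer's output index o reads inputs [o*s - pad, o*s - pad + k)."""
    lo, hi = out_lo, out_hi
    for k, s, pad in reversed(layers):
        lo = lo * s - pad
        hi = (hi - 1) * s - pad + k
    return lo, hi  # may extend past the borders: clamp and zero-pad there


layers = [(3, 1, 1), (3, 1, 1), (3, 2, 1)]  # (kernel, stride, padding) each

# Two adjacent output tiles and the overlapping input slices they require:
print(input_range(0, 16, layers))   # -> (-3, 34)
print(input_range(16, 32, layers))  # -> (29, 66)
# The slices overlap on columns 29..33; each edge node runs only its own
# slice, and the concatenated tile outputs equal the full-feature-map result.
```

The overlap grows with kernel size and depth of the fused block, so there is a trade-off between how many layers are fused per block and how much redundant computation the overlap introduces.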