Design and Prototyping Distributed CNN Inference Acceleration in Edge
Computing
- URL: http://arxiv.org/abs/2211.13778v2
- Date: Mon, 28 Nov 2022 15:55:29 GMT
- Title: Design and Prototyping Distributed CNN Inference Acceleration in Edge
Computing
- Authors: Zhongtian Dong, Nan Li, Alexandros Iosifidis, Qi Zhang
- Abstract summary: HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
- Score: 85.74517957717363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For time-critical IoT applications using deep learning, inference
acceleration through distributed computing is a promising approach to meet a
stringent deadline. In this paper, we implement a working prototype of a new
distributed inference acceleration method, HALP, using three Raspberry Pi 4 devices. HALP
accelerates inference by designing a seamless collaboration among edge devices
(EDs) in Edge Computing. We maximize the parallelization between communication
and computation among the collaborative EDs by optimizing the task partitioning
ratio based on the segment-based partitioning. Experimental results show that
the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
Then, we combine distributed inference with conventional neural network model
compression by setting up different shrinking hyperparameters for MobileNet-V1.
In this way, we can further accelerate inference but at the cost of inference
accuracy loss. To strike a balance between latency and accuracy, we propose
dynamic model selection to select a model which provides the highest accuracy
within the latency constraint. It is shown that the model selection with
distributed inference HALP can significantly improve service reliability
compared to the conventional stand-alone computation.
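The dynamic model selection described above can be sketched as follows. This is an illustrative example, not the paper's implementation: the candidate models, accuracies, and latency profiles are invented, and the selection rule simply picks the highest-accuracy model whose predicted distributed-inference latency meets the deadline.

```python
# Hypothetical profile of MobileNet-V1 variants with different shrinking
# hyperparameters (width multipliers). All numbers below are made up.
candidates = [
    # (name, top-1 accuracy, predicted latency in ms under distributed inference)
    ("mobilenet_v1_1.00", 0.709, 120.0),
    ("mobilenet_v1_0.75", 0.684, 80.0),
    ("mobilenet_v1_0.50", 0.633, 45.0),
    ("mobilenet_v1_0.25", 0.498, 20.0),
]

def select_model(deadline_ms):
    """Return the highest-accuracy model meeting the latency constraint,
    or None if every candidate would miss the deadline."""
    feasible = [m for m in candidates if m[2] <= deadline_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m[1])

print(select_model(90.0)[0])  # the 0.75-width model fits and is most accurate
```

With a loose deadline the full-width model is chosen; as the deadline tightens, the selection trades accuracy for latency, which is the balance the abstract describes.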
Related papers
- AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel
Training [22.107070114339038]
We propose AccEPT, a scheme for accelerating edge collaborative pipeline-parallel training.
In particular, we propose a lightweight adaptive latency predictor to accurately estimate the latency of each layer on different devices.
Our numerical results demonstrate that the proposed approach can speed up edge pipeline-parallel training by up to 3x.
arXiv Detail & Related papers (2023-11-10T02:18:33Z)
- Gradient Sparsification for Efficient Wireless Federated Learning with Differential Privacy [25.763777765222358]
Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other.
As the model size grows, training latency increases due to limited transmission bandwidth, and model performance degrades when differential privacy (DP) protection is applied.
We propose a sparsification-empowered FL framework over wireless channels to improve training efficiency without sacrificing convergence performance.
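Gradient sparsification in this setting is commonly done with top-k selection; the sketch below shows that standard technique, not the paper's exact scheme (which additionally accounts for wireless channels and DP noise). Only the k largest-magnitude gradient entries are kept before transmission.

```python
# Top-k gradient sparsification: zero out all but the k entries with the
# largest absolute value, shrinking what a client must transmit.
def sparsify_topk(grad, k):
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    keep = set(idx)
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

g = [0.1, -0.9, 0.05, 0.7, -0.2]
print(sparsify_topk(g, 2))  # only -0.9 and 0.7 survive
```

A real implementation would transmit just the surviving (index, value) pairs and typically accumulate the zeroed residuals locally so the dropped updates are not lost.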
arXiv Detail & Related papers (2023-04-09T05:21:15Z)
- Predictive GAN-powered Multi-Objective Optimization for Hybrid Federated Split Learning [56.125720497163684]
We propose a hybrid federated split learning framework in wireless networks.
We design a parallel computing scheme for model splitting without label sharing, and theoretically analyze the influence of the delayed gradient caused by the scheme on the convergence speed.
arXiv Detail & Related papers (2022-09-02T10:29:56Z)
- Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing [93.67044879636093]
This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing.
We design a novel task collaboration scheme, named HALP, in which the overlapping zone of the sub-tasks on secondary edge servers (ESs) is executed on the host ES.
Experimental results show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on GTX 1080TI and JETSON AGX Xavier.
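The segment-based partitioning that HALP builds on can be illustrated with a simple proportional split. This is a hedged sketch under assumed device speeds, not the paper's optimized partitioning: input rows are divided among edge devices in proportion to their compute speed so that all segments finish at roughly the same time.

```python
# Split an input of `total_rows` rows among devices whose relative
# compute speeds are given; faster devices receive larger segments.
def partition_rows(total_rows, speeds):
    """Return the number of rows assigned to each device (sums to total)."""
    s = sum(speeds)
    rows = [int(total_rows * v / s) for v in speeds]
    rows[-1] += total_rows - sum(rows)  # hand the rounding remainder to the last device
    return rows

# e.g. a 224-row input (VGG-16 sized) split across three devices,
# one of which is loaded and effectively half as fast (assumed numbers)
print(partition_rows(224, [1.0, 1.0, 0.5]))  # → [89, 89, 46]
```

The paper's method goes further by choosing the ratio to maximize the overlap of communication and computation, but the proportional split conveys the basic idea.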
arXiv Detail & Related papers (2022-07-22T18:39:09Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
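The receptive-field bookkeeping behind fused-layer partitioning can be sketched as follows: to compute a range of output rows after a block of convolutional layers, a device needs the input rows covered by that block's receptive field. The layer parameters below are illustrative (standard 3x3, stride-1, pad-1 convolutions as in VGG), not taken from the paper.

```python
# Map an output row range [lo, hi) back through a stack of conv layers
# (kernel, stride, padding) to the input row range a device must hold.
def input_range(lo, hi, layers):
    """Walk the layer stack from last to first, growing the range by each
    layer's receptive field. Negative indices fall into the padding."""
    for k, s, p in reversed(layers):
        lo = lo * s - p
        hi = (hi - 1) * s - p + k
    return lo, hi

vgg_block = [(3, 1, 1), (3, 1, 1)]   # two fused 3x3 stride-1 pad-1 convs
print(input_range(0, 56, vgg_block))  # → (-2, 58): 2 halo rows on each side
```

The halo rows at the segment borders are exactly the overlapping zones that partitioning schemes like HALP must either recompute or exchange between devices.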
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale computations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
With a latency- and accuracy-aware reward design, the computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.