Receptive Field-based Segmentation for Distributed CNN Inference
Acceleration in Collaborative Edge Computing
- URL: http://arxiv.org/abs/2207.11293v1
- Date: Fri, 22 Jul 2022 18:38:11 GMT
- Title: Receptive Field-based Segmentation for Distributed CNN Inference
Acceleration in Collaborative Edge Computing
- Authors: Nan Li, Alexandros Iosifidis, Qi Zhang
- Abstract summary: We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
- Score: 93.67044879636093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies inference acceleration using distributed convolutional
neural networks (CNNs) in a collaborative edge computing network. To avoid
inference accuracy loss when partitioning inference tasks, we propose receptive
field-based segmentation (RFS). To reduce computation time and
communication overhead, we propose a novel collaborative edge computing scheme
that uses fused-layer parallelization to partition a CNN model into multiple
blocks of convolutional layers. In this scheme, the collaborative edge servers
(ESs) only need to exchange a small fraction of the sub-outputs after computing
each fused block. In addition, to find the optimal partitioning of a CNN model
into multiple blocks, we use a dynamic programming approach, termed dynamic
programming for fused-layer parallelization (DPFP). The experimental results
show that DPFP can accelerate inference of VGG-16 by up to 73% compared with
the pre-trained model, outperforming the existing work MoDNN in all tested
scenarios. Moreover, we evaluate the service reliability of DPFP under a
time-variant channel, showing that DPFP is an effective solution for ensuring
high service reliability under a strict service deadline.
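A minimal sketch of the two ideas above, assuming each layer is described by a (kernel, stride, padding) triple. The function names and the per-block cost model are our illustrations, not the paper's implementation:

```python
def input_range(out_lo, out_hi, layers):
    """Receptive field-based segmentation (RFS): map an output column range
    [out_lo, out_hi] (inclusive) of a fused block back to the input columns
    it depends on, composing the layers from last to first."""
    lo, hi = out_lo, out_hi
    for kernel, stride, pad in reversed(layers):
        lo = lo * stride - pad
        hi = hi * stride - pad + kernel - 1
    return lo, hi  # negative / out-of-range columns fall into zero padding

def min_blocks(n_layers, cost):
    """Interval dynamic programming in the spirit of DPFP: choose block
    boundaries so that the summed per-block cost is minimal."""
    best = [float("inf")] * (n_layers + 1)
    best[0] = 0.0
    cut = [0] * (n_layers + 1)
    for j in range(1, n_layers + 1):
        for i in range(j):
            t = best[i] + cost(i, j)
            if t < best[j]:
                best[j], cut[j] = t, i
    blocks, j = [], n_layers        # recover the chosen block boundaries
    while j > 0:
        blocks.append((cut[j], j))
        j = cut[j]
    return best[n_layers], blocks[::-1]

# Two 3x3 convs plus a 2x2 max-pool: a VGG-16-style fused block.
block = [(3, 1, 1), (3, 1, 1), (2, 2, 0)]
# Split a 112-column block output between two edge servers.
print(input_range(0, 55, block))    # -> (-2, 113)
print(input_range(56, 111, block))  # -> (110, 225)

# Toy per-block cost: a fixed synchronization overhead plus a compute term
# that grows with fused depth (overlap recomputation); an assumption only.
toy_cost = lambda i, j: 1.0 + 0.1 * (j - i) ** 2
print(min_blocks(6, toy_cost))      # -> (3.8, [(0, 3), (3, 6)])
```

On this toy cost the planner fuses six layers into two blocks of three, and the two tile queries overlap in only four input columns (110-113), illustrating why the ESs need to exchange only a small fraction of the sub-outputs at each block boundary.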
Related papers
- Design and Prototyping Distributed CNN Inference Acceleration in Edge
Computing [85.74517957717363]
HALP accelerates inference through seamless collaboration among edge devices (EDs) in edge computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at end-device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate
Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Distributed Deep Learning Inference Acceleration using Seamless
Collaboration in Edge Computing [93.67044879636093]
This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing.
We design a novel task collaboration scheme, termed HALP, in which the overlapping zone of the sub-tasks on secondary edge servers (ESs) is executed on the host ES.
Experimental results show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on GTX 1080TI and JETSON AGX Xavier.
arXiv Detail & Related papers (2022-07-22T18:39:09Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Partitioning sparse deep neural networks for scalable training and
inference [8.282177703075453]
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements.
Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs.
The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning.
arXiv Detail & Related papers (2021-04-23T20:05:52Z)
- Training and Inference for Integer-Based Semantic Segmentation Network [18.457074855823315]
We propose a new quantization framework for training and inference of semantic segmentation networks.
Our framework is evaluated on mainstream semantic segmentation networks like FCN-VGG16 and DeepLabv3-ResNet50.
arXiv Detail & Related papers (2020-11-30T02:07:07Z)
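The integer-based segmentation work above rests on quantizing weights and activations to integers. A minimal sketch of symmetric per-tensor int8 quantization, the common baseline scheme; the helper names are ours, not the paper's:

```python
def quantize(xs, num_bits=8):
    """Symmetric per-tensor quantization: map floats onto signed integers
    using a single scale derived from the largest magnitude."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = max(abs(x) for x in xs) / qmax
    if scale == 0.0:                          # all-zero tensor
        scale = 1.0
    q = [max(-qmax - 1, min(qmax, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.003, 1.0]                  # toy weight tensor
q, s = quantize(w)
err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
print(q)          # -> [50, -127, 0, 100]
print(err <= s)   # quantization error stays within one scale step -> True
```

Integer inference then runs the convolutions on `q` directly and folds the scales into a single rescaling step per layer, which is where the speedup on integer hardware comes from.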
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.