Design and Prototyping Distributed CNN Inference Acceleration in Edge
Computing
- URL: http://arxiv.org/abs/2211.13778v2
- Date: Mon, 28 Nov 2022 15:55:29 GMT
- Title: Design and Prototyping Distributed CNN Inference Acceleration in Edge
Computing
- Authors: Zhongtian Dong, Nan Li, Alexandros Iosifidis, Qi Zhang
- Abstract summary: HALP accelerates inference by designing a seamless collaboration among edge devices (EDs) in Edge Computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
- Score: 85.74517957717363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For time-critical IoT applications using deep learning, inference
acceleration through distributed computing is a promising approach to meet a
stringent deadline. In this paper, we implement a working prototype of a new
distributed inference acceleration method, HALP, using three Raspberry Pi 4 devices. HALP
accelerates inference by designing a seamless collaboration among edge devices
(EDs) in Edge Computing. We maximize the parallelization between communication
and computation among the collaborative EDs by optimizing the task partitioning
ratio based on the segment-based partitioning. Experimental results show that
the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
Then, we combine distributed inference with conventional neural network model
compression by setting up different shrinking hyperparameters for MobileNet-V1.
In this way, we can further accelerate inference but at the cost of inference
accuracy loss. To strike a balance between latency and accuracy, we propose
dynamic model selection to select a model which provides the highest accuracy
within the latency constraint. It is shown that the model selection with
distributed inference HALP can significantly improve service reliability
compared to the conventional stand-alone computation.
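The dynamic model selection described above can be sketched as follows. This is an illustrative example, not the paper's implementation: the candidate models, accuracies, and latency profiles are invented, and the selection rule simply picks the highest-accuracy model whose predicted distributed-inference latency meets the deadline.

```python
# Hypothetical profile of MobileNet-V1 variants with different shrinking
# hyperparameters (width multipliers). All numbers below are made up.
candidates = [
    # (name, top-1 accuracy, predicted latency in ms under distributed inference)
    ("mobilenet_v1_1.00", 0.709, 120.0),
    ("mobilenet_v1_0.75", 0.684, 80.0),
    ("mobilenet_v1_0.50", 0.633, 45.0),
    ("mobilenet_v1_0.25", 0.498, 20.0),
]

def select_model(deadline_ms):
    """Return the highest-accuracy model meeting the latency constraint,
    or None if every candidate would miss the deadline."""
    feasible = [m for m in candidates if m[2] <= deadline_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m[1])

print(select_model(90.0)[0])  # the 0.75-width model fits and is most accurate
```

With a loose deadline the full-width model is chosen; as the deadline tightens, the selection trades accuracy for latency, which is the balance the abstract describes.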
Related papers
- AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel
Training [22.107070114339038]
We propose AccEPT, a scheme for accelerating edge collaborative pipeline-parallel training.
In particular, we propose a lightweight adaptive latency predictor to accurately estimate the latency of each layer on different devices.
Our numerical results demonstrate that the proposed approach can speed up edge pipeline-parallel training by up to 3x.
arXiv Detail & Related papers (2023-11-10T02:18:33Z)
- Gradient Sparsification for Efficient Wireless Federated Learning with Differential Privacy [25.763777765222358]
Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other.
As the model size grows, training latency increases due to limited transmission bandwidth, and model performance degrades when differential privacy (DP) protection is applied.
We propose a sparsification-empowered FL framework over wireless channels to improve training efficiency without sacrificing convergence performance.
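Gradient sparsification in this setting is commonly done with top-k selection; the sketch below shows that standard technique, not the paper's exact scheme (which additionally accounts for wireless channels and DP noise). Only the k largest-magnitude gradient entries are kept before transmission.

```python
# Top-k gradient sparsification: zero out all but the k entries with the
# largest absolute value, shrinking what a client must transmit.
def sparsify_topk(grad, k):
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    keep = set(idx)
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

g = [0.1, -0.9, 0.05, 0.7, -0.2]
print(sparsify_topk(g, 2))  # only -0.9 and 0.7 survive
```

A real implementation would transmit just the surviving (index, value) pairs and typically accumulate the zeroed residuals locally so the dropped updates are not lost.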
arXiv Detail & Related papers (2023-04-09T05:21:15Z)
- Predictive GAN-powered Multi-Objective Optimization for Hybrid Federated Split Learning [56.125720497163684]
We propose a hybrid federated split learning framework in wireless networks.
We design a parallel computing scheme for model splitting without label sharing, and theoretically analyze the influence of the delayed gradient caused by the scheme on the convergence speed.
arXiv Detail & Related papers (2022-09-02T10:29:56Z)
- Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing [93.67044879636093]
This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing.
We design a novel task collaboration scheme, named HALP, in which the overlapping zone of the sub-tasks on secondary edge servers (ESs) is executed on the host ES.
Experimental results show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on GTX 1080TI and JETSON AGX Xavier.
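The segment-based partitioning that HALP builds on can be illustrated with a simple proportional split. This is a hedged sketch under assumed device speeds, not the paper's optimized partitioning: input rows are divided among edge devices in proportion to their compute speed so that all segments finish at roughly the same time.

```python
# Split an input of `total_rows` rows among devices whose relative
# compute speeds are given; faster devices receive larger segments.
def partition_rows(total_rows, speeds):
    """Return the number of rows assigned to each device (sums to total)."""
    s = sum(speeds)
    rows = [int(total_rows * v / s) for v in speeds]
    rows[-1] += total_rows - sum(rows)  # hand the rounding remainder to the last device
    return rows

# e.g. a 224-row input (VGG-16 sized) split across three devices,
# one of which is loaded and effectively half as fast (assumed numbers)
print(partition_rows(224, [1.0, 1.0, 0.5]))  # → [89, 89, 46]
```

The paper's method goes further by choosing the ratio to maximize the overlap of communication and computation, but the proportional split conveys the basic idea.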
arXiv Detail & Related papers (2022-07-22T18:39:09Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
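The receptive-field bookkeeping behind fused-layer partitioning can be sketched as follows: to compute a range of output rows after a block of convolutional layers, a device needs the input rows covered by that block's receptive field. The layer parameters below are illustrative (standard 3x3, stride-1, pad-1 convolutions as in VGG), not taken from the paper.

```python
# Map an output row range [lo, hi) back through a stack of conv layers
# (kernel, stride, padding) to the input row range a device must hold.
def input_range(lo, hi, layers):
    """Walk the layer stack from last to first, growing the range by each
    layer's receptive field. Negative indices fall into the padding."""
    for k, s, p in reversed(layers):
        lo = lo * s - p
        hi = (hi - 1) * s - p + k
    return lo, hi

vgg_block = [(3, 1, 1), (3, 1, 1)]   # two fused 3x3 stride-1 pad-1 convs
print(input_range(0, 56, vgg_block))  # → (-2, 58): 2 halo rows on each side
```

The halo rows at the segment borders are exactly the overlapping zones that partitioning schemes like HALP must either recompute or exchange between devices.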
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale computations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
With a latency- and accuracy-aware reward design, the computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.