Distributed Deep Learning Inference Acceleration using Seamless
Collaboration in Edge Computing
- URL: http://arxiv.org/abs/2207.11294v1
- Date: Fri, 22 Jul 2022 18:39:09 GMT
- Title: Distributed Deep Learning Inference Acceleration using Seamless
Collaboration in Edge Computing
- Authors: Nan Li, Alexandros Iosifidis, Qi Zhang
- Abstract summary: This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing.
We design a novel task collaboration scheme, named HALP, in which the overlapping zone of the sub-tasks assigned to secondary edge servers (ESs) is executed on the host ES.
Experimental results show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on a GTX 1080 Ti and a Jetson AGX Xavier.
- Score: 93.67044879636093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies inference acceleration using distributed
convolutional neural networks (CNNs) in collaborative edge computing. To
ensure inference accuracy when partitioning an inference task, we consider the
receptive field when performing segment-based partitioning. To maximize the
parallelization between the communication and computing processes, and thereby
minimize the total inference time of an inference task, we design a novel task
collaboration scheme, named HALP, in which the overlapping zone of the
sub-tasks assigned to secondary edge servers (ESs) is executed on the host ES.
We further extend HALP to the scenario of multiple tasks. Experimental results
show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single
task and 1.7-1.8x for 4 tasks per batch on a GTX 1080 Ti and a Jetson AGX
Xavier, outperforming the state-of-the-art scheme MoDNN. Moreover, we evaluate
the service reliability under a time-variant channel, which shows that HALP is
an effective solution to ensure high service reliability under a strict
service deadline.
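As a rough illustration of the receptive-field consideration above (not the authors' implementation), the sketch below propagates the receptive-field size and the cumulative stride ("jump") through a VGG-16-style stack of 3x3 convolutions and 2x2 max-pooling layers; the layer list and the per-side overlap estimate printed at the end are assumptions made for this example.

```python
# Minimal sketch (not the paper's code): estimate how much adjacent input
# segments must overlap when a VGG-16-style feature extractor is split across
# edge servers, by tracking the receptive field through the layer stack.

# (kernel_size, stride) per layer; "M" marks a 2x2, stride-2 max-pooling layer.
VGG16_LAYERS = [
    (3, 1), (3, 1), "M",
    (3, 1), (3, 1), "M",
    (3, 1), (3, 1), (3, 1), "M",
    (3, 1), (3, 1), (3, 1), "M",
    (3, 1), (3, 1), (3, 1), "M",
]

def receptive_field(layers):
    """Return (r, j): receptive-field size and jump after the last layer.

    r is the number of input pixels (per dimension) that influence one output
    element; j is the input-pixel distance between adjacent output elements.
    """
    r, j = 1, 1
    for layer in layers:
        k, s = (2, 2) if layer == "M" else layer
        r += (k - 1) * j      # each layer widens the field by (k - 1) * j
        j *= s                # strides compound multiplicatively
    return r, j

if __name__ == "__main__":
    r, j = receptive_field(VGG16_LAYERS)
    # Each output element of the final pooling layer depends on an r x r input
    # patch, so segments cut through the input must share roughly (r - j) // 2
    # extra rows/columns on each side of the cut to keep the outputs exact.
    print(f"receptive field = {r}, jump = {j}, per-side overlap ~ {(r - j) // 2}")
```

For the configuration above this yields a receptive field of 212 and a jump of 32 at the last pooling layer, which is why naive segment-based partitioning without overlap would change the outputs near the cuts.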
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference through seamless collaboration among edge devices (EDs) in edge computing.
Experiments show that distributed inference with HALP achieves a 1.7x inference acceleration for VGG-16.
It is shown that model selection combined with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- Multi-Agent Collaborative Inference via DNN Decoupling: Intermediate Feature Compression and Edge Learning [31.291738577705257]
We study the multi-agent collaborative inference scenario, where a single edge server coordinates the inference of multiple UEs (user equipment).
We first design a lightweight autoencoder-based method to compress the large intermediate feature (a sketch of this idea appears after this list).
Then we define tasks according to the inference overhead of DNNs and formulate the problem as a Markov decision process.
arXiv Detail & Related papers (2022-05-24T07:29:33Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- Boundary-assisted Region Proposal Networks for Nucleus Segmentation [89.69059532088129]
Machine learning models cannot perform well because of the large number of crowded nuclei.
We devise a Boundary-assisted Region Proposal Network (BRP-Net) that achieves robust instance-level nucleus segmentation.
arXiv Detail & Related papers (2020-06-04T08:26:38Z)
- Distributed Primal-Dual Optimization for Online Multi-Task Learning [22.45069527817333]
We propose an adaptive primal-dual algorithm, which captures task-specific noise in adversarial learning and carries out a projection-free update with runtime efficiency.
Our model is well-suited to decentralized, periodically connected tasks, as it allows energy-starved or bandwidth-constrained tasks to postpone their updates.
Empirical results confirm that the proposed model is highly effective on various real-world datasets.
arXiv Detail & Related papers (2020-04-02T23:36:07Z)
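The Multi-Agent Collaborative Inference entry above mentions a lightweight autoencoder for compressing intermediate features before they are uploaded to the edge server. The sketch below is a generic, assumed illustration of that idea in PyTorch, not the authors' model; the channel count, the 4x channel reduction, and the class name are choices made for this example.

```python
# Minimal sketch (assumed architecture, not the authors' design): a lightweight
# autoencoder that compresses an intermediate CNN feature map before it is sent
# from a user device to the edge server, which reconstructs it on arrival.
import torch
import torch.nn as nn

class FeatureCompressor(nn.Module):
    """Shrink C channels to C // reduction with 1x1 convolutions."""

    def __init__(self, channels: int = 256, reduction: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(                  # runs on the UE
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(                  # runs on the edge server
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, feature: torch.Tensor) -> torch.Tensor:
        code = self.encoder(feature)   # smaller tensor -> less uplink traffic
        return self.decoder(code)      # approximate reconstruction

if __name__ == "__main__":
    ae = FeatureCompressor(channels=256, reduction=4)
    feature = torch.randn(1, 256, 28, 28)              # a mid-network activation
    reconstruction = ae(feature)
    # Train with a reconstruction loss so downstream layers remain usable.
    loss = nn.functional.mse_loss(reconstruction, feature)
    print(reconstruction.shape, float(loss))
```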