Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand
Edge Resource
- URL: http://arxiv.org/abs/2306.12185v1
- Date: Wed, 21 Jun 2023 11:32:28 GMT
- Title: Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand
Edge Resource
- Authors: Xiang Yang, Dezhi Chen, Qi Qi, Jingyu Wang, Haifeng Sun, Jianxin Liao,
Song Guo
- Abstract summary: Deep Neural Networks (DNNs) have significantly improved the accuracy of intelligent applications on mobile devices.
DNN surgery can enable real-time inference despite the computational limitations of mobile devices.
This paper introduces a novel Decentralized DNN Surgery (DDS) framework.
- Score: 25.274288063300844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Neural Networks (DNNs) have significantly improved the accuracy of
intelligent applications on mobile devices. DNN surgery, which partitions DNN
processing between mobile devices and multi-access edge computing (MEC)
servers, can enable real-time inference despite the computational limitations
of mobile devices. However, DNN surgery faces a critical challenge: determining
the optimal computing resource demand from the server and the corresponding
partition strategy, while considering both inference latency and MEC server
usage costs. This problem is compounded by two factors: (1) the finite
computing capacity of the MEC server, which is shared among multiple devices,
leading to inter-dependent demands, and (2) the shift in modern DNN
architecture from chains to directed acyclic graphs (DAGs), which complicates
potential solutions.
In this paper, we introduce a novel Decentralized DNN Surgery (DDS)
framework. We formulate the partition strategy as a min-cut and propose a
resource allocation game to adaptively schedule the demands of mobile devices
in an MEC environment. We prove the existence of a Nash Equilibrium (NE), and
develop an iterative algorithm to efficiently reach the NE for each device. Our
extensive experiments demonstrate that DDS can effectively handle varying MEC
scenarios, achieving up to 1.25$\times$ acceleration compared to the
state-of-the-art algorithm.
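The partition-as-min-cut formulation in the abstract can be made concrete: give each layer a device cost and a server cost, give each DAG edge a transfer cost, and read the placement off an s-t min cut. The sketch below is illustrative only; the layer names, costs, and the plain Edmonds-Karp solver are assumptions, not the paper's implementation, it models a single device-to-server crossing (no reverse transfers), and it leaves out the game-theoretic scheduling of resource demands entirely.

```python
from collections import defaultdict, deque

def min_cut_partition(layers, device_cost, server_cost, edges, transfer_cost):
    """Assign each layer to device or server via an s-t min cut.

    Cutting s->v pays server_cost[v] (v runs on the server); cutting v->t pays
    device_cost[v] (v runs on the device); cutting a DAG edge u->v pays the
    cost of shipping u's output over the wireless link.
    """
    S, T = "_src", "_snk"
    cap = defaultdict(lambda: defaultdict(float))
    for v in layers:
        cap[S][v] += server_cost[v]
        cap[v][T] += device_cost[v]
    for u, v in edges:
        cap[u][v] += transfer_cost[(u, v)]

    total = 0.0
    while True:  # Edmonds-Karp: shortest augmenting paths via BFS
        parent, q = {S: None}, deque([S])
        while q and T not in parent:
            x = q.popleft()
            for y in cap[x]:
                if y not in parent and cap[x][y] > 1e-12:
                    parent[y] = x
                    q.append(y)
        if T not in parent:
            break
        path, y = [], T
        while y != S:
            path.append((parent[y], y))
            y = parent[y]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap[v][u] += aug  # residual (reverse) capacity
        total += aug

    # Source side of the residual graph = layers kept on the device.
    seen, q = {S}, deque([S])
    while q:
        x = q.popleft()
        for y in cap[x]:
            if y not in seen and cap[x][y] > 1e-12:
                seen.add(y)
                q.append(y)
    on_device = [v for v in layers if v in seen]
    on_server = [v for v in layers if v not in seen]
    return on_device, on_server, total
```

The returned cut value is exactly the latency of the chosen assignment under this toy cost model, which is why solving the min cut yields the optimal partition for DAG-shaped networks.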
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
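The shared-backbone, multi-head structure can be pictured with a toy sketch. The affine backbone, scalar heads, and averaging rule here are illustrative stand-ins, not MEMTL's actual networks:

```python
def backbone(x):
    # Shared feature extractor, run once per input (a fixed affine + ReLU stand-in).
    return [max(0.0, 0.5 * v + 0.1) for v in x]

def make_head(w):
    # Each prediction head is a tiny linear scorer with its own parameter w.
    return lambda feats: w * sum(feats)

def ensemble_predict(x, heads):
    feats = backbone(x)                 # backbone output is shared by all heads
    scores = [h(feats) for h in heads]  # heads produce independent predictions
    return sum(scores) / len(scores)    # simple averaging ensemble
```

The point of the structure is that the expensive backbone runs once while the cheap heads provide diverse predictions that the ensemble reconciles.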
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Survey on Computer Vision Techniques for Internet-of-Things Devices [0.0]
Deep neural networks (DNNs) are state-of-the-art techniques for solving computer vision problems.
DNNs require billions of parameters and operations to achieve state-of-the-art results.
This requirement makes DNNs extremely compute, memory, and energy-hungry, and consequently difficult to deploy on small battery-powered Internet-of-Things (IoT) devices with limited computing resources.
arXiv Detail & Related papers (2023-08-02T03:41:24Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
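The dynamic-split idea reduces to a small latency model: for each candidate split index, sum the device compute time, the upload time of the boundary tensor at the current data rate, and the (load-scaled) server compute time, then pick the minimum. All names and numbers below are made up for illustration; real systems would also estimate the channel rate and server load online.

```python
def best_split(device_ms, server_ms, out_bytes, input_bytes, rate_mbps, load=1.0):
    """Return (k, latency_ms): layers 0..k-1 run on device, k..n-1 on the server.

    device_ms[i] / server_ms[i]: per-layer latency on each side (ms)
    out_bytes[i]: size of layer i's output tensor (sent if we split after layer i)
    input_bytes: size of the raw input (sent if everything runs on the server)
    rate_mbps: current uplink rate; load: server slowdown factor (>= 1)
    """
    n = len(device_ms)
    best_k, best_lat = None, float("inf")
    for k in range(n + 1):
        sent = 0 if k == n else (input_bytes if k == 0 else out_bytes[k - 1])
        tx_ms = sent * 8 / (rate_mbps * 1000.0)  # bytes -> bits, Mbit/s -> bits/ms
        lat = sum(device_ms[:k]) + tx_ms + load * sum(server_ms[k:])
        if lat < best_lat:
            best_k, best_lat = k, lat
    return best_k, best_lat
```

Re-running this whenever the measured rate or load changes gives the "dynamic" behavior: with the toy numbers in the test below, dropping the uplink from 10 Mbit/s to 1 Mbit/s moves the optimal split from mid-network to fully on-device.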
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With its latency- and accuracy-aware reward design, the approach adapts well to complex environments such as dynamic wireless channels and arbitrary processing loads, and can support 5G URLLC services.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks [12.361842554233558]
Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency.
Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference.
We present a heterogeneous tightly-coupled architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators.
arXiv Detail & Related papers (2022-01-04T11:12:01Z)
- Latency-Memory Optimized Splitting of Convolution Neural Networks for Resource Constrained Edge Devices [1.6873748786804317]
We argue that splitting CNN execution between an edge device and the cloud is equivalent to solving a resource-constrained optimization problem.
Experiments done on real-world edge devices show that, LMOS ensures feasible execution of different CNN models at the edge.
arXiv Detail & Related papers (2021-07-19T19:39:56Z)
- Deep Learning-based Resource Allocation For Device-to-Device Communication [66.74874646973593]
We propose a framework for the optimization of the resource allocation in multi-channel cellular systems with device-to-device (D2D) communication.
A deep learning (DL) framework is proposed, where the optimal resource allocation strategy for arbitrary channel conditions is approximated by deep neural network (DNN) models.
Our simulation results confirm that near-optimal performance can be attained at low computation time, underlining the real-time capability of the proposed scheme.
arXiv Detail & Related papers (2020-11-25T14:19:23Z)
- Computation Offloading in Multi-Access Edge Computing Networks: A Multi-Task Learning Approach [7.203439085947118]
Multi-access edge computing (MEC) has already shown its potential to let mobile devices bear computation-intensive applications by offloading some tasks to a nearby access point (AP) integrated with a MEC server (MES).
However, due to varying network conditions and the limited computation resources of the MES, the offloading decisions taken by a mobile device and the computational resources allocated by the MES may not achieve the lowest cost.
We propose a dynamic offloading framework for the MEC network, in which the uplink non-orthogonal multiple access (NOMA) is used to enable multiple devices to upload their
arXiv Detail & Related papers (2020-06-29T15:11:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.