Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge
Intelligence via Online Learning
- URL: http://arxiv.org/abs/2102.02638v1
- Date: Tue, 2 Feb 2021 18:50:06 GMT
- Title: Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge
Intelligence via Online Learning
- Authors: Letian Zhang, Lixing Chen, Jie Xu
- Abstract summary: This paper builds a collaborative deep inference system between a resource-constrained mobile device and a powerful edge server.
Our system has a built-in online learning module, called Autodidactic Neurosurgeon (ANS), to automatically learn the optimal partition point.
ANS significantly outperforms state-of-the-art benchmarks in terms of tracking system changes and reducing the end-to-end inference delay.
- Score: 19.013102763434794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent breakthroughs in deep learning (DL) have led to the emergence of many
intelligent mobile applications and services, but in the meanwhile also pose
unprecedented computing challenges on resource-constrained mobile devices. This
paper builds a collaborative deep inference system between a
resource-constrained mobile device and a powerful edge server, aiming at
joining the power of both on-device processing and computation offloading. The
basic idea of this system is to partition a deep neural network (DNN) into a
front-end part running on the mobile device and a back-end part running on the
edge server, with the key challenge being how to locate the optimal partition
point to minimize the end-to-end inference delay. Unlike existing efforts on
DNN partitioning that rely heavily on a dedicated offline profiling stage to
search for the optimal partition point, our system has a built-in online
learning module, called Autodidactic Neurosurgeon (ANS), to automatically learn
the optimal partition point on-the-fly. Therefore, ANS is able to closely
follow the changes of the system environment by generating new knowledge for
adaptive decision making. The core of ANS is a novel contextual bandit learning
algorithm, called $\mu$LinUCB, which not only has provable theoretical learning
performance guarantee but also is ultra-lightweight for easy real-world
implementation. We implement our system on a video stream object detection
testbed to validate the design of ANS and evaluate its performance. The
experiments show that ANS significantly outperforms state-of-the-art benchmarks
in terms of tracking system changes and reducing the end-to-end inference
delay.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - Learning for Semantic Knowledge Base-Guided Online Feature Transmission
in Dynamic Channels [41.59960455142914]
We propose an online optimization framework to address the challenge of dynamic channel conditions and device mobility in an end-to-end communication system.
Our approach builds upon existing methods by leveraging a semantic knowledge base to drive multi-level feature transmission.
To solve the online optimization problem, we design a novel soft actor-critic-based deep reinforcement learning system with a carefully designed reward function for real-time decision-making.
arXiv Detail & Related papers (2023-11-30T07:35:56Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Split-Et-Impera: A Framework for the Design of Distributed Deep Learning
Applications [8.434224141580758]
Split-Et-Impera determines the set of the best-split points of a neural network based on deep network interpretability principles.
It performs a communication-aware simulation for the rapid evaluation of different neural network rearrangements.
It suggests the best match between the quality of service requirements of the application and the performance in terms of accuracy and latency time.
arXiv Detail & Related papers (2023-03-22T13:00:00Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - Online Learning for Orchestration of Inference in Multi-User
End-Edge-Cloud Networks [3.6076391721440633]
Collaborative end-edge-cloud computing for deep learning provides a range of performance and efficiency.
We propose a reinforcement-learning-based computation offloading solution that learns optimal offloading policy.
Our solution provides 35% speedup in the average response time compared to the state-of-the-art with less than 0.9% accuracy reduction.
arXiv Detail & Related papers (2022-02-21T21:41:29Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Deep Multi-Task Learning for Cooperative NOMA: System Design and
Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL)
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z) - AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference
under Stochastic Variance [11.093360539563657]
AutoScale is an adaptive and light-weight execution scaling engine built upon the custom-designed reinforcement learning algorithm.
This paper proposes AutoScale to enable accurate, energy-efficient deep learning inference at the edge.
arXiv Detail & Related papers (2020-05-06T00:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.