Communication-Computation Efficient Device-Edge Co-Inference via AutoML
- URL: http://arxiv.org/abs/2108.13009v2
- Date: Tue, 31 Aug 2021 15:13:59 GMT
- Title: Communication-Computation Efficient Device-Edge Co-Inference via AutoML
- Authors: Xinjie Zhang, Jiawei Shao, Yuyi Mao, and Jun Zhang
- Abstract summary: Device-edge co-inference partitions a deep neural network between a resource-constrained mobile device and an edge server.
On-device model sparsity level and intermediate feature compression ratio have direct impacts on computation workload and communication overhead, respectively.
We propose a novel automated machine learning (AutoML) framework based on deep reinforcement learning (DRL).
- Score: 4.06604174802643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Device-edge co-inference, which partitions a deep neural network between a
resource-constrained mobile device and an edge server, has recently emerged as a
promising paradigm to support intelligent mobile applications. To accelerate
the inference process, on-device model sparsification and intermediate feature
compression are regarded as two prominent techniques. However, as the on-device
model sparsity level and intermediate feature compression ratio have direct
impacts on computation workload and communication overhead respectively, and
both of them affect the inference accuracy, finding the optimal values of these
hyper-parameters brings a major challenge due to the large search space. In
this paper, we endeavor to develop an efficient algorithm to determine these
hyper-parameters. By selecting a suitable model split point and an
encoder/decoder pair for the intermediate feature vector, this problem is cast
as a sequential decision problem, for which a novel automated machine learning
(AutoML) framework based on deep reinforcement learning (DRL) is proposed.
Experimental results on an image classification task demonstrate the
effectiveness of the proposed framework in achieving a better
communication-computation trade-off and significant inference speedup against
various baseline schemes.
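As a rough illustration of the search problem the abstract describes, the sketch below enumerates hypothetical (split point, sparsity level, compression ratio) configurations and returns the lowest-latency one that satisfies an accuracy constraint. The layer costs, the accuracy proxy, and the exhaustive search are all assumptions for illustration; they stand in for the paper's DRL agent and real profiling data, not the authors' method.

```python
import itertools

# Hypothetical per-layer costs for a small CNN: device-side compute (MFLOPs)
# and the size of the intermediate feature at each candidate split point (KB).
LAYER_FLOPS = [20, 40, 80, 80, 10]
FEATURE_KB = [512, 256, 128, 64, 16]

SPARSITY_LEVELS = [0.0, 0.3, 0.5, 0.7]    # fraction of on-device weights pruned
COMPRESSION_RATIOS = [1, 4, 8, 16]        # feature compression factor

def latency_ms(split, sparsity, ratio,
               device_mflops_per_ms=10.0, uplink_kb_per_ms=50.0):
    """Estimated latency: on-device compute for the first `split` layers
    (reduced by pruning) plus uplink time for the compressed feature."""
    compute = sum(LAYER_FLOPS[:split]) * (1 - sparsity) / device_mflops_per_ms
    comm = FEATURE_KB[split - 1] / ratio / uplink_kb_per_ms
    return compute + comm

def accuracy_proxy(sparsity, ratio):
    """Toy accuracy model: pruning and compression each shave off accuracy."""
    return 0.95 - 0.10 * sparsity - 0.005 * (ratio - 1)

def best_config(min_accuracy=0.90):
    """Exhaustive search over the joint space: returns the
    (latency, split, sparsity, ratio) tuple with the lowest latency
    among configurations meeting the accuracy constraint."""
    feasible = ((latency_ms(s, sp, r), s, sp, r)
                for s, sp, r in itertools.product(
                    range(1, len(LAYER_FLOPS) + 1),
                    SPARSITY_LEVELS, COMPRESSION_RATIOS)
                if accuracy_proxy(sp, r) >= min_accuracy)
    return min(feasible)
```

Even this toy space has 5 x 4 x 4 = 80 configurations; with realistic networks, finer-grained sparsity levels, and per-layer decisions, exhaustive enumeration becomes infeasible, which is the motivation the abstract gives for a DRL-based sequential search.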
Related papers
- Heterogeneity-Aware Resource Allocation and Topology Design for Hierarchical Federated Edge Learning [9.900317349372383]
Federated Learning (FL) provides a privacy-preserving framework for training machine learning models on mobile edge devices.
Traditional FL algorithms, e.g., FedAvg, impose a heavy communication workload on these devices.
We propose a two-tier HFEL system, where edge devices are connected to edge servers and edge servers are interconnected through peer-to-peer (P2P) edge backhauls.
Our goal is to enhance the training efficiency of the HFEL system through strategic resource allocation and topology design.
arXiv Detail & Related papers (2024-09-29T01:48:04Z) - Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge [35.40849522296486]
Large-scale foundation models (FoMos) can exhibit human-like intelligence.
FoMos need to be adapted to specialized downstream tasks through fine-tuning techniques.
We advocate multi-device cooperation within the device-edge cooperative fine-tuning paradigm.
arXiv Detail & Related papers (2024-07-13T12:47:14Z) - Design and Prototyping Distributed CNN Inference Acceleration in Edge
Computing [85.74517957717363]
HALP accelerates inference by enabling seamless collaboration among edge devices (EDs) in edge computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z) - Task-Oriented Over-the-Air Computation for Multi-Device Edge AI [57.50247872182593]
6G networks supporting edge AI will feature task-oriented techniques that focus on the effective and efficient execution of AI tasks.
A task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for multi-device split-inference systems.
arXiv Detail & Related papers (2022-11-02T16:35:14Z) - Task-Oriented Sensing, Computation, and Communication Integration for
Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional models and large-scale computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a scheme can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with
Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS)
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned
Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z) - Communication-Computation Trade-Off in Resource-Constrained Edge
Inference [5.635540684037595]
This article presents effective methods for edge inference at resource-constrained devices.
It focuses on device-edge co-inference, assisted by an edge computing server.
A three-step framework is proposed for the effective inference.
arXiv Detail & Related papers (2020-06-03T11:00:32Z)
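The "discriminant gain" metric mentioned in the task-oriented ISAC paper above admits a simple closed form in the equal-covariance Gaussian case (a Mahalanobis-type distance between class centroids). The diagonal-covariance sketch below is an assumption about that form for illustration, not necessarily the paper's exact definition.

```python
def discriminant_gain(mu_a, mu_b, var):
    """Pairwise discriminant gain between two classes whose features are
    Gaussian with means mu_a, mu_b and a shared diagonal covariance var:
    sum over dimensions of (mu_a[d] - mu_b[d])**2 / var[d]. Larger values
    mean the classes are easier to separate, so the quantity serves as a
    tractable surrogate for inference accuracy."""
    return sum((a - b) ** 2 / v for a, b, v in zip(mu_a, mu_b, var))
```

A multi-class version would typically aggregate this quantity over class pairs, e.g. by averaging or taking the minimum.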
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.