Optimal Model Placement and Online Model Splitting for Device-Edge
Co-Inference
- URL: http://arxiv.org/abs/2105.13618v1
- Date: Fri, 28 May 2021 06:55:04 GMT
- Title: Optimal Model Placement and Online Model Splitting for Device-Edge
Co-Inference
- Authors: Jia Yan, Suzhi Bi, Ying-Jun Angela Zhang
- Abstract summary: Device-edge co-inference opens up new possibilities for resource-constrained wireless devices to execute deep neural network (DNN)-based applications.
We study the joint optimization of the model placement and online model splitting decisions to minimize the energy-and-time cost of device-edge co-inference.
- Score: 22.785214118527872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Device-edge co-inference opens up new possibilities for resource-constrained
wireless devices (WDs) to execute deep neural network (DNN)-based applications
with heavy computation workloads. In particular, the WD executes the first few
layers of the DNN and sends the intermediate features to the edge server that
processes the remaining layers of the DNN. By adapting the model splitting
decision, there exists a tradeoff between local computation cost and
communication overhead. In practice, the DNN model is re-trained and updated
periodically at the edge server. Once the DNN parameters are regenerated, part
of the updated model must be placed at the WD to facilitate on-device
inference. In this paper, we study the joint optimization of the model
placement and online model splitting decisions to minimize the energy-and-time
cost of device-edge co-inference in the presence of wireless channel fading. The
problem is challenging because the model placement and model splitting
decisions are strongly coupled, while involving two different time scales. We
first tackle online model splitting by formulating an optimal stopping problem,
where the finite horizon of the problem is determined by the model placement
decision. In addition to deriving the optimal model splitting rule based on
backward induction, we further investigate a simple one-stage look-ahead rule,
for which we are able to obtain analytical expressions of the model splitting
decision. The analysis is useful for us to efficiently optimize the model
placement decision on a larger time scale. In particular, we obtain a
closed-form model placement solution for the fully-connected multilayer
perceptron with equal neurons. Simulation results validate the superior
performance of the joint optimal model placement and splitting with various DNN
structures.
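The online splitting rule described in the abstract can be illustrated with a small dynamic-programming sketch. The Python snippet below (with assumed layer counts, feature sizes, per-layer costs, and channel statistics; not the paper's exact formulation) pre-computes the expected cost-to-go of the finite-horizon optimal stopping problem by backward induction, then applies a stop-or-continue rule online as each channel realization is observed.

```python
import numpy as np

# A minimal sketch of a finite-horizon, backward-induction splitting rule.
# All layer counts, feature sizes, costs, and channel parameters below are
# illustrative assumptions, not the paper's exact cost model.

rng = np.random.default_rng(0)

N = 6                                    # layers placed on the device (fixed by model placement)
feat_bits = np.array([8e5, 6e5, 4e5, 3e5, 2e5, 1e5, 5e4])   # feature size after k local layers
local_cost = np.array([0.8, 0.9, 1.1, 1.2, 1.4, 1.5])       # on-device cost of computing layer k+1
bandwidth, tx_power, noise = 1e6, 0.1, 1e-9                  # assumed uplink parameters

def offload_cost(k, h):
    """Weighted energy-and-time cost of sending the stage-k feature over channel gain h."""
    rate = bandwidth * np.log2(1.0 + tx_power * h / noise)
    return (1.0 + tx_power) * feat_bits[k] / rate            # toy time-plus-energy weighting

# Backward induction: V[k] is the expected optimal cost-to-go at stage k,
# taken before the current channel gain is observed (i.i.d. Rayleigh fading assumed).
gains = rng.exponential(1.0, size=10_000)
V = np.zeros(N + 1)
V[N] = np.mean(offload_cost(N, gains))                       # at the horizon the device must offload
for k in range(N - 1, -1, -1):
    stop = offload_cost(k, gains)                            # offload now at the observed rate
    go_on = local_cost[k] + V[k + 1]                         # compute one more layer locally
    V[k] = np.mean(np.minimum(stop, go_on))

def split_now(k, h):
    """Online rule: stop (offload) at stage k iff it beats the expected continuation cost."""
    return k == N or offload_cost(k, h) <= local_cost[k] + V[k + 1]
```

The one-stage look-ahead rule mentioned in the abstract roughly corresponds to replacing V[k + 1] with the expected cost of offloading at the next stage, which is what makes an analytical threshold expression possible.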
Related papers
- Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z) - Accelerating Split Federated Learning over Wireless Communication
Networks [17.97006656280742]
We consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL).
We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency.
Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.
arXiv Detail & Related papers (2023-10-24T07:49:56Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - JMSNAS: Joint Model Split and Neural Architecture Search for Learning
over Mobile Edge Networks [23.230079759174902]
A joint model split and neural architecture search (JMSNAS) framework is proposed to automatically generate and deploy a DNN model over a mobile edge network.
Considering both the computing and communication resource constraints, a computational graph search problem is formulated.
Experiment results confirm the superiority of the proposed framework over the state-of-the-art split machine learning design methods.
arXiv Detail & Related papers (2021-11-16T03:10:23Z) - Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned
Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)