Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time
- URL: http://arxiv.org/abs/2503.21476v1
- Date: Thu, 27 Mar 2025 13:06:26 GMT
- Title: Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time
- Authors: Zhaojun Nan, Yunchu Han, Sheng Zhou, Zhisheng Niu,
- Abstract summary: In edge intelligence systems, deep neural network (DNN) partitioning and data offloading can provide real-time task inference for resource-constrained mobile devices.<n>We propose a robust optimization scheme to minimize the total energy consumption of mobile devices while meeting task probabilistic deadlines.
- Score: 12.359690205873335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In edge intelligence systems, deep neural network (DNN) partitioning and data offloading can provide real-time task inference for resource-constrained mobile devices. However, the inference time of DNNs is typically uncertain and cannot be precisely determined in advance, presenting significant challenges in ensuring timely task processing within deadlines. To address the uncertain inference time, we propose a robust optimization scheme to minimize the total energy consumption of mobile devices while meeting task probabilistic deadlines. The scheme only requires the mean and variance information of the inference time, without any prediction methods or distribution functions. The problem is formulated as a mixed-integer nonlinear programming (MINLP) that involves jointly optimizing the DNN model partitioning and the allocation of local CPU/GPU frequencies and uplink bandwidth. To tackle the problem, we first decompose the original problem into two subproblems: resource allocation and DNN model partitioning. Subsequently, the two subproblems with probability constraints are equivalently transformed into deterministic optimization problems using the chance-constrained programming (CCP) method. Finally, the convex optimization technique and the penalty convex-concave procedure (PCCP) technique are employed to obtain the optimal solution of the resource allocation subproblem and a stationary point of the DNN model partitioning subproblem, respectively. The proposed algorithm leverages real-world data from popular hardware platforms and is evaluated on widely used DNN models. Extensive simulations show that our proposed algorithm effectively addresses the inference time uncertainty with probabilistic deadline guarantees while minimizing the energy consumption of mobile devices.
Related papers
- Privacy-Aware Joint DNN Model Deployment and Partition Optimization for Delay-Efficient Collaborative Edge Inference [14.408050197587654]
Edge inference (EI) is a key solution to address the growing challenges of delayed response times, limited scalability, and privacy concerns in cloud-based Deep Neural Network (DNN) inference.<n>This paper proposes a novel framework for privacy-aware joint DNN model deployment and partition optimization to minimize long-term average inference delay under resource and privacy constraints.
arXiv Detail & Related papers (2025-02-22T05:27:24Z) - DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality [55.06411438416805]
Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications.<n>This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS- DDR) to efficiently train parametric actor policies.<n>It is shown to enhance solution quality and to reduce computation times by several orders of magnitude when compared to current state-of-the-art methods.
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Power Control with QoS Guarantees: A Differentiable Projection-based
Unsupervised Learning Framework [14.518558523319518]
Deep neural networks (DNNs) are emerging as a potential solution to solve NP-hard wireless resource allocation problems.
We propose a novel unsupervised learning framework to solve the classical power control problem in a multi-user channel.
We show that the proposed solutions not only improve the data rate but also achieve zero constraint violation probability, compared to the existing computations.
arXiv Detail & Related papers (2023-05-31T14:11:51Z) - Learning from Images: Proactive Caching with Parallel Convolutional
Neural Networks [94.85780721466816]
A novel framework for proactive caching is proposed in this paper.
It combines model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image.
Numerical results show that the proposed scheme can reduce 71.6% computation time with only 0.8% additional performance cost.
arXiv Detail & Related papers (2021-08-15T21:32:47Z) - Efficient semidefinite-programming-based inference for binary and
multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned
Edge Learning Over Broadband Channels [69.18343801164741]
partitioned edge learning (PARTEL) implements parameter-server training, a well known distributed learning method, in wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z) - Joint Multi-User DNN Partitioning and Computational Resource Allocation
for Collaborative Edge Intelligence [21.55340197267767]
Mobile Edge Computing (MEC) has emerged as a promising supporting architecture providing a variety of resources to the network edge.
With the assistance of edge servers, user equipments (UEs) are able to run deep neural network (DNN) based AI applications.
We propose an algorithm called Iterative Alternating Optimization (IAO) that can achieve the optimal solution in time.
arXiv Detail & Related papers (2020-07-15T09:40:13Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.