Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission
- URL: http://arxiv.org/abs/2601.08135v1
- Date: Tue, 13 Jan 2026 01:56:46 GMT
- Title: Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission
- Authors: Zengzipeng Tang, Yuxuan Sun, Wei Chen, Jianwen Ding, Bo Ai, Yulin Shao
- Abstract summary: Device-edge collaborative inference with Deep Neural Networks (DNNs) faces fundamental trade-offs among accuracy, latency, and energy consumption. This paper proposes a novel ENergy-ACcuracy Hierarchical optimization framework for split Inference, named ENACHI. Experiments on the ImageNet dataset demonstrate that ENACHI outperforms state-of-the-art benchmarks under varying deadlines and bandwidths.
- Score: 23.81409473238433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Device-edge collaborative inference with Deep Neural Networks (DNNs) faces fundamental trade-offs among accuracy, latency, and energy consumption. Current scheduling exhibits two drawbacks: a granularity mismatch between coarse, task-level decisions and fine-grained, packet-level channel dynamics, and insufficient awareness of per-task complexity. Consequently, scheduling solely at the task level leads to inefficient resource utilization. This paper proposes a novel ENergy-ACcuracy Hierarchical optimization framework for split Inference, named ENACHI, that jointly optimizes task- and packet-level scheduling to maximize accuracy under energy and delay constraints. A two-tier Lyapunov-based framework is developed for ENACHI, with a progressive transmission technique further integrated to enhance adaptivity. At the task level, an outer drift-plus-penalty loop makes online decisions for DNN partitioning and bandwidth allocation, and establishes a reference power budget to manage the long-term energy-accuracy trade-off. At the packet level, an uncertainty-aware progressive transmission mechanism is employed to adaptively manage per-sample task complexity. This is integrated with a nested inner control loop implementing a novel reference-tracking policy, which dynamically adjusts per-slot transmit power to adapt to fluctuating channel conditions. Experiments on the ImageNet dataset demonstrate that ENACHI outperforms state-of-the-art benchmarks under varying deadlines and bandwidths, achieving a 43.12% gain in inference accuracy with a 62.13% reduction in energy consumption under stringent deadlines, and exhibits high scalability by maintaining stable energy consumption in congested multi-user scenarios.
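The outer drift-plus-penalty loop described in the abstract can be illustrated with a toy sketch. This is a hypothetical, minimal implementation of the generic Lyapunov drift-plus-penalty idea, not the authors' ENACHI code: a virtual queue `Q` tracks accumulated deviation from a long-term power budget, and each slot the scheduler greedily minimizes `V * (-accuracy) + Q * (energy - budget)` over a small (assumed) action set, with channel fluctuation modeled as a random gain.

```python
# Hypothetical drift-plus-penalty sketch (not the paper's ENACHI implementation).
# A virtual energy queue Q penalizes actions whose energy exceeds the budget;
# the parameter V trades accuracy against queue (energy-constraint) drift.
import random

def drift_plus_penalty_schedule(actions, budget, slots, V=10.0, seed=0):
    """actions: list of (accuracy, nominal_energy) pairs available each slot."""
    rng = random.Random(seed)
    Q = 0.0  # virtual queue enforcing long-term avg energy <= budget
    history = []
    for _ in range(slots):
        # Assumed channel model: a random gain scales each action's energy cost.
        gain = rng.uniform(0.5, 1.5)
        best = min(actions, key=lambda a: -V * a[0] + Q * (gain * a[1] - budget))
        acc, en = best[0], gain * best[1]
        Q = max(Q + en - budget, 0.0)  # queue update: drift toward the budget
        history.append((acc, en))
    avg_acc = sum(a for a, _ in history) / slots
    avg_energy = sum(e for _, e in history) / slots
    return avg_acc, avg_energy

# Three illustrative operating points: (accuracy, nominal energy per slot).
acts = [(0.60, 0.2), (0.75, 0.6), (0.85, 1.2)]
acc, energy = drift_plus_penalty_schedule(acts, budget=0.5, slots=1000)
print(f"avg accuracy {acc:.3f}, avg energy {energy:.3f}")
```

In steady state the virtual queue drives the time-averaged energy toward the budget while picking the most accurate affordable action each slot, which is the qualitative behavior the abstract's outer loop provides at the task level.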
Related papers
- Orchestrating Multimodal DNN Workloads in Wireless Neural Processing [57.510786937781866]
In edge inference, wireless resource allocation and deep neural network (DNN) accelerator scheduling have yet to be co-optimized in an end-to-end manner. This paper investigates a paradigm that integrates wireless transmission and multi-core execution into a unified end-to-end pipeline.
arXiv Detail & Related papers (2026-03-02T17:25:43Z) - TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration [64.32072516882947]
Diffusion Policy excels in embodied control but suffers from high inference latency and computational cost. We propose a Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP). TS-DP achieves up to 4.17 times faster inference with over 94% accepted drafts, reaching an inference frequency of 25 Hz.
arXiv Detail & Related papers (2025-12-13T07:53:14Z) - Resource Allocation in Hybrid Radio-Optical IoT Networks using GNN with Multi-task Learning [11.833896722352568]
This paper addresses the problem of dual-technology scheduling in hybrid Internet of Things (IoT) networks that integrate Optical Wireless Communication (OWC) and Radio Frequency (RF). We propose a supervised multi-task learning architecture built on a two-stage Graph Embedding with Transformer (DGET) framework. The proposed framework achieves near-optimal scheduling with over 90% classification accuracy, reduces computational complexity, and demonstrates higher robustness under partial channel observability.
arXiv Detail & Related papers (2025-10-29T15:02:28Z) - PASS-Enhanced MEC: Joint Optimization of Task Offloading and Uplink PASS Beamforming [67.78883135636657]
A pinching-antenna system (PASS)-enhanced mobile edge computing (MEC) architecture is investigated. PASS establishes short-distance line-of-sight (LoS) links while effectively mitigating the significant path loss and potential signal blockage. We formulate a network latency minimization problem to jointly optimize uplink PASS beamforming and task offloading.
arXiv Detail & Related papers (2025-10-27T03:04:46Z) - AoI-Aware Task Offloading and Transmission Optimization for Industrial IoT Networks: A Branching Deep Reinforcement Learning Approach [43.261887758877386]
In the Industrial Internet of Things (IIoT), the frequent transmission of large amounts of data over wireless networks must meet stringent timeliness requirements. We propose an age-of-information (AoI)-aware multi-base-station (BS) real-time monitoring framework to support extensive IIoT deployments.
arXiv Detail & Related papers (2025-10-18T09:14:39Z) - Slim Scheduler: A Runtime-Aware RL and Scheduler System for Efficient CNN Inference [0.0]
Slim Scheduler integrates a Proximal Policy Optimization (PPO) reinforcement learning policy with algorithmic, greedy schedulers to coordinate distributed inference for slimmable models. This hierarchical design reduces search-space complexity, mitigates overfitting to specific hardware, and balances efficiency and throughput.
arXiv Detail & Related papers (2025-10-10T05:44:05Z) - Joint Channel Estimation and Computation Offloading in Fluid Antenna-assisted MEC Networks [81.36647816787713]
We propose an FA-assisted offloading framework to minimize the delay of channel estimation. We show that the proposed system significantly reduces delay while maintaining efficient communication.
arXiv Detail & Related papers (2025-09-16T08:48:44Z) - Lyapunov-Driven Deep Reinforcement Learning for Edge Inference Empowered
by Reconfigurable Intelligent Surfaces [30.1512069754603]
We propose a novel algorithm for energy-efficient, low-latency, accurate inference at the wireless edge.
We consider a scenario where new data are continuously generated/collected by a set of devices and are handled through a dynamic queueing system.
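The dynamic queueing system mentioned above can be sketched with the standard Lindley backlog update. This is a hypothetical illustration, not code from the cited paper: each slot a few (assumed) devices generate new samples, the edge serves a batch, and the backlog evolves as Q(t+1) = max(Q(t) - s(t), 0) + a(t), staying bounded whenever the mean service rate exceeds the mean arrival rate.

```python
# Hypothetical sketch of a dynamic task queue at the wireless edge
# (illustrative only; device count and rates are assumed parameters).
import random

def simulate_queue(arrival_prob, service_prob, slots, devices=3, seed=1):
    """Lindley update Q(t+1) = max(Q(t) - s(t), 0) + a(t) over `slots` slots."""
    rng = random.Random(seed)
    Q, peak = 0, 0
    for _ in range(slots):
        a = sum(rng.random() < arrival_prob for _ in range(devices))  # new tasks
        s = sum(rng.random() < service_prob for _ in range(devices))  # served tasks
        Q = max(Q - s, 0) + a
        peak = max(peak, Q)
    return Q, peak

# Stable regime: mean service (3 * 0.6) exceeds mean arrivals (3 * 0.3),
# so the backlog stays small even over many slots.
q_final, q_peak = simulate_queue(0.3, 0.6, slots=5000)
```

Keeping such a queue stable while meeting latency and energy targets is exactly the setting where the Lyapunov techniques in this entry (and in ENACHI above) apply.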
arXiv Detail & Related papers (2023-05-18T12:46:42Z) - Federated Learning for Energy-limited Wireless Networks: A Partial Model
Aggregation Approach [79.59560136273917]
Limited communication resources (bandwidth and energy) and data heterogeneity across devices are the main bottlenecks for federated learning (FL).
We first devise a novel FL framework with partial model aggregation (PMA).
The proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets.
arXiv Detail & Related papers (2022-04-20T19:09:52Z) - Energy-Efficient Model Compression and Splitting for Collaborative
Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and a time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and CO2 emissions compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware
Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.