Related papers: Slim Scheduler: A Runtime-Aware RL and Scheduler System for Efficient CNN Inference

URL: http://arxiv.org/abs/2510.09018v1
Date: Fri, 10 Oct 2025 05:44:05 GMT
Title: Slim Scheduler: A Runtime-Aware RL and Scheduler System for Efficient CNN Inference
Authors: Ian Harshbarger, Calvin Chidambaram,
Abstract summary: Slim Scheduler integrates a Proximal Policy Optimization (PPO) reinforcement learning policy with algorithmic, greedy schedulers to coordinate distributed inference for slimmable models. This hierarchical design reduces search space complexity, mitigates overfitting to specific hardware, and balances efficiency and throughput.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most neural network scheduling research focuses on optimizing static, end-to-end models of fixed width, overlooking dynamic approaches that adapt to heterogeneous hardware and fluctuating runtime conditions. We present Slim Scheduler, a hybrid scheduling framework that integrates a Proximal Policy Optimization (PPO) reinforcement learning policy with algorithmic, greedy schedulers to coordinate distributed inference for slimmable models. Each server runs a local greedy scheduler that batches compatible requests and manages instance scaling based on VRAM and utilization constraints, while the PPO router learns global routing policies for device selection, width ratio, and batch configuration. This hierarchical design reduces search space complexity, mitigates overfitting to specific hardware, and balances efficiency and throughput. Compared to a purely randomized task distribution baseline, Slim Scheduler can achieve various accuracy and latency trade-offs. For example, it can achieve a 96.45% reduction in mean latency and a 97.31% reduction in energy usage while dropping accuracy to that of the slimmest model available (70.3%). Alternatively, it can reduce both average latency and energy consumption while increasing accuracy, at the cost of higher standard deviations in latency and energy, which affects overall task throughput.
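The two-level design described in the abstract can be illustrated with a minimal Python sketch of the per-server greedy scheduler. Everything concrete here is an assumption for illustration, not taken from the paper: the width set `WIDTHS`, the quadratic VRAM cost model in `instance_cost`, the `util_limit` and `max_batch` parameters, and the hypothetical `Server`/`Request` classes. The PPO router is not reproduced; the hypothetical `random_router` stands in for the purely randomized baseline that the abstract compares against.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical width ratios for a slimmable CNN (not the paper's exact set).
WIDTHS = (0.25, 0.5, 0.75, 1.0)


@dataclass
class Request:
    req_id: int
    width: float  # width ratio assigned by the global router


@dataclass
class Server:
    name: str
    vram_gb: float            # total VRAM on this device
    util_limit: float = 0.9   # hypothetical cap on the fraction of VRAM to commit
    max_batch: int = 8        # hypothetical per-instance batch size limit
    instances: dict = field(default_factory=dict)  # width -> loaded instance count
    queue: list = field(default_factory=list)      # pending Request objects

    def instance_cost(self, width: float) -> float:
        # Assumed cost model: a full-width instance needs 4 GB, and weight count
        # scales roughly with width**2 since input and output channels both shrink.
        return 4.0 * width ** 2

    def used_vram(self) -> float:
        return sum(self.instance_cost(w) * n for w, n in self.instances.items())

    def step(self) -> list:
        """One greedy scheduling round; returns (width, [req_id, ...]) batches."""
        groups = defaultdict(list)
        for r in self.queue:
            groups[r.width].append(r)  # only same-width requests are batched together

        # Scale down: release instances whose width has no pending work.
        for w in [w for w in self.instances if w not in groups]:
            del self.instances[w]

        batches, leftover = [], []
        # Greedy order: serve the largest compatible group first.
        for width, reqs in sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True):
            cost = self.instance_cost(width)
            # Scale up while the backlog exceeds capacity and the VRAM budget allows.
            while (self.instances.get(width, 0) * self.max_batch < len(reqs)
                   and self.used_vram() + cost <= self.util_limit * self.vram_gb):
                self.instances[width] = self.instances.get(width, 0) + 1
            capacity = self.instances.get(width, 0) * self.max_batch
            for i in range(0, min(len(reqs), capacity), self.max_batch):
                batches.append((width, [r.req_id for r in reqs[i:i + self.max_batch]]))
            leftover.extend(reqs[capacity:])  # requests that did not fit wait for the next round
        self.queue = leftover
        return batches


def random_router(servers, n_requests, rng):
    """Stand-in for the randomized baseline: random device and width per request."""
    for req_id in range(n_requests):
        rng.choice(servers).queue.append(Request(req_id, rng.choice(WIDTHS)))


if __name__ == "__main__":
    rng = random.Random(0)
    servers = [Server("gpu0", vram_gb=8.0), Server("gpu1", vram_gb=4.0)]
    random_router(servers, 40, rng)
    for s in servers:
        batches = s.step()
        print(s.name, s.instances, [(w, len(ids)) for w, ids in batches], "pending:", len(s.queue))
```

Grouping by width ratio is one plausible reading of "compatible requests", since a single forward pass of a slimmable network runs at one width. With these assumed numbers, full-width requests sent to the 4 GB device never fit under the VRAM cap and stay queued; this is the kind of hardware mismatch a learned router, rather than a random one, would be expected to avoid.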

Related papers

Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission
Device-edge collaborative inference with Deep Neural Networks (DNNs) faces fundamental trade-offs among accuracy, latency and energy consumption. This paper proposes a novel ENergy-ACcuracy Hierarchical optimization framework for split Inference, named ENACHI. Experiments on the ImageNet dataset demonstrate that ENACHI outperforms state-of-the-art benchmarks under varying deadlines and bandwidths.
(arXiv, 2026-01-13T01:56:46Z)
Q-Learning-Based Time-Critical Data Aggregation Scheduling in IoT
Time-critical data aggregation in Internet of Things (IoT) networks demands efficient, collision-free scheduling. Traditional methods, with two-phase tree construction and scheduling, often suffer from high computational overhead and suboptimal delays. We propose a novel Q-learning framework that unifies aggregation tree construction and scheduling, modeling the process as a Markov Decision Process (MDP) with hashed states for scalability.
(arXiv, 2025-10-29T15:46:21Z)
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
We propose a latency-aware scheduling framework to minimize total inference latency. We show that the proposed method significantly reduces cold-start latency compared to baseline strategies.
(arXiv, 2025-08-15T07:49:22Z)
Adaptive Deadline and Batch Layered Synchronized Federated Learning
Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation.
(arXiv, 2025-05-29T19:59:18Z)
GPU Cluster Scheduling for Network-Sensitive Deep Learning
We propose a novel GPU-cluster scheduler for distributed DL (DDL) workloads. Our scheduler consists of three major components: (i) a classical delay scheduling algorithm to facilitate job placement and consolidation; (ii) a network-sensitive job preemption strategy; and (iii) an "auto-tuner" mechanism to optimize delay timers for effective delay scheduling.
(arXiv, 2024-01-29T19:06:08Z)
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks. It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
(arXiv, 2023-08-30T10:57:41Z)
Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks
In this paper, we advocate the integration of the edge computing paradigm and parallel split learning (PSL). We propose an innovative PSL framework, namely efficient parallel split learning (EPSL), to accelerate model training. We show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy.
(arXiv, 2023-03-26T16:09:48Z)
Generating Dispatching Rules for the Interrupting Swap-Allowed Blocking Job Shop Problem Using Graph Neural Network and Reinforcement Learning
The interrupting swap-allowed blocking job shop problem (ISBJSSP) is able to model many manufacturing planning and logistics applications realistically. We introduce a dynamic disjunctive graph formulation characterized by nodes and edges subjected to continuous deletions and additions. A simulator is developed to simulate interruption, swapping, and blocking in the ISBJSSP setting.
(arXiv, 2023-02-05T23:35:21Z)
SMDP-Based Dynamic Batching for Efficient Inference on GPU-Based Platforms
This paper aims to provide a dynamic batching policy that strikes a balance between efficiency and latency. The proposed solution has notable flexibility in balancing power consumption and latency.
(arXiv, 2023-01-30T13:19:16Z)
Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach
Limited communication resources (bandwidth and energy) and data heterogeneity across devices are the main bottlenecks for federated learning (FL). We first devise a novel FL framework with partial model aggregation (PMA). The proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets.
(arXiv, 2022-04-20T19:09:52Z)
Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
(arXiv, 2021-05-01T10:18:34Z)
Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems. First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch. Second, for diverse system scales and structures, we use graph neural networks to embed system state information. Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
(arXiv, 2021-01-17T03:45:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.