Related papers: Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems

Related papers

Orchestrating Multimodal DNN Workloads in Wireless Neural Processing [57.510786937781866]
In edge inference, wireless resource allocation and accelerator deep neural computation (DNN) scheduling have yet to be co-optimized in an end-to-end manner.<n>This paper investigates a paradigm that integrates wireless transmission and multi-core execution into a unified end-to-end pipeline.
arXiv Detail & Related papers (2026-03-02T17:25:43Z)
Asynchronous MultiAgent Reinforcement Learning for 5G Routing under Side Constraints [1.0732935873226022]
We propose an asynchronous multi-agent reinforcement learning framework in which independent PPO agents plan routes in parallel and commit resource deltas to a shared global resource environment.<n>We evaluate the method on an O-RAN like network simulation using nearly real-time traffic data from the city of Montreal.<n>AMARL achieves a similar Grade of Service (GoS) and end-to-end latency, with reduced training wall-clock time and improved robustness to demand shifts.
arXiv Detail & Related papers (2026-01-18T18:38:37Z)
TimeGNN-Augmented Hybrid-Action MARL for Fine-Grained Task Partitioning and Energy-Aware Offloading in MEC [39.30264321748534]
This paper proposes a collaborative computing framework for multiple edge servers.<n>It incorporates a temporal graph neural network (TimeGNN) to model and predict time series of multi-dimensional server state information.<n>It also introduces a multi-agent deterministic policy gradient algorithm (DC-MADDPG) in a discrete-continuous hybrid action space.
arXiv Detail & Related papers (2026-01-08T02:24:58Z)
AoI-Aware Task Offloading and Transmission Optimization for Industrial IoT Networks: A Branching Deep Reinforcement Learning Approach [43.261887758877386]
In the Industrial Internet of Things (IIoT), the frequent transmission of large amounts of data over wireless networks should meet the stringent timeliness requirements.<n>We propose an age-of-information (AoI)-aware multi-base station (BS) real-time monitoring framework to support extensive IIoT deployments.
arXiv Detail & Related papers (2025-10-18T09:14:39Z)
Dynamic Speculative Agent Planning [57.630218933994534]
Large language-model-based agents face critical deployment challenges due to prohibitive latency and inference costs.<n>We introduce Dynamic Speculative Planning (DSP), an online reinforcement learning framework that provides lossless acceleration with substantially reduced costs.<n>Experiments on two standard agent benchmarks demonstrate that DSP achieves comparable efficiency to the fastest acceleration method while reducing total cost by 30% and unnecessary cost up to 60%.
arXiv Detail & Related papers (2025-09-02T03:34:36Z)
Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping [54.65536245955678]
We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome the challenge of sample inefficiency.<n>We introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis.<n> Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL.
arXiv Detail & Related papers (2025-07-22T05:51:07Z)
Tempo: Application-aware LLM Serving with Mixed SLO Requirements [7.290735867969561]
We introduce Tempo, a scheduler designed to maximize service gain across diverse LLM workloads. Our evaluation shows that Tempo improves end-to-end service gain by up to 8.3$times$ achieves and up to 10.3$times$ SLO goodput compared to state-of-the-art designs.
arXiv Detail & Related papers (2025-04-24T05:55:21Z)
Cluster-Based Multi-Agent Task Scheduling for Space-Air-Ground Integrated Networks [60.085771314013044]
Low-altitude economy holds significant potential for development in areas such as communication and sensing. We propose a Clustering-based Multi-agent Deep Deterministic Policy Gradient (CMADDPG) algorithm to address the multi-UAV cooperative task scheduling challenges in SAGIN.
arXiv Detail & Related papers (2024-12-14T06:17:33Z)
ConServe: Fine-Grained GPU Harvesting for LLM Online and Offline Co-Serving [61.35068981176018]
ConServe is a large language model (LLM) serving system that achieves high throughput and strong online latency guarantees.<n>We show that ConServe delivers an average of 2.2$times$ higher throughput and reduces online serving tail latency by 2.9$times$ on average compared to state-of-the-art systems.
arXiv Detail & Related papers (2024-10-02T04:12:13Z)
Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant Multi-Accelerator Systems via Reinforcement Learning [1.8781124875646162]
It introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific management in multi-tenant, multi-accelerator cloud environments. A novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems is proposed.
arXiv Detail & Related papers (2024-02-09T07:25:07Z)
Joint User Association, Interference Cancellation and Power Control for Multi-IRS Assisted UAV Communications [80.35959154762381]
Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way. Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs. We propose a new optimization algorithm for joint IRS-user association, trajectory optimization of UAVs, successive interference cancellation (SIC) decoding order scheduling and power allocation.
arXiv Detail & Related papers (2023-12-08T01:57:10Z)
Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation. We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z)
Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [60.17407932691429]
Open Radio Access Network systems, with their base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability. We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging'' environments. We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks [3.8537852783718627]
MoCA is an adaptive multi-tenancy system for deep neural networks (DNNs) accelerators. It dynamically manages shared memory resources of co-located applications to meet their targets. We demonstrate that MoCA improves the satisfaction rate of the service level agreement (SLA) up to 3.9x (1.8x average), system throughput by 2.3x (1.7x average), and fairness by 1.3x (1.2x average) compared to prior work.
arXiv Detail & Related papers (2023-05-10T02:24:50Z)
Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms. We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting. The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
A Graph Neural Networks based Framework for Topology-Aware Proactive SLA Management in a Latency Critical NFV Application Use-case [0.34376560669160383]
Recent advancements in 5G and 6G have led to the emergence of latency-critical applications delivered via a Network-series (NFV) enabled paradigm. We propose a proactive SLA management framework leveraging Graph Neural Networks (GNN) and Deep Reinforcement Learning (DRL) to balance the trade-off between efficiency and reliability.
arXiv Detail & Related papers (2022-11-10T23:22:05Z)
An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things [19.277349546331557]
Time Sensitive Network (TSN) is recently researched to realize low latency communication via deterministic scheduling. Non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for the time-sensitive flows. Experiment results demonstrate that NDS/DQS can well support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.
arXiv Detail & Related papers (2022-07-17T16:52:51Z)
Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks. In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
Age of Information Aware VNF Scheduling in Industrial IoT Using Deep Reinforcement Learning [9.780232937571599]
Deep reinforcement learning (DRL) has appeared as a viable way to solve such problems. In this paper, we first utilize single agent low-complex compound action actor-critic RL to cover both discrete and continuous actions. We then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other.
arXiv Detail & Related papers (2021-05-10T09:04:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.