Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems
- URL: http://arxiv.org/abs/2404.08950v1
- Date: Sat, 13 Apr 2024 10:13:07 GMT
- Title: Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems
- Authors: Francesco G. Blanco, Enrico Russo, Maurizio Palesi, Davide Patti, Giuseppe Ascia, Vincenzo Catania
- Abstract summary: This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments.
The application of RELMAS to a heterogeneous multi-accelerator system resulted in up to a 173% improvement in SLA satisfaction rate.
- Score: 1.7724466261976437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, there is a growing trend of outsourcing the execution of DNNs to cloud services. For service providers, managing multi-tenancy and ensuring high-quality service delivery, particularly in meeting stringent execution time constraints, assumes paramount importance, all while endeavoring to maintain cost-effectiveness. In this context, the utilization of heterogeneous multi-accelerator systems becomes increasingly relevant. This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments, taking into account the dataflow heterogeneity of accelerators and memory bandwidth contention. By doing so, service providers can employ the most efficient scheduling policy for user requests, optimizing Service-Level-Agreement (SLA) satisfaction rates and enhancing hardware utilization. The application of RELMAS to a heterogeneous multi-accelerator system composed of various instances of Simba and Eyeriss sub-accelerators resulted in up to a 173% improvement in SLA satisfaction rate compared to state-of-the-art scheduling techniques across different workload scenarios, with less than a 1.5% energy overhead.
 
      
Related papers
- Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping [54.65536245955678]
 We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome the challenge of sample inefficiency.
We introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis.
Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL.
 arXiv  Detail & Related papers  (2025-07-22T05:51:07Z)
- Tempo: Application-aware LLM Serving with Mixed SLO Requirements [7.290735867969561]
 We introduce Tempo, a scheduler designed to maximize service gain across diverse LLM workloads.
Our evaluation shows that Tempo improves end-to-end service gain by up to 8.3$\times$ and achieves up to 10.3$\times$ SLO goodput compared to state-of-the-art designs.
 arXiv  Detail & Related papers  (2025-04-24T05:55:21Z)
- Cluster-Based Multi-Agent Task Scheduling for Space-Air-Ground Integrated Networks [60.085771314013044]
 Low-altitude economy holds significant potential for development in areas such as communication and sensing.
We propose a Clustering-based Multi-agent Deep Deterministic Policy Gradient (CMADDPG) algorithm to address the multi-UAV cooperative task scheduling challenges in SAGIN.
 arXiv  Detail & Related papers  (2024-12-14T06:17:33Z)
- Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant Multi-Accelerator Systems via Reinforcement Learning [1.8781124875646162]
 This paper introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific management in multi-tenant, multi-accelerator cloud environments.
A novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems is proposed.
 arXiv  Detail & Related papers  (2024-02-09T07:25:07Z)
- Joint User Association, Interference Cancellation and Power Control for Multi-IRS Assisted UAV Communications [80.35959154762381]
 Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way.
Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs.
We propose a new optimization algorithm for joint IRS-user association, trajectory optimization of UAVs, successive interference cancellation (SIC) decoding order scheduling and power allocation.
 arXiv  Detail & Related papers  (2023-12-08T01:57:10Z)
- Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
 We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
 arXiv  Detail & Related papers  (2023-11-03T13:34:44Z)
- Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [60.17407932691429]
 Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments.
We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
 arXiv  Detail & Related papers  (2023-09-04T17:30:21Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
 We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
 arXiv  Detail & Related papers  (2023-05-24T16:08:55Z)
- MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks [3.8537852783718627]
 MoCA is an adaptive multi-tenancy system for deep neural networks (DNNs) accelerators.
It dynamically manages shared memory resources of co-located applications to meet their targets.
We demonstrate that MoCA improves the satisfaction rate of the service level agreement (SLA) up to 3.9x (1.8x average), system throughput by 2.3x (1.7x average), and fairness by 1.3x (1.2x average) compared to prior work.
 arXiv  Detail & Related papers  (2023-05-10T02:24:50Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
 We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
 arXiv  Detail & Related papers  (2023-01-22T17:12:58Z)
- A Graph Neural Networks based Framework for Topology-Aware Proactive SLA Management in a Latency Critical NFV Application Use-case [0.34376560669160383]
 Recent advancements in 5G and 6G have led to the emergence of latency-critical applications delivered via a Network Function Virtualization (NFV) enabled paradigm.
We propose a proactive SLA management framework leveraging Graph Neural Networks (GNN) and Deep Reinforcement Learning (DRL) to balance the trade-off between efficiency and reliability.
 arXiv  Detail & Related papers  (2022-11-10T23:22:05Z)
- An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things [19.277349546331557]
 Time Sensitive Network (TSN) has recently been studied as a way to realize low-latency communication via deterministic scheduling.
A non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for time-sensitive flows.
Experiment results demonstrate that NDS/DQS can well support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.
 arXiv  Detail & Related papers  (2022-07-17T16:52:51Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
 Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
 arXiv  Detail & Related papers  (2022-03-26T20:37:14Z)
- Age of Information Aware VNF Scheduling in Industrial IoT Using Deep Reinforcement Learning [9.780232937571599]
 Deep reinforcement learning (DRL) has emerged as a viable way to solve such problems.
In this paper, we first utilize single-agent, low-complexity compound-action actor-critic RL to cover both discrete and continuous actions.
We then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other.
 arXiv  Detail & Related papers  (2021-05-10T09:04:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     