Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems
- URL: http://arxiv.org/abs/2404.08950v1
- Date: Sat, 13 Apr 2024 10:13:07 GMT
- Title: Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems
- Authors: Francesco G. Blanco, Enrico Russo, Maurizio Palesi, Davide Patti, Giuseppe Ascia, Vincenzo Catania,
- Abstract summary: This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments.
The application of RELMAS to a heterogeneous multi-accelerator system resulted in up to a 173% improvement in SLA satisfaction rate.
- Score: 1.7724466261976437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, there is a growing trend of outsourcing the execution of DNNs to cloud services. For service providers, managing multi-tenancy and ensuring high-quality service delivery, particularly in meeting stringent execution time constraints, assumes paramount importance, all while endeavoring to maintain cost-effectiveness. In this context, the utilization of heterogeneous multi-accelerator systems becomes increasingly relevant. This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments, taking into account the dataflow heterogeneity of accelerators and memory bandwidths contentions. By doing so, service providers can employ the most efficient scheduling policy for user requests, optimizing Service-Level-Agreement (SLA) satisfaction rates and enhancing hardware utilization. The application of RELMAS to a heterogeneous multi-accelerator system composed of various instances of Simba and Eyeriss sub-accelerators resulted in up to a 173% improvement in SLA satisfaction rate compared to state-of-the-art scheduling techniques across different workload scenarios, with less than a 1.5% energy overhead.
Related papers
- Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant
Multi-Accelerator Systems via Reinforcement Learning [1.8781124875646162]
It introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific management in multi-tenant, multi-accelerator cloud environments.
A novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems is proposed.
arXiv Detail & Related papers (2024-02-09T07:25:07Z) - Joint User Association, Interference Cancellation and Power Control for
Multi-IRS Assisted UAV Communications [80.35959154762381]
Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way.
Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs.
We propose a new optimization algorithm for joint IRS-user association, trajectory optimization of UAVs, successive interference cancellation (SIC) decoding order scheduling and power allocation.
arXiv Detail & Related papers (2023-12-08T01:57:10Z) - Client Orchestration and Cost-Efficient Joint Optimization for
NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z) - Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with
Online Learning [60.17407932691429]
Open Radio Access Network systems, with their base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging'' environments.
We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z) - Reconfigurable Distributed FPGA Cluster Design for Deep Learning
Accelerators [59.11160990637615]
We propose a distributed system based on lowpower embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z) - MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural
Networks [3.8537852783718627]
MoCA is an adaptive multi-tenancy system for deep neural networks (DNNs) accelerators.
It dynamically manages shared memory resources of co-located applications to meet their targets.
We demonstrate that MoCA improves the satisfaction rate of the service level agreement (SLA) up to 3.9x (1.8x average), system throughput by 2.3x (1.7x average), and fairness by 1.3x (1.2x average) compared to prior work.
arXiv Detail & Related papers (2023-05-10T02:24:50Z) - Unifying Synergies between Self-supervised Learning and Dynamic
Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z) - A Graph Neural Networks based Framework for Topology-Aware Proactive SLA
Management in a Latency Critical NFV Application Use-case [0.34376560669160383]
Recent advancements in 5G and 6G have led to the emergence of latency-critical applications delivered via a Network-series (NFV) enabled paradigm.
We propose a proactive SLA management framework leveraging Graph Neural Networks (GNN) and Deep Reinforcement Learning (DRL) to balance the trade-off between efficiency and reliability.
arXiv Detail & Related papers (2022-11-10T23:22:05Z) - An Intelligent Deterministic Scheduling Method for Ultra-Low Latency
Communication in Edge Enabled Industrial Internet of Things [19.277349546331557]
Time Sensitive Network (TSN) is recently researched to realize low latency communication via deterministic scheduling.
Non-collision theory based deterministic scheduling (NDS) method is proposed to achieve ultra-low latency communication for the time-sensitive flows.
Experiment results demonstrate that NDS/DQS can well support deterministic ultra-low latency services and guarantee efficient bandwidth utilization.
arXiv Detail & Related papers (2022-07-17T16:52:51Z) - Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z) - Age of Information Aware VNF Scheduling in Industrial IoT Using Deep
Reinforcement Learning [9.780232937571599]
Deep reinforcement learning (DRL) has appeared as a viable way to solve such problems.
In this paper, we first utilize single agent low-complex compound action actor-critic RL to cover both discrete and continuous actions.
We then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other.
arXiv Detail & Related papers (2021-05-10T09:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.