Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant
Multi-Accelerator Systems via Reinforcement Learning
- URL: http://arxiv.org/abs/2403.00766v1
- Date: Fri, 9 Feb 2024 07:25:07 GMT
- Title: Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant
Multi-Accelerator Systems via Reinforcement Learning
- Authors: Enrico Russo, Francesco Giulio Blanco, Maurizio Palesi, Giuseppe
Ascia, Davide Patti, Vincenzo Catania
- Abstract summary: The paper introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific QoS management in multi-tenant, multi-accelerator cloud environments.
It also proposes a novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems.
- Score: 1.8781124875646162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the critical challenge of managing Quality of Service
(QoS) in cloud services, focusing on the nuances of individual tenant
expectations and varying Service Level Indicators (SLIs). It introduces a novel
approach utilizing Deep Reinforcement Learning for tenant-specific QoS
management in multi-tenant, multi-accelerator cloud environments. The chosen
SLI, deadline hit rate, allows clients to tailor QoS for each service request.
A novel online scheduling algorithm for Deep Neural Networks in
multi-accelerator systems is proposed, with a focus on guaranteeing
tenant-wise, model-specific QoS levels while considering real-time constraints.
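As a rough illustration of the SLI named in the abstract, the sketch below computes a deadline hit rate per tenant and model and compares it with a client-requested target. It is not the paper's scheduler: the record layout, the function names `deadline_hit_rates` and `unmet_targets`, and all sample values are assumptions made up for this example.

```python
from collections import defaultdict


def deadline_hit_rates(requests):
    """Return {(tenant, model): fraction of requests that finished by their deadline}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for tenant, model, finish_time, deadline in requests:
        key = (tenant, model)
        totals[key] += 1
        if finish_time <= deadline:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def unmet_targets(rates, targets):
    """List the (tenant, model) pairs whose hit rate is below the requested target."""
    return sorted(key for key, target in targets.items()
                  if rates.get(key, 0.0) < target)


# Hypothetical completed requests: (tenant, model, finish_time_ms, deadline_ms)
requests = [
    ("tenant_a", "resnet50", 12.0, 15.0),
    ("tenant_a", "resnet50", 18.0, 15.0),  # misses its deadline
    ("tenant_b", "bert", 40.0, 50.0),
    ("tenant_b", "bert", 45.0, 50.0),
]
# Hypothetical per-tenant, per-model hit-rate targets chosen by the clients
targets = {("tenant_a", "resnet50"): 0.9, ("tenant_b", "bert"): 0.8}

rates = deadline_hit_rates(requests)
print(rates)
print(unmet_targets(rates, targets))
```

A scheduler aiming at tenant-wise QoS could use a measure like this as feedback, for example by prioritizing the pairs returned by `unmet_targets`; how the paper actually uses the SLI is not specified in the abstract.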
Related papers
- Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems [1.7724466261976437]
This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments.
The application of RELMAS to a heterogeneous multi-accelerator system resulted in up to a 173% improvement in SLA satisfaction rate.
arXiv Detail & Related papers (2024-04-13T10:13:07Z)
- Large Language Model Meets Graph Neural Network in Knowledge Distillation [7.686812700685084]
We propose a temporal-aware framework for predicting Quality of Service (QoS) in service-oriented architectures.
Our proposed TOGCL framework significantly outperforms state-of-the-art methods across multiple metrics, achieving improvements of up to 38.80%.
arXiv Detail & Related papers (2024-02-08T18:33:21Z)
- Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z)
- Self-Sustaining Multiple Access with Continual Deep Reinforcement Learning for Dynamic Metaverse Applications [17.436875530809946]
The Metaverse is a new paradigm that aims to create a virtual environment consisting of numerous worlds, each of which will offer a different set of services.
To deal with such a dynamic and complex scenario, one potential approach is to adopt self-sustaining strategies.
This paper investigates the problem of multiple access in multi-channel environments to maximize the throughput of the intelligent agent.
arXiv Detail & Related papers (2023-09-18T22:02:47Z)
- Elastic Entangled Pair and Qubit Resource Management in Quantum Cloud Computing [73.7522199491117]
Quantum cloud computing (QCC) offers a promising approach to efficiently provide quantum computing resources.
The fluctuations in user demand and quantum circuit requirements are challenging for efficient resource provisioning.
We propose a resource allocation model to provision quantum computing and networking resources.
arXiv Detail & Related papers (2023-07-25T00:38:46Z)
- Scaling Limits of Quantum Repeater Networks [62.75241407271626]
Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing.
Due to the fragile nature of quantum states, these networks face significant challenges in terms of scalability.
In this paper, the scaling limits of quantum repeater networks (QRNs) are analyzed.
arXiv Detail & Related papers (2023-05-15T14:57:01Z)
- A Graph Neural Networks based Framework for Topology-Aware Proactive SLA Management in a Latency Critical NFV Application Use-case [0.34376560669160383]
Recent advancements in 5G and 6G have led to the emergence of latency-critical applications delivered via a Network Function Virtualization (NFV) enabled paradigm.
We propose a proactive SLA management framework leveraging Graph Neural Networks (GNN) and Deep Reinforcement Learning (DRL) to balance the trade-off between efficiency and reliability.
arXiv Detail & Related papers (2022-11-10T23:22:05Z)
- Artificial Intelligence Empowered Multiple Access for Ultra Reliable and Low Latency THz Wireless Networks [76.89730672544216]
Terahertz (THz) wireless networks are expected to catalyze the beyond fifth generation (B5G) era.
To satisfy the ultra-reliability and low-latency demands of several B5G applications, novel mobility management approaches are required.
This article presents a holistic MAC layer approach that enables intelligent user association and resource allocation, as well as flexible and adaptive mobility management.
arXiv Detail & Related papers (2022-08-17T03:00:24Z)
- QoS-Aware Scheduling in New Radio Using Deep Reinforcement Learning [2.3857747529378917]
We propose a QoS-Aware Deep Reinforcement learning Agent (QADRA) scheduler for NR networks.
In our particular evaluation scenario, the QADRA scheduler improves network throughput by 30% while simultaneously maintaining the satisfaction rate of users served by the network.
arXiv Detail & Related papers (2021-07-14T09:18:39Z)
- Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z)
- Information Freshness-Aware Task Offloading in Air-Ground Integrated Edge Computing Systems [49.80033982995667]
This paper studies the problem of information freshness-aware task offloading in an air-ground integrated multi-access edge computing system.
A third-party real-time application service provider offers computing services to subscribed mobile users (MUs), using the limited communication and computation resources of the infrastructure provider (InP).
We derive a novel deep reinforcement learning (RL) scheme that adopts two separate double deep Q-networks for each MU to approximate the Q-factor and the post-decision Q-factor.
arXiv Detail & Related papers (2020-07-15T21:32:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.