A Review of Deep Reinforcement Learning in Serverless Computing:
Function Scheduling and Resource Auto-Scaling
- URL: http://arxiv.org/abs/2311.12839v1
- Date: Thu, 5 Oct 2023 09:26:04 GMT
- Title: A Review of Deep Reinforcement Learning in Serverless Computing:
Function Scheduling and Resource Auto-Scaling
- Authors: Amjad Yousef Majid, Eduard Marin
- Abstract summary: This paper presents a comprehensive review of the application of Deep Reinforcement Learning (DRL) techniques in serverless computing.
A systematic review of recent studies applying DRL to serverless computing is presented, covering various algorithms, models, and performances.
Our analysis reveals that DRL, with its ability to learn and adapt from an environment, shows promising results in improving the efficiency of function scheduling and resource scaling.
- Score: 2.0722667822370386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of serverless computing, efficient function
scheduling and resource scaling are critical for optimizing performance and
cost. This paper presents a comprehensive review of the application of Deep
Reinforcement Learning (DRL) techniques in these areas. We begin by providing
an overview of serverless computing, highlighting its benefits and challenges,
with a particular focus on function scheduling and resource scaling. We then
delve into the principles of deep reinforcement learning (DRL) and its
potential for addressing these challenges. A systematic review of recent
studies applying DRL to serverless computing is presented, covering various
algorithms, models, and performances. Our analysis reveals that DRL, with its
ability to learn and adapt from an environment, shows promising results in
improving the efficiency of function scheduling and resource scaling in
serverless computing. However, several challenges remain, including the need
for more realistic simulation environments, handling of cold starts, and the
trade-off between learning time and scheduling performance. We conclude by
discussing potential future directions for this research area, emphasizing the
need for more robust DRL models, better benchmarking methods, and the
exploration of multi-agent reinforcement learning for more complex serverless
architectures. This review serves as a valuable resource for researchers and
practitioners aiming to understand and advance the application of DRL in
serverless computing.
Related papers
- Reinforcement Learning for Adaptive Resource Scheduling in Complex System Environments [8.315191578007857]
This study presents a novel computer system performance optimization and adaptive workload management scheduling algorithm based on Q-learning.
By contrast, Q-learning, a reinforcement learning algorithm, continuously learns from system state changes, enabling dynamic scheduling and resource optimization.
This research provides a foundation for the integration of AI-driven adaptive scheduling in future large-scale systems, offering a scalable, intelligent solution to enhance system performance, reduce operating costs, and support sustainable energy consumption.
arXiv Detail & Related papers (2024-11-08T05:58:09Z) - D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey [48.06362354403557]
This survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL.
We highlight critical challenges for each topic and discuss key insights of existing technologies.
This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances.
arXiv Detail & Related papers (2024-06-12T11:51:44Z) - The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z) - Deep reinforcement learning for machine scheduling: Methodology, the
state-of-the-art, and future directions [2.4541568670428915]
Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications.
Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics.
This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations.
arXiv Detail & Related papers (2023-10-04T22:45:09Z) - On Efficient Training of Large-Scale Deep Learning Models: A Literature
Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
With the increasing demands on computational capacity, a comprehensive summarization on acceleration techniques of training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z) - Characterization and Prediction of Deep Learning Workloads in
Large-Scale GPU Datacenters [30.952491139350908]
We present a comprehensive study about the characteristics of Deep Learning jobs and resource management.
We introduce a general-purpose framework, which manages resources based on historical data.
As case studies, we design: a Quasi-Shortest-Service-First scheduling service, which can minimize the cluster-wide average job completion time by up to 6.5x; and a Cluster Energy Saving service, which improves overall cluster utilization by up to 13%.
arXiv Detail & Related papers (2021-09-03T05:02:52Z) - Smart Scheduling based on Deep Reinforcement Learning for Cellular
Networks [18.04856086228028]
We propose a smart scheduling scheme based on deep reinforcement learning (DRL)
We provide implementation-friend designs, i.e., a scalable neural network design for the agent and a virtual environment training framework.
We show that the DRL-based smart scheduling outperforms the conventional scheduling method and can be adopted in practical systems.
arXiv Detail & Related papers (2021-03-22T02:09:16Z) - Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
Until today, no such learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z) - PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning [16.269923100433232]
We develop a working algorithm, named Policy Pruning and Shrinking (PoPS), to train DRL models with strong performance.
PoPS is based on a novel iterative policy pruning and shrinking method that leverages the power of transfer learning.
We present an extensive experimental study that demonstrates the strong performance of PoPS using the popular Cartpole, Lunar Lander, Pong, and Pacman environments.
arXiv Detail & Related papers (2020-01-14T19:28:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.