Related papers: A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling

URL: http://arxiv.org/abs/2311.12839v1
Date: Thu, 5 Oct 2023 09:26:04 GMT
Title: A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling
Authors: Amjad Yousef Majid, Eduard Marin
Abstract summary: This paper presents a comprehensive review of the application of Deep Reinforcement Learning (DRL) techniques in serverless computing. A systematic review of recent studies applying DRL to serverless computing is presented, covering various algorithms, models, and performances. Our analysis reveals that DRL, with its ability to learn and adapt from an environment, shows promising results in improving the efficiency of function scheduling and resource scaling.
Score: 2.0722667822370386
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the rapidly evolving field of serverless computing, efficient function scheduling and resource scaling are critical for optimizing performance and cost. This paper presents a comprehensive review of the application of Deep Reinforcement Learning (DRL) techniques in these areas. We begin by providing an overview of serverless computing, highlighting its benefits and challenges, with a particular focus on function scheduling and resource scaling. We then delve into the principles of deep reinforcement learning (DRL) and its potential for addressing these challenges. A systematic review of recent studies applying DRL to serverless computing is presented, covering various algorithms, models, and performances. Our analysis reveals that DRL, with its ability to learn and adapt from an environment, shows promising results in improving the efficiency of function scheduling and resource scaling in serverless computing. However, several challenges remain, including the need for more realistic simulation environments, handling of cold starts, and the trade-off between learning time and scheduling performance. We conclude by discussing potential future directions for this research area, emphasizing the need for more robust DRL models, better benchmarking methods, and the exploration of multi-agent reinforcement learning for more complex serverless architectures. This review serves as a valuable resource for researchers and practitioners aiming to understand and advance the application of DRL in serverless computing.

Related papers

Research on Edge Computing and Cloud Collaborative Resource Scheduling Optimization Based on Deep Reinforcement Learning [11.657154571216234]
This study addresses the challenge of resource scheduling optimization in edge-cloud collaborative computing using deep reinforcement learning (DRL). The proposed DRL-based approach improves task processing efficiency, reduces overall processing time, enhances resource utilization, and effectively controls task migrations.
arXiv Detail & Related papers (2025-02-26T03:05:11Z)
Deep Reinforcement Learning for Job Scheduling and Resource Management in Cloud Computing: An Algorithm-Level Review [10.015735252600793]
Deep Reinforcement Learning (DRL) has emerged as a promising solution to these challenges. DRL enables systems to learn and adapt policies based on continuous observations of the environment. This survey provides a comprehensive review of DRL-based algorithms for job scheduling and resource management in cloud computing.
arXiv Detail & Related papers (2025-01-02T02:08:00Z)
Reinforcement Learning for Adaptive Resource Scheduling in Complex System Environments [8.315191578007857]
This study presents a novel Q-learning-based scheduling algorithm for computer system performance optimization and adaptive workload management. Q-learning, a reinforcement learning algorithm, continuously learns from system state changes, enabling dynamic scheduling and resource optimization. This research provides a foundation for the integration of AI-driven adaptive scheduling in future large-scale systems, offering a scalable, intelligent solution to enhance system performance, reduce operating costs, and support sustainable energy consumption.
arXiv Detail & Related papers (2024-11-08T05:58:09Z)
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey [48.06362354403557]
This survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL. We highlight critical challenges for each topic and discuss key insights of existing technologies. This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances.
arXiv Detail & Related papers (2024-06-12T11:51:44Z)
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains. This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions [2.4541568670428915]
Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics. This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations.
arXiv Detail & Related papers (2023-10-04T22:45:09Z)
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. The use of large-scale models trained on vast amounts of data holds immense promise for practical applications. With the increasing demands on computational capacity, a comprehensive summarization on acceleration techniques of training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z)
Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters [30.952491139350908]
We present a comprehensive study about the characteristics of Deep Learning jobs and resource management. We introduce a general-purpose framework, which manages resources based on historical data. As case studies, we design: a Quasi-Shortest-Service-First scheduling service, which can minimize the cluster-wide average job completion time by up to 6.5x; and a Cluster Energy Saving service, which improves overall cluster utilization by up to 13%.
arXiv Detail & Related papers (2021-09-03T05:02:52Z)
Smart Scheduling based on Deep Reinforcement Learning for Cellular Networks [18.04856086228028]
We propose a smart scheduling scheme based on deep reinforcement learning (DRL). We provide implementation-friendly designs, i.e., a scalable neural network design for the agent and a virtual environment training framework. We show that the DRL-based smart scheduling outperforms the conventional scheduling method and can be adopted in practical systems.
arXiv Detail & Related papers (2021-03-22T02:09:16Z)
Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC. To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
PoPS: Policy Pruning and Shrinking for Deep Reinforcement Learning [16.269923100433232]
We develop a working algorithm, named Policy Pruning and Shrinking (PoPS), to train DRL models with strong performance. PoPS is based on a novel iterative policy pruning and shrinking method that leverages the power of transfer learning. We present an extensive experimental study that demonstrates the strong performance of PoPS using the popular Cartpole, Lunar Lander, Pong, and Pacman environments.
arXiv Detail & Related papers (2020-01-14T19:28:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.