AI-based Resource Allocation: Reinforcement Learning for Adaptive
Auto-scaling in Serverless Environments
- URL: http://arxiv.org/abs/2005.14410v1
- Date: Fri, 29 May 2020 06:18:39 GMT
- Title: AI-based Resource Allocation: Reinforcement Learning for Adaptive
Auto-scaling in Serverless Environments
- Authors: Lucia Schuler and Somaya Jamil and Niklas Kühl
- Abstract summary: Serverless computing has emerged as a compelling new paradigm of cloud computing models in recent years.
A common approach among both commercial and open source serverless computing platforms is workload-based auto-scaling.
In this paper we investigate the applicability of a reinforcement learning approach to request-based auto-scaling in a serverless framework.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Serverless computing has emerged as a compelling new paradigm of cloud
computing models in recent years. It promises the user services at large scale
and low cost while eliminating the need for infrastructure management. On the cloud
provider side, flexible resource management is required to meet fluctuating
demand. It can be enabled through automated provisioning and deprovisioning of
resources. A common approach among both commercial and open source serverless
computing platforms is workload-based auto-scaling, where a designated
algorithm scales instances according to the number of incoming requests. In the
recently evolving serverless framework Knative, a request-based policy is
proposed, where the algorithm scales resources by a configured maximum number
of requests that can be processed in parallel per instance, the so-called
concurrency. As we show in a baseline experiment, this predefined concurrency
level can strongly influence the performance of a serverless application.
However, identifying the concurrency configuration that yields the highest
possible quality of service is a challenging task due to various factors, e.g.
varying workload and complex infrastructure characteristics, influencing
throughput and latency. While there has been considerable research into
intelligent techniques for optimizing auto-scaling for virtual machine
provisioning, this topic has not yet been discussed in the area of serverless
computing. For this reason, we investigate the applicability of a reinforcement
learning approach, which has been proven on dynamic virtual machine
provisioning, to request-based auto-scaling in a serverless framework. Our
results show that within a limited number of iterations our proposed model
learns an effective scaling policy per workload, improving the performance
compared to the default auto-scaling configuration.
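The request-based policy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation or Knative's actual autoscaler: `desired_instances` and `ConcurrencyAgent` are hypothetical names, the instance bounds and the action set (steps of 20) are assumptions, and tabular Q-learning is assumed as one common reinforcement learning method, since the abstract does not name the algorithm.

```python
import math
import random
from collections import defaultdict


def desired_instances(in_flight_requests: float, concurrency_target: int,
                      min_instances: int = 0, max_instances: int = 100) -> int:
    """Instances needed so each one handles at most `concurrency_target`
    requests in parallel (bounds are illustrative assumptions)."""
    if concurrency_target < 1:
        raise ValueError("concurrency_target must be >= 1")
    needed = math.ceil(in_flight_requests / concurrency_target)
    return max(min_instances, min(max_instances, needed))


class ConcurrencyAgent:
    """Tabular Q-learning agent that adjusts the concurrency limit.

    The state and reward definitions are left to the caller; a state could be
    the current concurrency limit, and a reward could combine throughput and
    latency. Both are assumptions, not taken from the paper.
    """

    ACTIONS = (-20, 0, 20)  # decrease, keep, or increase the concurrency limit

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning temporal-difference update."""
        best_next = max(self.q[(next_state, a)] for a in self.ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

For example, with 250 requests in flight and a concurrency target of 100, `desired_instances(250, 100)` yields 3 instances. A lower target spreads the same load over more instances, which is why the configured concurrency can affect throughput and latency.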
Related papers
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
arXiv Detail & Related papers (2023-09-02T08:22:21Z)
- Predicting Resource Consumption of Kubernetes Container Systems using Resource Models [3.138731415322007]
This paper considers how to derive resource models for cloud systems empirically.
We do so based on models of deployed services in a formal language with explicit adherence to CPU and memory resources.
We report on leveraging data collected empirically from small deployments to simulate the execution of higher intensity scenarios on larger deployments.
arXiv Detail & Related papers (2023-05-12T17:59:01Z)
- CILP: Co-simulation based Imitation Learner for Dynamic Resource Provisioning in Cloud Computing Environments [13.864161788250856]
A key challenge for latency-critical tasks is to predict future workload demands to provision proactively.
Existing AI-based solutions tend to not holistically consider all crucial aspects such as provision overheads, heterogeneous VM costs and Quality of Service (QoS) of the cloud system.
We propose a novel method, called CILP, that formulates the VM provisioning problem as two sub-problems of prediction and optimization.
arXiv Detail & Related papers (2023-02-11T09:15:34Z)
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training a RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
- Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- A Predictive Autoscaler for Elastic Batch Jobs [8.354712625979776]
Large batch jobs such as Deep Learning, HPC and Spark require far more computational resources and higher cost than conventional online service.
We propose a predictive autoscaler to provide an elastic interface for the customers and overprovision instances.
arXiv Detail & Related papers (2020-10-10T17:35:55Z)
- Reinforcement Learning on Computational Resource Allocation of Cloud-based Wireless Networks [22.06811314358283]
Wireless networks used for Internet of Things (IoT) are expected to largely involve cloud-based computing and processing.
In a cloud environment, dynamic computational resource allocation is essential to save energy while maintaining the performance of the processes.
This paper models this dynamic computational resource allocation problem into a Markov Decision Process (MDP) and designs a model-based reinforcement-learning agent to optimise the dynamic resource allocation of the CPU usage.
The results show that our agent rapidly converges to the optimal policy, performs stably in different settings, and matches or outperforms a baseline algorithm in energy savings across different scenarios.
arXiv Detail & Related papers (2020-10-10T15:16:26Z)
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve the user's sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.