Reclaimer: A Reinforcement Learning Approach to Dynamic Resource
Allocation for Cloud Microservices
- URL: http://arxiv.org/abs/2304.07941v1
- Date: Mon, 17 Apr 2023 01:44:05 GMT
- Title: Reclaimer: A Reinforcement Learning Approach to Dynamic Resource
Allocation for Cloud Microservices
- Authors: Quintin Fettes, Avinash Karanth, Razvan Bunescu, Brandon Beckwith,
Sreenivas Subramoney
- Abstract summary: We introduce Reclaimer, a deep reinforcement learning model that adapts to runtime changes in the number and behavior of microservices in order to minimize CPU core allocation while meeting QoS requirements.
When evaluated with two benchmark microservice-based applications, Reclaimer reduces the mean CPU core allocation by 38.4% to 74.4% relative to the industry-standard scaling solution.
- Score: 4.397680391942813
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many cloud applications are migrated from the monolithic model to a
microservices framework in which hundreds of loosely-coupled microservices run
concurrently, with significant benefits in terms of scalability, rapid
development, modularity, and isolation. However, dependencies among
microservices with uneven execution time may result in longer queues, idle
resources, or Quality-of-Service (QoS) violations.
In this paper we introduce Reclaimer, a deep reinforcement learning model
that adapts to runtime changes in the number and behavior of microservices in
order to minimize CPU core allocation while meeting QoS requirements. When
evaluated with two benchmark microservice-based applications, Reclaimer reduces
the mean CPU core allocation by 38.4% to 74.4% relative to the
industry-standard scaling solution, and by 27.5% to 58.1% relative to a current
state-of-the-art method.
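The abstract describes a control loop in which a learned policy observes per-microservice metrics and outputs CPU core allocations, trading core count against QoS violations. The toy environment and all names below are illustrative assumptions for sketching that loop, not Reclaimer's actual implementation:

```python
# Hypothetical sketch of an RL-style core-allocation loop: observe metrics,
# let a policy pick per-service core counts, and score the decision with a
# reward that penalizes cores used and (heavily) QoS violations.
import random

QOS_LATENCY_MS = 200.0  # assumed QoS (tail latency) target


def observe(num_services):
    """Toy per-service metrics: (cpu_utilization, request_rate)."""
    return [(random.uniform(0.2, 0.9), random.uniform(10.0, 100.0))
            for _ in range(num_services)]


def policy(state):
    """Stand-in for the learned policy: cores roughly proportional to load."""
    return [max(1, round(util * rate / 20.0)) for util, rate in state]


def reward(allocations, latency_ms):
    """Pressure the agent toward fewer cores unless QoS is violated."""
    qos_penalty = 100.0 if latency_ms > QOS_LATENCY_MS else 0.0
    return -float(sum(allocations)) - qos_penalty


state = observe(num_services=5)
allocations = policy(state)
```

In a real agent the `policy` function would be a trained neural network and the reward would come from measured latency; the point here is only the shape of the state/action/reward interface the abstract implies.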
Related papers
- Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources [1.1470070927586018]
We develop a model that captures the relationship between end-to-end latency, requests at the front-end level, and resource utilization.
We then use the developed model to predict the end-to-end latency.
We demonstrate the merit of our approach on a microservice-based application and provide a roadmap to deployment.
arXiv Detail & Related papers (2024-09-04T22:03:07Z)
- ThinK: Thinner Key Cache by Query-Driven Pruning [63.13363917871414]
Large Language Models (LLMs) have revolutionized the field of natural language processing, achieving unprecedented performance across a variety of applications.
This paper focuses on the long-context scenario, addressing the inefficiencies in KV cache memory consumption during inference.
We propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels.
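The summary above describes pruning the least significant key-cache channels under a query-dependent criterion. The following is only an illustrative sketch of that idea (scoring channels by their aggregate contribution to query-key dot products), with assumed shapes and names, not ThinK's actual algorithm:

```python
# Illustrative query-driven channel pruning for a key cache: keep only the
# channels whose |q_d| * |k_d| products contribute most to attention logits.
import numpy as np


def prune_key_channels(keys, queries, keep_ratio=0.5):
    """keys: (seq_len, dim) cached keys; queries: (n_q, dim) recent queries.

    Returns (pruned_keys, kept_channel_indices).
    """
    # Per-channel importance: channels with large magnitudes in both the
    # queries and the cached keys dominate the dot-product logits.
    scores = np.abs(queries).sum(axis=0) * np.abs(keys).sum(axis=0)  # (dim,)
    k = max(1, int(keys.shape[1] * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # top-k channels, index order
    return keys[:, keep], keep
```

For example, with a key cache of shape (128, 64) and `keep_ratio=0.5`, the pruned cache has shape (128, 32), roughly halving key-cache memory along the channel dimension.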
arXiv Detail & Related papers (2024-07-30T17:59:08Z)
- Entanglement Distribution Delay Optimization in Quantum Networks with Distillation [51.53291671169632]
Quantum networks (QNs) distribute entangled states to enable distributed quantum computing and sensing applications.
A quantum switch (QS) resource allocation framework is proposed to enhance the end-to-end (e2e) fidelity and satisfy minimum rate and fidelity requirements.
arXiv Detail & Related papers (2024-05-15T02:04:22Z)
- Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems.
We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics.
We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv Detail & Related papers (2023-12-01T01:30:49Z)
- Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open-source models of varying sizes from 7B to 65B, on average, improve 8.2% from their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
- DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
arXiv Detail & Related papers (2023-09-02T08:22:21Z)
- Handling Communication via APIs for Microservices [6.5499625417846685]
We discuss the challenges with conventional communication techniques and propose an alternative way of ID-passing via APIs.
We also devise an algorithm to reduce the number of APIs.
arXiv Detail & Related papers (2023-08-02T17:40:34Z)
- Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud [0.38073142980732994]
We benchmark five modern stream processing frameworks regarding their scalability using a systematic method.
All benchmarked frameworks exhibit approximately linear scalability as long as sufficient cloud resources are provisioned.
There is no clear superior framework; the ranking of the frameworks depends on the use case.
arXiv Detail & Related papers (2023-03-20T13:22:03Z)
- MicroRes: Versatile Resilience Profiling in Microservices via Degradation Dissemination Indexing [29.456286275972474]
Microservice resilience, the ability to recover from failures and continue providing reliable and responsive services, is crucial for cloud vendors.
The current practice relies on manually configuring specific rules for a certain microservice system, resulting in labor-intensity and flexibility issues.
Our insight is that resilient deployment can effectively prevent the dissemination of degradation from system performance to user-aware metrics, and the latter affects service quality.
arXiv Detail & Related papers (2022-12-25T03:56:42Z)
- Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing.
For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data.
This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can rarely deliver true inference acceleration.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.