Reclaimer: A Reinforcement Learning Approach to Dynamic Resource
Allocation for Cloud Microservices
- URL: http://arxiv.org/abs/2304.07941v1
- Date: Mon, 17 Apr 2023 01:44:05 GMT
- Title: Reclaimer: A Reinforcement Learning Approach to Dynamic Resource
Allocation for Cloud Microservices
- Authors: Quintin Fettes, Avinash Karanth, Razvan Bunescu, Brandon Beckwith,
Sreenivas Subramoney
- Abstract summary: We introduce Reclaimer, a deep reinforcement learning model that adapts to runtime changes in the number and behavior of microservices in order to minimize CPU core allocation while meeting QoS requirements.
When evaluated with two microservice-based applications, Reclaimer reduces the mean CPU core allocation by 38.4% to 74.4% relative to the industry-standard scaling solution.
- Score: 4.397680391942813
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many cloud applications are migrated from the monolithic model to a
microservices framework in which hundreds of loosely-coupled microservices run
concurrently, with significant benefits in terms of scalability, rapid
development, modularity, and isolation. However, dependencies among
microservices with uneven execution time may result in longer queues, idle
resources, or Quality-of-Service (QoS) violations.
In this paper we introduce Reclaimer, a deep reinforcement learning model
that adapts to runtime changes in the number and behavior of microservices in
order to minimize CPU core allocation while meeting QoS requirements. When
evaluated with two benchmark microservice-based applications, Reclaimer reduces
the mean CPU core allocation by 38.4% to 74.4% relative to the
industry-standard scaling solution, and by 27.5% to 58.1% relative to a current
state-of-the-art method.
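The setting the abstract describes can be pictured as a control loop: a learned policy observes per-microservice runtime metrics and chooses core allocations, trading cores spent against QoS risk. The Python sketch below is a minimal illustration of that loop, not the paper's implementation; the metric names, the heuristic stand-in policy, the reward shaping, and the QOS_LATENCY_MS target are all assumptions made for the example.

```python
"""Illustrative control loop for RL-driven core allocation.

A minimal sketch, assuming hypothetical per-service metrics and a
hypothetical QoS latency target; Reclaimer's actual state, action,
and reward definitions are given in the paper itself.
"""
from dataclasses import dataclass

QOS_LATENCY_MS = 200.0  # assumed end-to-end latency target


@dataclass
class ServiceMetrics:
    cpu_util: float        # utilization of currently allocated cores, 0..1
    rps: float             # incoming requests per second
    p99_latency_ms: float  # tail latency observed at this service


def reward(total_cores: float, e2e_latency_ms: float) -> float:
    """Penalize core usage; penalize QoS violations much more heavily."""
    qos_penalty = 10.0 if e2e_latency_ms > QOS_LATENCY_MS else 0.0
    return -total_cores - qos_penalty


def policy(m: ServiceMetrics) -> float:
    """Heuristic stand-in for the learned policy: scale cores with load.

    A trained deep RL agent would replace this, observing all services
    at once so it can adapt as services are added, removed, or change
    behavior at runtime.
    """
    return max(0.5, m.rps / 100.0 * (0.5 + m.cpu_util))


# One step of the allocation loop over a hypothetical service set.
services = {
    "frontend": ServiceMetrics(cpu_util=0.7, rps=450.0, p99_latency_ms=35.0),
    "cart":     ServiceMetrics(cpu_util=0.4, rps=120.0, p99_latency_ms=12.0),
}
allocation = {name: policy(m) for name, m in services.items()}
print(allocation, reward(sum(allocation.values()), e2e_latency_ms=150.0))
```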
Related papers
- STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices [3.0846824529023382]
This paper presents a combination of control theory, machine learning, and spatiotemporal features to address these challenges.
We propose an adaptive auto-scaling framework, STaleX, that integrates these features, enabling real-time resource adjustments.
Our framework accounts for features including service specifications and dependencies among services, as well as temporal variations in workload; a sketch of the control-theoretic ingredient appears after this entry.
arXiv Detail & Related papers (2025-01-30T20:19:13Z)
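STaleX pairs control theory with machine learning. As a generic illustration of the control-theoretic half, the sketch below implements a PI controller that nudges a replica count toward a utilization setpoint; the gains, setpoint, and utilization samples are assumptions, not values from the paper.

```python
"""Generic PI controller for replica scaling.

A sketch of the control-theoretic building block that frameworks
like STaleX combine with learned components; all constants here
are assumptions for illustration.
"""


class PIScaler:
    def __init__(self, setpoint: float, kp: float = 0.8, ki: float = 0.1):
        self.setpoint = setpoint  # target CPU utilization, e.g. 0.6
        self.kp, self.ki = kp, ki
        self.integral = 0.0

    def step(self, measured_util: float, replicas: int) -> int:
        # Positive error (overload) -> scale out; negative -> scale in.
        error = measured_util - self.setpoint
        self.integral += error
        adjustment = self.kp * error + self.ki * self.integral
        return max(1, round(replicas * (1.0 + adjustment)))


scaler = PIScaler(setpoint=0.6)
replicas = 4
for util in [0.9, 0.8, 0.65, 0.55]:  # hypothetical utilization samples
    replicas = scaler.step(util, replicas)
    print(replicas)
```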
- FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing [59.12511498024836]
We present a method for pruning large language models (LLMs) that selectively removes model blocks based on an importance score.
We propose a principled metric to replace each pruned block using a weight-sharing mechanism.
Empirical evaluations demonstrate substantial performance gains over existing methods.
arXiv Detail & Related papers (2025-01-24T18:46:37Z)
- Microservice Deployment in Space Computing Power Networks via Robust Reinforcement Learning [43.96374556275842]
It is important to provide reliable, real-time remote sensing inference services that meet low-latency requirements.
This paper presents a deployment framework for remote sensing artificial intelligence applications, designed for Low Earth Orbit satellite constellations.
arXiv Detail & Related papers (2025-01-08T16:55:04Z)
- Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-based baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z)
- Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources [1.1470070927586018]
We develop a model that captures the relationship between end-to-end latency, requests at the front-end level, and resource utilization.
We then use the developed model to predict the end-to-end latency.
We demonstrate the merit of our approach on a microservice-based application and provide a roadmap to deployment; a sketch of this predict-then-scale idea appears after this entry.
arXiv Detail & Related papers (2024-09-04T22:03:07Z)
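The idea above, predicting end-to-end latency from front-end requests and resource utilization and acting before an SLO breach, can be illustrated with any regression model. The sketch below uses a plain least-squares fit as a stand-in for the paper's Transformer; the SLO value, the training data, and the forecast inputs are hypothetical.

```python
"""Proactive scaling from a learned latency model.

A minimal sketch: fit latency ~ f(requests, utilization), then
scale out before a predicted SLO breach. The linear fit is a
stand-in for the paper's Transformer model.
"""
import numpy as np

SLO_MS = 250.0  # assumed end-to-end latency objective

# Hypothetical history: columns = [requests/s, cpu_util], target = e2e ms.
X = np.array([[100, 0.3], [300, 0.5], [500, 0.7], [700, 0.85]], dtype=float)
y = np.array([80.0, 120.0, 190.0, 260.0])

# Least-squares fit with a bias term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict_latency(rps: float, util: float) -> float:
    return float(np.array([rps, util, 1.0]) @ coef)


# Act proactively on a (hypothetical) workload forecast.
forecast_rps, forecast_util = 650.0, 0.8
if predict_latency(forecast_rps, forecast_util) > SLO_MS:
    print("scale out ahead of the predicted SLO breach")
```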
- Entanglement Distribution Delay Optimization in Quantum Networks with Distillation [51.53291671169632]
Quantum networks (QNs) distribute entangled states to enable distributed quantum computing and sensing applications.
A quantum switch (QS) resource allocation framework is proposed to enhance end-to-end (e2e) fidelity and satisfy minimum rate and fidelity requirements.
arXiv Detail & Related papers (2024-05-15T02:04:22Z)
- Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open-source models of varying sizes, from 7B to 65B, improve by 8.2% on average over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
- DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices; a sketch of the dependency-aware intuition appears after this entry.
arXiv Detail & Related papers (2023-09-02T08:22:21Z)
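DeepScaler's premise is that load propagates along service dependencies, so scaling decisions should be graph-aware. The sketch below hand-codes one such propagation over a fixed call graph; DeepScaler instead learns the graph adaptively and forecasts with a spatiotemporal GNN. The graph weights and per-replica capacity here are assumptions.

```python
"""Dependency-aware load propagation, the intuition behind
graph-based autoscalers such as DeepScaler.

A sketch only: repeated message passing over a hand-written call
graph; the real system learns the graph and the dynamics.
"""
import numpy as np

services = ["frontend", "cart", "payment"]
# adj[i][j] = fraction of service i's requests that fan out to service j
# (hypothetical call graph; DeepScaler learns this adaptively).
adj = np.array([
    [0.0, 0.6, 0.4],  # frontend -> cart, payment
    [0.0, 0.0, 0.5],  # cart -> payment
    [0.0, 0.0, 0.0],
])

external_rps = np.array([500.0, 0.0, 0.0])  # only the frontend is user-facing
# Each iteration propagates load one hop further downstream.
load = external_rps.copy()
for _ in range(len(services)):
    load = external_rps + adj.T @ load

REQS_PER_REPLICA = 100.0  # assumed per-replica capacity
replicas = np.maximum(1, np.ceil(load / REQS_PER_REPLICA)).astype(int)
print(dict(zip(services, replicas)))
```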
- MicroRes: Versatile Resilience Profiling in Microservices via Degradation Dissemination Indexing [29.456286275972474]
Microservice resilience, the ability to recover from failures and continue providing reliable and responsive services, is crucial for cloud vendors.
Current practice relies on rules manually configured for a specific microservice system, resulting in labor-intensity and inflexibility.
Our insight is that resilient deployment can effectively prevent the dissemination of degradation from system performance metrics to user-aware metrics, and the latter directly affect service quality.
arXiv Detail & Related papers (2022-12-25T03:56:42Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration; a block-pruning sketch appears after this entry.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
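The micro-structured level referenced above sits between unstructured (per-weight) and structured (per-channel) pruning: whole small blocks are kept or dropped so hardware can exploit the sparsity. The sketch below prunes a weight matrix block-by-block by Frobenius norm; the block size, keep ratio, and scoring rule are simplified assumptions, and the weight-unification step is omitted.

```python
"""Block-wise (micro-structured) weight pruning.

A sketch of the general idea only; the paper's exact algorithm,
including weight unification across blocks, is more involved.
"""
import numpy as np


def prune_blocks(w: np.ndarray, block: int = 4, keep_ratio: float = 0.5):
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # View the matrix as a grid of block x block tiles and score each tile.
    tiles = w.reshape(rows // block, block, cols // block, block)
    norms = np.linalg.norm(tiles, axis=(1, 3))     # one score per tile
    cutoff = np.quantile(norms, 1.0 - keep_ratio)  # keep the top tiles
    mask = (norms >= cutoff)[:, None, :, None]     # broadcast over tiles
    return (tiles * mask).reshape(rows, cols)


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
pruned = prune_blocks(w, block=4, keep_ratio=0.5)
print(np.count_nonzero(pruned), "of", w.size, "weights kept")
```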
This list is automatically generated from the titles and abstracts of the papers on this site.