Autonomous Resource Management in Microservice Systems via Reinforcement Learning
- URL: http://arxiv.org/abs/2507.12879v1
- Date: Thu, 17 Jul 2025 07:58:16 GMT
- Title: Autonomous Resource Management in Microservice Systems via Reinforcement Learning
- Authors: Yujun Zou, Nia Qi, Yingnan Deng, Zhihao Xue, Ming Gong, Wuyang Zhang,
- Abstract summary: This paper proposes a reinforcement learning-based method for microservice resource scheduling and optimization.<n>In microservice systems, as the number of services and the load increase, efficiently scheduling and allocating resources becomes a critical research challenge.<n>Under multi-dimensional resource conditions, the proposed method can consider multiple objectives and achieve optimized resource scheduling.
- Score: 15.956459415328775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a reinforcement learning-based method for microservice resource scheduling and optimization, aiming to address issues such as uneven resource allocation, high latency, and insufficient throughput in traditional microservice architectures. In microservice systems, as the number of services and the load increase, efficiently scheduling and allocating resources such as computing power, memory, and storage becomes a critical research challenge. To address this, the paper employs an intelligent scheduling algorithm based on reinforcement learning. Through the interaction between the agent and the environment, the resource allocation strategy is continuously optimized. In the experiments, the paper considers different resource conditions and load scenarios, evaluating the proposed method across multiple dimensions, including response time, throughput, resource utilization, and cost efficiency. The experimental results show that the reinforcement learning-based scheduling method significantly improves system response speed and throughput under low load and high concurrency conditions, while also optimizing resource utilization and reducing energy consumption. Under multi-dimensional resource conditions, the proposed method can consider multiple objectives and achieve optimized resource scheduling. Compared to traditional static resource allocation methods, the reinforcement learning model demonstrates stronger adaptability and optimization capability. It can adjust resource allocation strategies in real time, thereby maintaining good system performance in dynamically changing load and resource environments.
Related papers
- Secure Resource Allocation via Constrained Deep Reinforcement Learning [49.15061461220109]
We present SARMTO, a framework that balances resource allocation, task offloading, security, and performance.<n>SARMTO consistently outperforms five baseline approaches, achieving up to a 40% reduction in system costs.<n>These enhancements highlight SARMTO's potential to revolutionize resource management in intricate distributed computing environments.
arXiv Detail & Related papers (2025-01-20T15:52:43Z) - Dynamic Scheduling Strategies for Resource Optimization in Computing Environments [0.29008108937701327]
This paper proposes a container scheduling method based on multi-objective optimization, which aims to balance key performance indicators such as resource utilization, load balancing and task completion efficiency.<n>The experimental results show that compared with traditional static rule algorithms and efficiency algorithms, the optimized scheduling scheme shows significant advantages in resource utilization, load balancing and burst task completion.
arXiv Detail & Related papers (2024-12-23T05:43:17Z) - Reinforcement Learning for Adaptive Resource Scheduling in Complex System Environments [8.315191578007857]
This study presents a novel computer system performance optimization and adaptive workload management scheduling algorithm based on Q-learning.
By contrast, Q-learning, a reinforcement learning algorithm, continuously learns from system state changes, enabling dynamic scheduling and resource optimization.
This research provides a foundation for the integration of AI-driven adaptive scheduling in future large-scale systems, offering a scalable, intelligent solution to enhance system performance, reduce operating costs, and support sustainable energy consumption.
arXiv Detail & Related papers (2024-11-08T05:58:09Z) - Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey [48.06362354403557]
This survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL.
We highlight critical challenges for each topic and discuss key insights of existing technologies.
This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances.
arXiv Detail & Related papers (2024-06-12T11:51:44Z) - Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [55.08287089554127]
Open Radio Access Network systems, with their base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.<n>We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging'' environments.<n>We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z) - Fast Context Adaptation in Cost-Aware Continual Learning [10.515324071327903]
5G and Beyond networks require more complex learning agents and the learning process itself might end up competing with users for communication and computational resources.
This creates friction: on the one hand, the learning process needs resources to quickly convergence to an effective strategy; on the other hand, the learning process needs to be efficient, i.e. take as few resources as possible from the user's data plane, so as not to throttle users' resources.
In this paper, we propose a dynamic strategy to balance the resources assigned to the data plane and those reserved for learning.
arXiv Detail & Related papers (2023-06-06T17:46:48Z) - The Cost of Learning: Efficiency vs. Efficacy of Learning-Based RRM for
6G [10.28841351455586]
Deep Reinforcement Learning (DRL) has become a valuable solution to automatically learn efficient resource management strategies in complex networks.
In many scenarios, the learning task is performed in the Cloud, while experience samples are generated directly by edge nodes or users.
This creates a friction between the need to speed up convergence towards an effective strategy, which requires the allocation of resources to transmit learning samples.
We propose a dynamic balancing strategy between the learning and data planes, which allows the centralized learning agent to quickly converge to an efficient resource allocation strategy.
arXiv Detail & Related papers (2022-11-30T11:26:01Z) - Federated Learning over Wireless IoT Networks with Optimized
Communication and Resources [98.18365881575805]
Federated learning (FL) as a paradigm of collaborative learning techniques has obtained increasing research attention.
It is of interest to investigate fast responding and accurate FL schemes over wireless systems.
We show that the proposed communication-efficient federated learning framework converges at a strong linear rate.
arXiv Detail & Related papers (2021-10-22T13:25:57Z) - Modelling resource allocation in uncertain system environment through
deep reinforcement learning [0.0]
Reinforcement Learning has applications in field of mechatronics, robotics, and other resource-constrained control system.
The paper identifies problem of resource allocation in uncertain environment could be effectively solved using Noisy Bagging duelling double deep Q network.
arXiv Detail & Related papers (2021-06-17T13:13:34Z) - Toward Multiple Federated Learning Services Resource Sharing in Mobile
Edge Networks [88.15736037284408]
We study a new model of multiple federated learning services at the multi-access edge computing server.
We propose a joint resource optimization and hyper-learning rate control problem, namely MS-FEDL.
Our simulation results demonstrate the convergence performance of our proposed algorithms.
arXiv Detail & Related papers (2020-11-25T01:29:41Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.