Related papers: Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters

Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters

URL: http://arxiv.org/abs/2508.10253v1
Date: Thu, 14 Aug 2025 00:43:20 GMT
Title: Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters
Authors: Guanzi Yao, Heyao Liu, Linyan Dai,
Abstract summary: This paper addresses the challenges of high resource dynamism and scheduling complexity in cloud-native database systems.<n>It proposes an adaptive resource orchestration method based on multi-agent reinforcement learning.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper addresses the challenges of high resource dynamism and scheduling complexity in cloud-native database systems. It proposes an adaptive resource orchestration method based on multi-agent reinforcement learning. The method introduces a heterogeneous role-based agent modeling mechanism. This allows different resource entities, such as compute nodes, storage nodes, and schedulers, to adopt distinct policy representations. These agents are better able to reflect diverse functional responsibilities and local environmental characteristics within the system. A reward-shaping mechanism is designed to integrate local observations with global feedback. This helps mitigate policy learning bias caused by incomplete state observations. By combining real-time local performance signals with global system value estimation, the mechanism improves coordination among agents and enhances policy convergence stability. A unified multi-agent training framework is developed and evaluated on a representative production scheduling dataset. Experimental results show that the proposed method outperforms traditional approaches across multiple key metrics. These include resource utilization, scheduling latency, policy convergence speed, system stability, and fairness. The results demonstrate strong generalization and practical utility. Across various experimental scenarios, the method proves effective in handling orchestration tasks with high concurrency, high-dimensional state spaces, and complex dependency relationships. This confirms its advantages in real-world, large-scale scheduling environments.

Related papers

Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection [53.45696787935487]
Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes.<n>In real-world service-oriented deployments, data generated by heterogeneous users, devices, and application scenarios are inherently non-IID.<n>We propose FLood, a novel FL framework inspired by out-of-distribution (OOD) detection.
arXiv Detail & Related papers (2026-02-01T05:54:59Z)
AI-Driven Cloud Resource Optimization for Multi-Cluster Environments [0.0]
This paper presents an AI-driven framework for adaptive resource optimization in multi-cluster cloud systems.<n>The proposed approach integrates predictive learning, policy-aware decision-making, and continuous feedback.<n>A prototype implementation demonstrates improved resource efficiency, faster stabilization during workload fluctuations, and reduced performance variability.
arXiv Detail & Related papers (2025-12-31T15:15:46Z)
Shared Representation Learning for High-Dimensional Multi-Task Forecasting under Resource Contention in Cloud-Native Backends [12.826922983028082]
This study proposes a unified forecasting framework for high-dimensional multi-task time series to meet the prediction demands of cloud native backend systems.<n>The method builds a shared encoding structure to represent diverse monitoring indicators in a unified manner.<n>A cross-task structural propagation module is introduced to model potential dependencies among nodes.
arXiv Detail & Related papers (2025-12-24T11:02:03Z)
Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments.<n>We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z)
Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks.<n>Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses.<n>No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z)
Federated Anomaly Detection for Multi-Tenant Cloud Platforms with Personalized Modeling [6.028943403943345]
This paper proposes an anomaly detection method based on federated learning to address key challenges in multi-tenant cloud environments.<n>A global model is optimized, enabling cross-tenant collaborative anomaly detection while preserving data privacy.<n>Experiments use real telemetry data from a cloud platform to construct a simulated multi-tenant environment.
arXiv Detail & Related papers (2025-08-14T00:46:24Z)
Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
Intelligent Task Scheduling for Microservices via A3C-Based Reinforcement Learning [4.422684054800804]
This paper proposes an adaptive resource scheduling method based on the A3C reinforcement learning algorithm.<n>The method incorporates an asynchronous multi-threaded learning mechanism, allowing multiple agents to perform parallel sampling and synchronize updates to the global network parameters.<n>The results show that the proposed method delivers high scheduling performance and system stability in multi-task concurrent environments.
arXiv Detail & Related papers (2025-05-01T04:42:48Z)
Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
Deep Reinforcement Learning for Distributed and Uncoordinated Cognitive Radios Resource Allocation [1.218340575383456]
This paper presents a novel deep reinforcement learning-based resource allocation technique for the multi-agent environment presented by a cognitive radio network. The presented algorithm converges in an arbitrarily long time to equilibrium policies in the non-stationary environment. It is shown that the use of a standard single-agent deep reinforcement learning approach may not achieve convergence when used in an uncoordinated interacting multi-radio scenario.
arXiv Detail & Related papers (2022-05-27T12:43:30Z)
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication [20.891460617583302]
The paper considers independent reinforcement learning (IRL) for collaborative decision-making in the paradigm of federated learning (FL) FL generates excessive communication overheads between agents and a remote central server. This paper proposes two advanced optimization schemes to improve the system's utility value.
arXiv Detail & Related papers (2021-03-24T07:21:43Z)
Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search. We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.