Scalable Reinforcement Learning for Virtual Machine Scheduling
- URL: http://arxiv.org/abs/2503.00537v1
- Date: Sat, 01 Mar 2025 15:33:52 GMT
- Title: Scalable Reinforcement Learning for Virtual Machine Scheduling
- Authors: Junjie Sheng, Jiehao Wu, Haochuan Cui, Yiqiu Hu, Wenli Zhou, Lei Zhu, Qian Peng, Wenhao Li, Xiangfeng Wang,
- Abstract summary: This paper introduces a scalable RL framework, called Cluster Value Decomposition Reinforcement Learning (CVD-RL).
- Score: 21.22990796153464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in reinforcement learning (RL) have shown promise for optimizing virtual machine scheduling (VMS) in small-scale clusters. The application of RL to large-scale cloud computing scenarios, however, remains notably constrained. This paper introduces a scalable RL framework, called Cluster Value Decomposition Reinforcement Learning (CVD-RL), to surmount the scalability hurdles inherent in large-scale VMS. The CVD-RL framework combines a decomposition operator with a look-ahead operator to manage representation complexity, complemented by a Top-$k$ filter operator that improves exploration efficiency. Unlike existing approaches, which are limited to clusters of $10$ or fewer physical machines (PMs), CVD-RL extends its applicability to environments encompassing up to $50$ PMs. Furthermore, in empirical studies the CVD-RL framework demonstrates generalization capabilities that surpass contemporary state-of-the-art (SOTA) methods across a variety of scenarios. These results showcase the framework's scalability and performance and represent a significant step in the application of RL to VMS within complex, large-scale cloud infrastructures. The code is available at https://anonymous.4open.science/r/marl4sche-D0FE.
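The abstract says a Top-$k$ filter operator improves exploration efficiency but does not describe how it works. The sketch below is a hypothetical illustration only, not the CVD-RL implementation: it assumes each PM has a scalar score (for example an estimated value of placing the incoming VM on it) and a feasibility flag (enough free capacity), and it keeps only the $k$ best feasible PMs as candidate actions. The names `top_k_filter` and `masked_action_space` are invented for this example.

```python
import heapq
from typing import List, Sequence


def top_k_filter(scores: Sequence[float], feasible: Sequence[bool], k: int) -> List[int]:
    """Hypothetical Top-k filter: indices of the k highest-scoring feasible PMs.

    scores[i]: an assumed per-PM score, e.g. an estimated value of placing the VM on PM i.
    feasible[i]: whether PM i has enough free capacity for the incoming VM.
    """
    if len(scores) != len(feasible):
        raise ValueError("scores and feasible must have the same length")
    # Drop PMs that cannot host the VM before ranking.
    candidates = [i for i, ok in enumerate(feasible) if ok]
    # heapq.nlargest returns the k best candidates in descending score order.
    return heapq.nlargest(k, candidates, key=lambda i: scores[i])


def masked_action_space(num_pms: int, keep: List[int]) -> List[bool]:
    """Boolean mask over all PMs that allows only the filtered candidates."""
    allowed = set(keep)
    return [i in allowed for i in range(num_pms)]


if __name__ == "__main__":
    scores = [0.2, 0.9, 0.5, 0.7, 0.1]
    feasible = [True, False, True, True, True]  # PM 1 is full in this toy example
    keep = top_k_filter(scores, feasible, k=2)
    print(keep)                               # [3, 2]
    print(masked_action_space(len(scores), keep))
```

Under this assumption, the agent explores over at most $k$ candidates instead of every PM in the cluster, so the effective action set stays small as the cluster grows.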
Related papers
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.
Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.
We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
Date: 2025-07-26T07:53:11Z
- RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models [11.688277445120567]
Vision-Language-Action models (VLA) have demonstrated remarkable capabilities and promising potential in solving complex robotic manipulation tasks.
Their substantial parameter sizes and high inference latency pose significant challenges for real-world deployment.
We propose RLRC, a three-stage recovery method for compressed VLAs.
Date: 2025-06-21T08:45:32Z
- StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs).
StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks.
Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
Date: 2025-04-22T14:19:06Z
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme [36.34443944082215]
This work introduces a transparent, from-scratch framework for reinforcement learning (RL) in vision-language models (VLMs).
It offers a minimal yet functional four-step pipeline validated across multiple models and datasets.
In addition, a standardized evaluation scheme is proposed to assess training dynamics and reflective behaviors.
Date: 2025-04-03T13:53:28Z
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing the mathematical reasoning and coding performance of large language models (LLMs).
We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education.
We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
Date: 2025-03-31T08:22:49Z
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
Date: 2024-08-15T22:27:00Z
- Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach [11.11570399751075]
This research presents a novel framework that harnesses the potential of Deep Reinforcement Learning (DRL) for Volt-VAR optimization.
The integration of our DRL agent with the RAY platform facilitates the creation of RLlib-IMPALA, a novel framework that efficiently uses RAY's resources to improve system adaptability and control.
Date: 2024-02-24T23:25:35Z
- Provably Efficient CVaR RL in Low-rank MDPs [58.58570425202862]
We study risk-sensitive Reinforcement Learning (RL) with the Conditional Value at Risk (CVaR) objective.
We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to balance the interplay between exploration, exploitation, and representation learning in CVaR RL.
We prove a sample-complexity bound for our algorithm to obtain an $\epsilon$-optimal CVaR, expressed in terms of $H$, the length of each episode, $A$, the capacity of the action space, and $d$, the dimension of the representations.
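The CVaR objective in the entry above can be made concrete. At a risk tolerance $\tau \in (0, 1]$ (the symbol $\tau$ is our notation here), the lower-tail CVaR of a return $R$ with CDF $F_R$ is the expected return over its worst $\tau$ fraction of outcomes, $\mathrm{CVaR}_\tau(R) = \frac{1}{\tau}\int_0^\tau F_R^{-1}(u)\,du$. The minimal sketch below estimates this quantity from sampled episode returns; it only illustrates the objective and is not the paper's UCB-based algorithm. The function name `empirical_cvar` is hypothetical.

```python
import math
from typing import Sequence


def empirical_cvar(returns: Sequence[float], tau: float) -> float:
    """Hypothetical helper: lower-tail CVaR at level tau of sampled returns.

    Computes (1/tau) * integral_0^tau F^{-1}(u) du for the empirical CDF,
    i.e. the average of the worst tau-fraction of returns, giving the
    boundary sample a fractional weight when tau * n is not an integer.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    xs = sorted(returns)      # worst returns first
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one return sample")
    mass = tau * n            # number of samples (possibly fractional) in the tail
    full = math.floor(mass)   # samples lying entirely inside the tail
    total = sum(xs[:full])
    if full < n:
        total += (mass - full) * xs[full]  # fractional share of the boundary sample
    return total / mass


if __name__ == "__main__":
    print(empirical_cvar([1, 2, 3, 4], 0.5))  # 1.5: mean of the two worst returns
    print(empirical_cvar([1, 2, 3, 4], 1.0))  # 2.5: tau = 1 recovers the plain mean
```

With $\tau = 1$ the estimate reduces to the ordinary mean return, and smaller $\tau$ focuses the objective on increasingly bad outcomes, which is what makes CVaR RL risk-sensitive.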
Date: 2023-11-20T17:44:40Z
- SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework.
We develop a scalable, efficient, and distributed RL system called ReaLly Scalable RL (SRL), which allows efficient and massively parallelized training.
SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
Date: 2023-06-29T05:16:25Z
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between the self-supervised learning (SSL) and dynamic computation (DC) paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution of both the dense and the gated encoders during pre-training offers a good accuracy-efficiency trade-off.
Date: 2023-01-22T17:12:58Z
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large numbers of interactions between the agent and the environment.
We propose a new method to solve this benchmark, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
Date: 2022-09-24T14:22:29Z
- Deep Reinforcement Learning for Computational Fluid Dynamics on HPC Systems [17.10464381844892]
Reinforcement learning (RL) is highly suitable for devising control strategies in the context of dynamical systems.
Recent research results indicate that RL-augmented computational fluid dynamics (CFD) solvers can exceed the current state of the art.
We present Relexi as a scalable RL framework that bridges the gap between machine learning and modern CFD solvers on HPC systems.
Date: 2022-05-13T08:21:18Z
- POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) aims to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance, which leverages expert demonstrations to improve SRL interpretations.
We empirically verify that POAR efficiently handles high-dimensional tasks and facilitates training real-life robots directly from scratch.
Date: 2021-09-17T16:52:03Z
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
Date: 2020-12-21T18:28:17Z
This list is automatically generated from the titles and abstracts of the papers in this site.