Learning Virtual Machine Scheduling in Cloud Computing through Language Agents
- URL: http://arxiv.org/abs/2505.10117v2
- Date: Mon, 19 May 2025 06:23:03 GMT
- Title: Learning Virtual Machine Scheduling in Cloud Computing through Language Agents
- Authors: JieHao Wu, Ziwei Wang, Junjie Sheng, Wenhao Li, Xiangfeng Wang, Jun Luo
- Abstract summary: In cloud services, virtual machine (VM) scheduling is a typical Online Dynamic Multidimensional Bin Packing (ODMBP) problem. Traditional methods struggle to adapt to real-time changes, while domain-expert-designed approaches suffer from rigid strategies. This paper proposes a hierarchical language agent framework named MiCo, which provides a large language model (LLM)-driven design paradigm for solving ODMBP.
- Score: 22.314607581353638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In cloud services, virtual machine (VM) scheduling is a typical Online Dynamic Multidimensional Bin Packing (ODMBP) problem, characterized by large-scale complexity and fluctuating demands. Traditional optimization methods struggle to adapt to real-time changes, domain-expert-designed heuristic approaches suffer from rigid strategies, and existing learning-based methods often lack generalizability and interpretability. To address these limitations, this paper proposes a hierarchical language agent framework named MiCo, which provides a large language model (LLM)-driven heuristic design paradigm for solving ODMBP. Specifically, ODMBP is formulated as a Semi-Markov Decision Process with Options (SMDP-Option), enabling dynamic scheduling through a two-stage architecture, i.e., Option Miner and Option Composer. Option Miner utilizes LLMs to discover diverse and useful non-context-aware strategies by interacting with constructed environments. Option Composer employs LLMs to discover a composing strategy that integrates the non-context-aware strategies with the contextual ones. Extensive experiments on real-world enterprise datasets demonstrate that MiCo achieves a 96.9% competitive ratio in large-scale scenarios involving more than 10,000 virtual machines. It maintains high performance even under nonstationary request flows and diverse configurations, thus validating its effectiveness in complex and large-scale cloud environments.
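The abstract describes the Miner/Composer split only at a high level. As a rough illustration, the minimal Python sketch below shows how a set of LLM-mined, non-context-aware placement heuristics and a context-aware composing rule could fit together for VM placement; all names, heuristics, and the composing rule here are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the MiCo-style two-stage idea (illustrative only).
from dataclasses import dataclass

@dataclass
class VMRequest:
    cpu: float
    mem: float

@dataclass
class Host:
    cpu_free: float
    mem_free: float

# --- Stage 1 stand-ins: non-context-aware options the Option Miner would discover ---
def best_fit(req, host):
    # Prefer the host with the least leftover capacity after placement.
    return -(host.cpu_free - req.cpu) - (host.mem_free - req.mem)

def worst_fit(req, host):
    # Prefer the host with the most leftover capacity (spreads load).
    return (host.cpu_free - req.cpu) + (host.mem_free - req.mem)

OPTIONS = {"best_fit": best_fit, "worst_fit": worst_fit}

# --- Stage 2 stand-in: the Option Composer's context-aware selection rule ---
def compose(context):
    # Hypothetical rule: pack tightly when the cluster is busy, spread otherwise.
    return OPTIONS["best_fit" if context["utilization"] > 0.7 else "worst_fit"]

def schedule(req, hosts, context):
    score = compose(context)
    feasible = [h for h in hosts if h.cpu_free >= req.cpu and h.mem_free >= req.mem]
    if not feasible:
        return None  # rejected request; this is what hurts the competitive ratio
    return max(feasible, key=lambda h: score(req, h))
```

In the paper, both the option set and the composing strategy are discovered by LLMs through interaction with constructed environments; the hard-coded rules above only mark where those learned pieces would plug in.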
Related papers
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z)
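As a rough illustration of the constant-memory idea (not MEM1's learned policy, which is trained end-to-end with RL), an agent loop that consolidates each observation into a bounded internal state could look like the sketch below; `llm` and `env` are assumed callables.

```python
# Illustrative constant-memory agent loop; the consolidation step stands in
# for MEM1's learned internal-state update.
def consolidate(state: str, observation: str, llm) -> str:
    # Fold the new observation into a bounded state instead of appending
    # it to an ever-growing context.
    prompt = (f"State: {state}\nNew observation: {observation}\n"
              "Rewrite the state, keeping only what is needed to finish the task.")
    return llm(prompt)[:2000]  # hard cap keeps per-turn memory constant

def run_episode(task, env, llm, max_turns=16):
    state = f"Task: {task}"
    for _ in range(max_turns):
        action = llm(f"{state}\nNext action:")
        observation, done = env.step(action)
        state = consolidate(state, observation, llm)  # context size stays O(1)
        if done:
            break
    return state
```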
- A Production Scheduling Framework for Reinforcement Learning Under Real-World Constraints [0.0]
Real-world production environments introduce additional complexities that cause traditional scheduling approaches to be less effective. Reinforcement learning (RL) holds potential in addressing these challenges, as it allows agents to learn adaptive scheduling strategies. We propose a modular framework that extends classical JSSP formulations by incorporating key real-world constraints. JobShopLab is an open-source tool for both research and industrial applications.
arXiv Detail & Related papers (2025-06-16T14:50:26Z)
- LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling [52.1366057696919]
LOP is an efficient neural pruning framework that learns optimal pruning strategies from the target pruning constraint. The LOP approach trains autoregressive neural networks (NNs) to directly predict layer-wise pruning strategies adaptive to the target pruning constraint. Experimental results show that LOP outperforms state-of-the-art pruning methods in various metrics while achieving up to three orders of magnitude speedup.
arXiv Detail & Related papers (2025-06-15T12:14:16Z)
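The excerpt says the strategy network predicts layer-wise pruning decisions autoregressively under a target constraint; one hedged guess at such a predictor, in PyTorch, is sketched below (the actual LOP architecture and objective are not given above).

```python
# Sketch of autoregressive layer-wise pruning-ratio prediction (our guess,
# not LOP's published model).
import torch
import torch.nn as nn

class RatioPredictor(nn.Module):
    def __init__(self, n_layers, hidden=32):
        super().__init__()
        self.rnn = nn.GRUCell(input_size=2, hidden_size=hidden)  # (target, prev ratio)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
        self.n_layers = n_layers

    def forward(self, target: float):
        h = torch.zeros(1, self.rnn.hidden_size)
        prev, ratios = 0.0, []
        for _ in range(self.n_layers):
            h = self.rnn(torch.tensor([[target, prev]]), h)
            r = self.head(h)        # pruning ratio for this layer, in (0, 1)
            ratios.append(r)
            prev = r.item()         # autoregressive conditioning on earlier layers
        return torch.cat(ratios).squeeze(-1)  # one ratio per layer
```

Training would then penalize the gap between the achieved overall compression and `target`, plus the induced accuracy loss.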
- Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling [6.623504719591386]
A Large Language Model (LLM)-based scheduler uses a ReAct-style framework (Reason + Act). The system incorporates a scratchpad memory to track scheduling history and refine decisions via natural language feedback. We evaluate our approach using OpenAI's O4-Mini and Anthropic's Claude 3.7 across seven real-world HPC workload scenarios.
arXiv Detail & Related papers (2025-05-29T14:25:29Z)
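Since the abstract names the concrete mechanism (a Reason + Act loop with a scratchpad), a minimal sketch helps fix the idea; the prompt format and the `cluster.assign` helper below are assumptions, not the paper's implementation.

```python
# Hedged sketch of a ReAct-style scheduling loop with scratchpad memory.
def react_schedule(jobs, cluster, llm):
    scratchpad = []  # natural-language history of past decisions and outcomes
    for job in jobs:
        prompt = (
            f"Queue: {jobs}\nCluster: {cluster}\n"
            "History:\n" + "\n".join(scratchpad) +
            "\nThought: reason about trade-offs (wait time, utilization, fairness)."
            "\nAction: name exactly one node for the next job."
        )
        reply = llm(prompt)                          # e.g. "Thought: ... Action: node-3"
        node = reply.rsplit("Action:", 1)[-1].strip()
        outcome = cluster.assign(job, node)          # environment feedback
        scratchpad.append(f"Job {job} -> {node}: {outcome}")  # refines later decisions
    return scratchpad
```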
- Efficient Multi-modal Long Context Learning for Training-free Adaptation [96.21248144937627]
This paper introduces Efficient Multi-Modal Long Context Learning (EMLoC). It embeds demonstration examples directly into the model input and condenses long-context multimodal inputs into compact, task-specific memory representations.
arXiv Detail & Related papers (2025-05-26T10:49:44Z)
- MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
MLE-Dojo is a Gym-style framework for systematically training (via reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
- Symmetry-Preserving Architecture for Multi-NUMA Environments (SPANE): A Deep Reinforcement Learning Approach for Dynamic VM Scheduling [28.72083501050024]
We introduce the Dynamic VM Allocation problem in Multi-NUMA PM (DVAMP). We propose SPANE, a novel deep reinforcement learning approach that exploits the problem's inherent symmetries. Experiments conducted on the Huawei-East-1 dataset demonstrate that SPANE outperforms existing baselines, reducing average VM wait time by 45%.
arXiv Detail & Related papers (2025-04-21T08:09:40Z)
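The excerpt does not spell out how the symmetries are exploited; a generic permutation-equivariant scorer over physical machines, in the spirit of such designs (our sketch, not SPANE's architecture), could look like:

```python
# Permutation-equivariant scoring of physical machines (PMs) for a VM request.
import torch
import torch.nn as nn

class EquivariantScorer(nn.Module):
    """Scores each PM so that permuting identical PMs permutes the scores
    identically, shrinking the effective state space the policy must cover."""
    def __init__(self, pm_dim, vm_dim, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(pm_dim + vm_dim, hidden), nn.ReLU())
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, pms, vm):
        # pms: (n_pm, pm_dim) per-machine features; vm: (vm_dim,) request features
        x = self.phi(torch.cat([pms, vm.expand(len(pms), -1)], dim=-1))
        pooled = x.mean(dim=0, keepdim=True).expand_as(x)  # order-independent context
        return self.score(torch.cat([x, pooled], dim=-1)).squeeze(-1)  # one score per PM
```

Because the pooled context is order-independent, swapping two identical PMs swaps their scores and changes nothing else, which matches the kind of symmetry the abstract refers to.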
- Understanding and Optimizing Multi-Stage AI Inference Pipelines [11.254219071373319]
HERMES is a Heterogeneous Multi-stage LLM inference Execution Simulator. Unlike prior frameworks, HERMES supports heterogeneous clients executing multiple models concurrently. We explore the impact of reasoning stages on end-to-end latency, optimal strategies for hybrid pipelines, and the architectural implications of remote KV cache retrieval.
arXiv Detail & Related papers (2025-04-14T00:29:49Z)
- The Compressor-Retriever Architecture for Language Model OS [20.56093501980724]
This paper explores the concept of using a language model as the core component of an operating system (OS).
A key challenge in realizing such an LM OS is managing the life-long context and ensuring statefulness across sessions.
We introduce compressor-retriever, a model-agnostic architecture designed for life-long context management.
arXiv Detail & Related papers (2024-09-02T23:28:15Z)
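The excerpt does not detail the architecture; a toy compressor-retriever loop (with word-overlap retrieval standing in for whatever scoring the paper actually uses) might look like the following, with `llm` an assumed callable.

```python
# Toy compressor-retriever loop for life-long context (illustrative only).
def compress(llm, turn_text: str) -> str:
    # Compress each finished interaction into a short, reusable chunk.
    return llm(f"Summarize for future reuse, in at most two sentences:\n{turn_text}")

def retrieve(store: list, query: str, k: int = 3) -> list:
    # Toy relevance score: word overlap with the query; a real system
    # would use embeddings or a learned retriever.
    words = set(query.lower().split())
    ranked = sorted(store, key=lambda c: -len(words & set(c.lower().split())))
    return ranked[:k]

def answer(llm, store: list, query: str) -> str:
    context = "\n".join(retrieve(store, query))             # rebuild just-enough context
    reply = llm(f"Context:\n{context}\n\nUser: {query}")
    store.append(compress(llm, f"Q: {query}\nA: {reply}"))  # state persists across sessions
    return reply
```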
- On the Multi-turn Instruction Following for Conversational Web Agents [83.51251174629084]
We introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment.
We propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques.
arXiv Detail & Related papers (2024-02-23T02:18:12Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances, utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training [42.514897110537596]
Modern Deep Learning (DL) models have grown to sizes requiring massive clusters of specialized, high-end nodes to train.
Designing such clusters to maximize both performance and utilization, so as to amortize their steep cost, is a challenging task.
We introduce COMET, a holistic cluster design methodology and workflow to jointly study the impact of parallelization strategies and key cluster resource provisioning on the performance of distributed DL training.
arXiv Detail & Related papers (2022-11-30T00:32:37Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures a certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
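The excerpt names a bilevel formulation with a fairness property but does not state it; in generic notation (ours, not the paper's), a bilevel continual-learning problem over episodes $t$ has the shape

```latex
% Generic bilevel form; the paper's exact objective is not given in the excerpt.
\min_{\theta} \; \sum_{t} f_t\bigl(\theta,\, w_t^{*}(\theta)\bigr)
\quad \text{s.t.} \quad
w_t^{*}(\theta) \in \operatorname*{arg\,min}_{w} \, g_t(w;\, \theta)
```

where the outer problem adapts shared parameters $\theta$ across episodes and each inner problem fits episode-specific allocation weights $w_t$; a fairness requirement across data samples can then enter as constraints coupling the per-episode losses $f_t$.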
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.