Related papers: Symmetry-Preserving Architecture for Multi-NUMA Environments (SPANE): A Deep Reinforcement Learning Approach for Dynamic VM Scheduling

Symmetry-Preserving Architecture for Multi-NUMA Environments (SPANE): A Deep Reinforcement Learning Approach for Dynamic VM Scheduling

URL: http://arxiv.org/abs/2504.14946v1
Date: Mon, 21 Apr 2025 08:09:40 GMT
Title: Symmetry-Preserving Architecture for Multi-NUMA Environments (SPANE): A Deep Reinforcement Learning Approach for Dynamic VM Scheduling
Authors: Tin Ping Chan, Yunlong Cheng, Yizhan Zhu, Xiaofeng Gao, Guihai Chen,
Abstract summary: We introduce the Dynamic VM Allocation problem in Multi-NUMA PM (DVAMP)<n>We propose SPANE, a novel deep reinforcement learning approach that exploits the problem's inherent symmetries.<n>Experiments conducted on the Huawei-East-1 dataset demonstrate that SPANE outperforms existing baselines, reducing average VM wait time by 45%.
Score: 28.72083501050024
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As cloud computing continues to evolve, the adoption of multi-NUMA (Non-Uniform Memory Access) architecture by cloud service providers has introduced new challenges in virtual machine (VM) scheduling. To address these challenges and more accurately reflect the complexities faced by modern cloud environments, we introduce the Dynamic VM Allocation problem in Multi-NUMA PM (DVAMP). We formally define both offline and online versions of DVAMP as mixed-integer linear programming problems, providing a rigorous mathematical foundation for analysis. A tight performance bound for greedy online algorithms is derived, offering insights into the worst-case optimality gap as a function of the number of physical machines and VM lifetime variability. To address the challenges posed by DVAMP, we propose SPANE (Symmetry-Preserving Architecture for Multi-NUMA Environments), a novel deep reinforcement learning approach that exploits the problem's inherent symmetries. SPANE produces invariant results under arbitrary permutations of physical machine states, enhancing learning efficiency and solution quality. Extensive experiments conducted on the Huawei-East-1 dataset demonstrate that SPANE outperforms existing baselines, reducing average VM wait time by 45%. Our work contributes to the field of cloud resource management by providing both theoretical insights and practical solutions for VM scheduling in multi-NUMA environments, addressing a critical gap in the literature and offering improved performance for real-world cloud systems.

Related papers

Learning Virtual Machine Scheduling in Cloud Computing through Language Agents [22.314607581353638]
In cloud services, virtual machine (VM) scheduling is a typical Online Dynamic Multidimensional Bin Packing (ODMBP) problem.<n>Traditional methods struggle to adapt to real-time changes, domain-expert-designed approaches suffer from rigid strategies.<n>This paper proposes a hierarchical language agent framework named MiCo, which provides a large language model (LLM)-driven design paradigm for solving ODMBP.
arXiv Detail & Related papers (2025-05-15T09:42:11Z)
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Task Delay and Energy Consumption Minimization for Low-altitude MEC via Evolutionary Multi-objective Deep Reinforcement Learning [52.64813150003228]
The low-altitude economy (LAE), driven by unmanned aerial vehicles (UAVs) and other aircraft, has revolutionized fields such as transportation, agriculture, and environmental monitoring.<n>In the upcoming six-generation (6G) era, UAV-assisted mobile edge computing (MEC) is particularly crucial in challenging environments such as mountainous or disaster-stricken areas.<n>The task offloading problem is one of the key issues in UAV-assisted MEC, primarily addressing the trade-off between minimizing the task delay and the energy consumption of the UAV.
arXiv Detail & Related papers (2025-01-11T02:32:42Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks. These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation. We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem. We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
Cost-Aware Dynamic Cloud Workflow Scheduling using Self-Attention and Evolutionary Reinforcement Learning [7.653021685451039]
This paper proposes a novel self-attention policy network for cloud workflow scheduling.<n>The trained SPN-CWS can effectively process all candidate instances simultaneously to identify the most suitable VM instance to execute every workflow task.<n>Our method can noticeably outperform several state-of-the-art algorithms on multiple benchmark CDMWS problems.
arXiv Detail & Related papers (2024-09-27T04:45:06Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Self-Sustaining Multiple Access with Continual Deep Reinforcement Learning for Dynamic Metaverse Applications [17.436875530809946]
The Metaverse is a new paradigm that aims to create a virtual environment consisting of numerous worlds, each of which will offer a different set of services. To deal with such a dynamic and complex scenario, one potential approach is to adopt self-sustaining strategies. This paper investigates the problem of multiple access in multi-channel environments to maximize the throughput of the intelligent agent.
arXiv Detail & Related papers (2023-09-18T22:02:47Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs) MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Smoothed Online Learning for Prediction in Piecewise Affine Systems [43.64498536409903]
This paper builds on the recently developed smoothed online learning framework. It provides the first algorithms for prediction and simulation in piecewise affine systems.
arXiv Detail & Related papers (2023-01-26T15:54:14Z)
VMAgent: Scheduling Simulator for Reinforcement Learning [44.026076801936874]
A novel simulator called VMAgent is introduced to help RL researchers better explore new methods. VMAgent is inspired by practical virtual machine (VM) scheduling tasks. From the VM scheduling perspective, VMAgent also helps to explore better learning-based scheduling solutions.
arXiv Detail & Related papers (2021-12-09T09:18:38Z)
Feeling of Presence Maximization: mmWave-Enabled Virtual Reality Meets Deep Reinforcement Learning [76.46530937296066]
This paper investigates the problem of providing ultra-reliable and energy-efficient virtual reality (VR) experiences for wireless mobile users. To ensure reliable ultra-high-definition (UHD) video frame delivery to mobile users, a coordinated multipoint (CoMP) transmission technique and millimeter wave (mmWave) communications are exploited.
arXiv Detail & Related papers (2021-06-03T08:35:10Z)
Multi-factorial Optimization for Large-scale Virtual Machine Placement in Cloud Computing [15.840835256016259]
Evolutionary algorithms (EAs) have been performed promising-solving on virtual machine placement (VMP) problems in the past. As growing demand for cloud services, the existing EAs fail to implement in large-scale virtual machine placement problem. This paper aims to apply the MFO technology to the LVMP problem in heterogeneous environment.
arXiv Detail & Related papers (2020-01-18T02:59:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.