DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations
- URL: http://arxiv.org/abs/2602.02137v2
- Date: Tue, 03 Feb 2026 08:43:37 GMT
- Title: DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations
- Authors: Minghao Li, Ruihang Wang, Rui Tan, Yonggang Wen
- Abstract summary: DCoPilot is a hybrid framework for generative control policies in dynamic DC operation. It operates through three coordinated phases: (i) simulation scale-up, which stress-tests reward candidates across diverse simulation-ready scenes; (ii) meta policy distillation, where a hypernetwork is trained to output policy weights conditioned on SLA and scene embeddings; and (iii) online adaptation, enabling zero-shot policy generation in response to updated specifications.
- Score: 9.210347753567092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern data centers (DCs) hosting artificial intelligence (AI)-dedicated devices operate at high power densities with rapidly varying workloads, making minute-level adaptation essential for safe and energy-efficient operation. However, manually designing piecewise deep reinforcement learning (DRL) agents cannot keep pace with frequent dynamics shifts and service-level agreement (SLA) changes of an evolving DC. This specification-to-policy lag causes a lack of timely, effective control policies, which may lead to service outages. To bridge the gap, we present DCoPilot, a hybrid framework for generative control policies in dynamic DC operation. DCoPilot synergizes two distinct generative paradigms, i.e., a large language model (LLM) that performs symbolic generation of structured reward forms, and a hypernetwork that conducts parametric generation of policy weights. DCoPilot operates through three coordinated phases: (i) simulation scale-up, which stress-tests reward candidates across diverse simulation-ready (SimReady) scenes; (ii) meta policy distillation, where a hypernetwork is trained to output policy weights conditioned on SLA and scene embeddings; and (iii) online adaptation, enabling zero-shot policy generation in response to updated specifications. Evaluated across five control task families spanning diverse DC components, DCoPilot achieves near-zero constraint violations and outperforms all baselines across specification variations. Ablation studies validate the effectiveness of LLM-based unified reward generation in enabling stable hypernetwork convergence.
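To make phase (ii) concrete, here is a minimal sketch (not the authors' code) of the hypernetwork idea: an MLP maps concatenated SLA and scene embeddings to the flat weight vector of a small control policy, which is then evaluated functionally. All dimensions and layer sizes are illustrative assumptions.

```python
# Sketch of parametric policy generation: hypernetwork -> policy weights.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 8, 2, 32   # assumed sizes of the target policy
SLA_DIM, SCENE_DIM = 4, 16                 # assumed embedding sizes

# Layer shapes of the target policy: a 2-layer MLP, state -> action.
shapes = [(HIDDEN, STATE_DIM), (HIDDEN,), (ACTION_DIM, HIDDEN), (ACTION_DIM,)]
n_params = sum(torch.Size(s).numel() for s in shapes)

class HyperNet(nn.Module):
    """Maps (SLA embedding, scene embedding) -> flat policy weight vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SLA_DIM + SCENE_DIM, 128), nn.ReLU(),
            nn.Linear(128, n_params),
        )
    def forward(self, sla, scene):
        return self.net(torch.cat([sla, scene], dim=-1))

def policy_forward(flat_w, state):
    """Run the generated policy functionally, slicing flat_w into layers."""
    params, i = [], 0
    for s in shapes:
        n = torch.Size(s).numel()
        params.append(flat_w[i:i + n].view(s))
        i += n
    w1, b1, w2, b2 = params
    h = torch.relu(state @ w1.T + b1)
    return torch.tanh(h @ w2.T + b2)   # bounded control action

hyper = HyperNet()
action = policy_forward(hyper(torch.randn(SLA_DIM), torch.randn(SCENE_DIM)),
                        torch.randn(STATE_DIM))
```

Phase (iii) then amounts to a single forward pass of the hypernetwork whenever the SLA or scene embedding changes, with no retraining of the policy itself.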
Related papers
- Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy [52.106797722292896]
We present DCDP, a Dynamic Closed-Loop Diffusion Policy framework that integrates chunk-based action generation with real-time correction. In dynamic PushT simulations, DCDP improves adaptability by 19% without retraining while requiring only 5% additional computation.
arXiv Detail & Related papers (2026-03-02T15:04:18Z)
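The closed-loop correction idea lends itself to a short sketch. The toy example below (assumed dynamics and gain, not the DCDP implementation) generates a chunk of actions from a frozen policy and corrects each step by the drift between the state the chunk expected and the state actually observed.

```python
# Toy closed-loop execution of a pre-generated action chunk.
import numpy as np

def generate_chunk(policy, state, horizon=8):
    """Roll the frozen policy forward to get a chunk of actions and the
    states it expects to visit (a stand-in for diffusion-based generation)."""
    actions, expected = [], [state]
    for _ in range(horizon):
        a = policy(expected[-1])
        actions.append(a)
        expected.append(expected[-1] + a)   # assumed dynamics: s' = s + a
    return np.array(actions), np.array(expected[:-1])

def execute_chunk(env_step, actions, expected, gain=0.5):
    """Apply each action with a proportional correction -- no retraining."""
    state = expected[0]
    for a, s_hat in zip(actions, expected):
        corrected = a + gain * (s_hat - state)   # pull back toward the plan
        state = env_step(state, corrected)
    return state

policy = lambda s: -0.2 * s                                    # toy frozen policy
env = lambda s, a: s + a + np.random.normal(0, 0.01, s.shape)  # noisy dynamics
acts, exp_states = generate_chunk(policy, np.ones(2))
final_state = execute_chunk(env, acts, exp_states)
```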
- Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning [2.034091340570242]
This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes.
arXiv Detail & Related papers (2025-11-14T15:29:46Z)
- Plasma Shape Control via Zero-shot Generative Reinforcement Learning [17.3934551430283]
We develop a novel framework for learning a versatile, zero-shot control policy from a large-scale offline dataset of PID-controlled discharges. The resulting foundation policy can be deployed for diverse trajectory tracking tasks in a zero-shot manner without any task-specific fine-tuning.
arXiv Detail & Related papers (2025-10-20T13:34:51Z)
- LLM-Empowered Agentic MAC Protocols: A Dynamic Stackelberg Game Approach [13.272022414257224]
We introduce a game-theoretic, LLM-empowered multi-agent reinforcement learning (MARL) framework. The uplink transmission between a base station and a varying number of user equipments is modeled as a dynamic multi-follower Stackelberg game (MFSG). Within this game, LLM-driven agents, coordinated through proximal policy optimization (PPO), synthesize adaptive, semantic MAC protocols.
arXiv Detail & Related papers (2025-10-13T01:47:24Z)
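The multi-follower Stackelberg structure can be illustrated with a toy best-response computation (the utility functions here are assumptions, not the paper's model): the base station, as leader, announces a price, and each user equipment best-responds with a transmit power.

```python
# Toy single-leader, multi-follower Stackelberg game.
import numpy as np

P_MAX = 2.0   # assumed per-UE transmit power cap

def follower_power(price):
    """Each UE maximizes log(1 + p) - price * p  =>  p* = 1/price - 1."""
    return float(np.clip(1.0 / price - 1.0, 0.0, P_MAX))

def leader_best_price(n_ue=5, grid=np.linspace(0.1, 2.0, 200)):
    """The leader grid-searches the price that maximizes its revenue while
    anticipating the followers' best responses (Stackelberg equilibrium)."""
    revenues = [price * n_ue * follower_power(price) for price in grid]
    return grid[int(np.argmax(revenues))]

price = leader_best_price()
powers = [follower_power(price)] * 5   # symmetric followers
```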
- Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition [52.232968183793986]
General Policy Composition (GPC) is a training-free method that enhances performance by combining the distributional scores of multiple pre-trained policies. GPC consistently improves performance and adaptability across a diverse set of tasks.
arXiv Detail & Related papers (2025-10-01T16:05:53Z)
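Since GPC operates on distributional scores at test time, a toy one-dimensional version is easy to state. In the sketch below (an illustration under strong assumptions, not the GPC algorithm itself), two "policies" are analytic Gaussian score functions, and sampling follows a weighted sum of their scores via Langevin dynamics; neither model is retrained.

```python
# Test-time composition of two score functions, 1-D toy case.
import numpy as np

def score_gaussian(x, mu, sigma):
    """Score of N(mu, sigma^2): grad_x log p(x) = (mu - x) / sigma^2."""
    return (mu - x) / sigma**2

def composed_sample(w1=0.5, w2=0.5, steps=500, eps=1e-2):
    """Langevin sampling along a weighted sum of the two scores."""
    rng = np.random.default_rng(0)
    x = rng.normal()
    for _ in range(steps):
        s = w1 * score_gaussian(x, -1.0, 0.5) + w2 * score_gaussian(x, 1.0, 0.5)
        x = x + eps * s + np.sqrt(2 * eps) * rng.normal()
    return x

sample = composed_sample()
```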
- Large Language Model-Empowered Decision Transformer for UAV-Enabled Data Collection [71.84636717632206]
Using unmanned aerial vehicles (UAVs) for reliable and energy-efficient data collection from spatially distributed devices holds great promise for supporting Internet of Things (IoT) applications. We propose a large language model (LLM)-empowered decision transformer, LLM-CRDT, to learn effective UAV control policies. LLM-CRDT outperforms benchmark online and offline methods, achieving up to 36.7% higher energy efficiency than current state-of-the-art DT approaches.
arXiv Detail & Related papers (2025-09-17T13:05:08Z)
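The decision-transformer side of this approach can be sketched independently of the LLM backbone. Below is a minimal illustration (names and shapes are assumptions) of standard return-to-go conditioning: each trajectory becomes an interleaved (return-to-go, state, action) token stream for a causal transformer that predicts the next action.

```python
# Building decision-transformer inputs from a trajectory.
import numpy as np

def returns_to_go(rewards):
    """R_t = sum of rewards from step t to the end of the trajectory."""
    return np.cumsum(rewards[::-1])[::-1]

def interleave(rtg, states, actions):
    """Interleave (return-to-go, state, action) triplets into one stream."""
    tokens = []
    for g, s, a in zip(rtg, states, actions):
        tokens += [("rtg", g), ("state", s), ("action", a)]
    return tokens   # fed to a causal transformer that predicts each action

rewards = np.array([0.0, 1.0, 0.0, 2.0])
tokens = interleave(returns_to_go(rewards), states=range(4), actions=range(4))
```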
- PowerGrow: Feasible Co-Growth of Structures and Dynamics for Power Grid Synthesis [75.14189839277928]
We present PowerGrow, a co-generative framework that significantly reduces computational overhead while maintaining operational validity. Experiments across benchmark settings show that PowerGrow outperforms prior diffusion models in fidelity and diversity. This demonstrates its ability to generate operationally valid and realistic power grid scenarios.
arXiv Detail & Related papers (2025-08-29T01:47:27Z)
- Grid-Agent: An LLM-Powered Multi-Agent System for Power Grid Control [4.3210078529580045]
This paper introduces Grid-Agent, an autonomous AI-driven framework to detect and remediate grid violations. Grid-Agent integrates semantic reasoning with numerical precision through modular agents. Experiments on IEEE and CIGRE benchmark networks demonstrate superior mitigation performance.
arXiv Detail & Related papers (2025-08-07T01:10:28Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
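A hedged sketch of the general recipe follows (the clustering choice here is an assumption; the paper proposes its own adaptive scheme): continuous dataset actions are mapped onto a learned discrete codebook so that a discrete-action offline RL method such as IQL or CQL can be trained on the code indices.

```python
# Quantizing a continuous action dataset with a k-means codebook.
import numpy as np

def fit_codebook(actions, k=16, iters=20, seed=0):
    """Plain k-means over the dataset's actions."""
    rng = np.random.default_rng(seed)
    codes = actions[rng.choice(len(actions), k, replace=False)]
    for _ in range(iters):
        idx = np.argmin(((actions[:, None] - codes[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(idx == j):
                codes[j] = actions[idx == j].mean(axis=0)
    return codes

def quantize(a, codes):
    """Map a continuous action to its nearest discrete code index."""
    return int(np.argmin(((codes - a) ** 2).sum(-1)))

dataset_actions = np.random.default_rng(1).uniform(-1, 1, size=(1000, 2))
codes = fit_codebook(dataset_actions)
a_id = quantize(np.array([0.3, -0.7]), codes)   # train the offline RL agent on ids
```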
- Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet [48.96004919910818]
We propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for HetNets.
To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems.
In this way, an agent's policy can be learned by other agents more easily, resulting in a more efficient collaboration process.
arXiv Detail & Related papers (2022-12-15T17:01:56Z)
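One plausible reading of penalty-based Q-learning is a standard Q-update on a penalized reward; the sketch below is that reading only, since the abstract does not spell out the PQL update. Here the penalty stands in for the interference an agent's power choice imposes on its neighbors.

```python
# Tabular Q-learning with a penalized reward (one reading of "PQL").
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, lam = 0.1, 0.9, 0.5   # lam scales the cooperation penalty

def pql_update(s, a, r, penalty, s_next):
    """Standard Q-update on reward minus a weighted penalty term."""
    target = (r - lam * penalty) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

pql_update(s=0, a=1, r=1.0, penalty=0.3, s_next=2)
```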