Related papers: Sustainability of Data Center Digital Twins with Reinforcement Learning

Sustainability of Data Center Digital Twins with Reinforcement Learning

URL: http://arxiv.org/abs/2404.10786v1
Date: Tue, 16 Apr 2024 18:22:30 GMT
Title: Sustainability of Data Center Digital Twins with Reinforcement Learning
Authors: Soumyendu Sarkar, Avisek Naug, Antonio Guillen, Ricardo Luna, Vineet Gundecha, Ashwin Ramesh Babu, Sajad Mousavi,
Abstract summary: Machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. DCRL-Green is a multi-agent RL environment that empowers the ML community to design data centers and research, develop, and refine RL controllers for carbon footprint reduction in DCs.
Score: 2.4971633082970377
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid growth of machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. However, the complexity of designing and controlling them in tandem presents a significant challenge. While some individual components like CFD-based design and Reinforcement Learning (RL) based HVAC control have been researched, there's a gap in the holistic design and optimization covering all elements simultaneously. To tackle this, we've developed DCRL-Green, a multi-agent RL environment that empowers the ML community to design data centers and research, develop, and refine RL controllers for carbon footprint reduction in DCs. It is a flexible, modular, scalable, and configurable platform that can handle large High Performance Computing (HPC) clusters. Furthermore, in its default setup, DCRL-Green provides a benchmark for evaluating single as well as multi-agent RL algorithms. It easily allows users to subclass the default implementations and design their own control approaches, encouraging community development for sustainable data centers. Open Source Link: https://github.com/HewlettPackard/dc-rl

Related papers

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs) StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks. Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z)
Hierarchical Multi-Agent Framework for Carbon-Efficient Liquid-Cooled Data Center Clusters [5.335496791443277]
This paper introduces Green-DCC, which proposes a Reinforcement Learning (RL) based hierarchical controller to optimize both workload and liquid cooling dynamically in a DCC. We demonstrate how the system optimize multiple data centers synchronously, enabling the scope of digital twins, and compare the performance of various RL approaches based on carbon emissions and sustainability metrics.
arXiv Detail & Related papers (2025-02-12T12:00:58Z)
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning [24.389896398264202]
We introduce $texttt$textbfEvoRL$$, the first end-to-end EvoRL framework optimized for GPU acceleration. The framework executes the entire training pipeline on accelerators, including environment simulations and EC processes.
arXiv Detail & Related papers (2025-01-25T08:31:07Z)
Communication-Control Codesign for Large-Scale Wireless Networked Control Systems [80.30532872347668]
Wireless Networked Control Systems (WNCSs) are essential to Industry 4.0, enabling flexible control in applications, such as drone swarms and autonomous robots. We propose a practical WNCS model that captures correlated dynamics among multiple control loops with spatially distributed sensors and actuators sharing limited wireless resources over multi-state Markov block-fading channels. We develop a Deep Reinforcement Learning (DRL) algorithm that efficiently handles the hybrid action space, captures communication-control correlations, and ensures robust training despite sparse cross-domain variables and floating control inputs.
arXiv Detail & Related papers (2024-10-15T06:28:21Z)
SustainDC: Benchmarking for Sustainable Data Center Control [4.159959816797259]
We introduce SustainDC, a set of Python environments for benchmarking multi-agent reinforcement learning (MARL) algorithms for data centers (DC) SustainDC supports custom DC configurations and tasks such as workload scheduling, cooling optimization, and auxiliary battery management. We evaluate various MARL algorithms on SustainDC, showing their performance across diverse DC designs, locations, weather conditions, grid carbon intensity, and workload requirements.
arXiv Detail & Related papers (2024-08-14T22:43:52Z)
An experimental evaluation of Deep Reinforcement Learning algorithms for HVAC control [40.71019623757305]
Recent studies have shown that Deep Reinforcement Learning (DRL) algorithms can outperform traditional reactive controllers. This paper provides a critical and reproducible evaluation of several state-of-the-art DRL algorithms for HVAC control.
arXiv Detail & Related papers (2024-01-11T08:40:26Z)
PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability [2.6429542504022314]
PyDCM is a customizable Data Center Model implemented in Python. The use of vectorized thermal calculations makes PyDCM orders of magnitude faster (30 times) than current Energy Plus modeling implementations.
arXiv Detail & Related papers (2023-10-05T21:24:54Z)
Low Emission Building Control with Zero-Shot Reinforcement Learning [70.70479436076238]
Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency. We show it is possible to obtain emission-reducing policies without a priori--a paradigm we call zero-shot building control.
arXiv Detail & Related papers (2022-08-12T17:13:25Z)
Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs [64.26714148634228]
congestion control (CC) algorithms become extremely difficult to design. It is currently not possible to deploy AI models on network devices due to their limited computational capabilities. We build a computationally-light solution based on a recent reinforcement learning CC algorithm.
arXiv Detail & Related papers (2022-07-05T20:42:24Z)
Deep Reinforcement Learning for Computational Fluid Dynamics on HPC Systems [17.10464381844892]
Reinforcement learning (RL) is highly suitable for devising control strategies in the context of dynamical systems. Recent research results indicate that RL-augmented computational fluid dynamics (CFD) solvers can exceed the current state of the art. We present Relexi as a scalable RL framework that bridges the gap between machine learning and modern CFD solvers on HPC systems.
arXiv Detail & Related papers (2022-05-13T08:21:18Z)
ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning [141.58588761593955]
We present a library ElegantRL-podracer for cloud-native deep reinforcement learning. It efficiently supports millions of cores to carry out massively parallel training at multiple levels. At a low-level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU cores in a single GPU.
arXiv Detail & Related papers (2021-12-11T06:31:21Z)
Power Modeling for Effective Datacenter Planning and Compute Management [53.41102502425513]
We discuss two classes of statistical power models designed and validated to be accurate, simple, interpretable and applicable to all hardware configurations and workloads. We demonstrate that the proposed statistical modeling techniques, while simple and scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE) for more than 95% diverse Power Distribution Units (more than 2000) using only 4 features.
arXiv Detail & Related papers (2021-03-22T21:22:51Z)
A Framework for Energy and Carbon Footprint Analysis of Distributed and Federated Edge Learning [48.63610479916003]
This article breaks down and analyzes the main factors that influence the environmental footprint of distributed learning policies. It models both vanilla and decentralized FL policies driven by consensus. Results show that FL allows remarkable end-to-end energy savings (30%-40%) for wireless systems characterized by low bit/Joule efficiency.
arXiv Detail & Related papers (2021-03-18T16:04:42Z)
Integrating Distributed Architectures in Highly Modular RL Libraries [4.297070083645049]
Most popular reinforcement learning libraries advocate for highly modular agent composability. We propose a versatile approach that allows the definition of RL agents at different scales through independent reusable components.
arXiv Detail & Related papers (2020-07-06T10:22:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.