Sustainability of Data Center Digital Twins with Reinforcement Learning
- URL: http://arxiv.org/abs/2404.10786v1
- Date: Tue, 16 Apr 2024 18:22:30 GMT
- Title: Sustainability of Data Center Digital Twins with Reinforcement Learning
- Authors: Soumyendu Sarkar, Avisek Naug, Antonio Guillen, Ricardo Luna, Vineet Gundecha, Ashwin Ramesh Babu, Sajad Mousavi,
- Abstract summary: Machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption.
To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential.
DCRL-Green is a multi-agent RL environment that empowers the ML community to design data centers and research, develop, and refine RL controllers for carbon footprint reduction in DCs.
- Score: 2.4971633082970377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. However, the complexity of designing and controlling them in tandem presents a significant challenge. While some individual components like CFD-based design and Reinforcement Learning (RL) based HVAC control have been researched, there's a gap in the holistic design and optimization covering all elements simultaneously. To tackle this, we've developed DCRL-Green, a multi-agent RL environment that empowers the ML community to design data centers and research, develop, and refine RL controllers for carbon footprint reduction in DCs. It is a flexible, modular, scalable, and configurable platform that can handle large High Performance Computing (HPC) clusters. Furthermore, in its default setup, DCRL-Green provides a benchmark for evaluating single as well as multi-agent RL algorithms. It easily allows users to subclass the default implementations and design their own control approaches, encouraging community development for sustainable data centers. Open Source Link: https://github.com/HewlettPackard/dc-rl
Related papers
- A Configurable Pythonic Data Center Model for Sustainable Cooling and ML Integration [4.0196072781228285]
We showcase PyDCM, a Python library that enables extremely fast prototyping of data center design.
We demonstrate capabilities of PyDCM and compare them to existing works in EnergyPlus for modeling data centers.
PyDCM can also be used as a standalone Gymnasium environment for demonstrating sustainability-focused data center control.
arXiv Detail & Related papers (2024-04-18T20:25:33Z)
- An experimental evaluation of Deep Reinforcement Learning algorithms for HVAC control [40.71019623757305]
Recent studies have shown that Deep Reinforcement Learning (DRL) algorithms can outperform traditional reactive controllers.
This paper provides a critical and reproducible evaluation of several state-of-the-art DRL algorithms for HVAC control.
arXiv Detail & Related papers (2024-01-11T08:40:26Z)
- PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability [2.6429542504022314]
PyDCM is a customizable Data Center Model implemented in Python.
The use of vectorized thermal calculations makes PyDCM orders of magnitude faster (30 times) than current EnergyPlus modeling implementations.
arXiv Detail & Related papers (2023-10-05T21:24:54Z)
- Low Emission Building Control with Zero-Shot Reinforcement Learning [70.70479436076238]
Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency.
We show it is possible to obtain emission-reducing policies without a priori knowledge, a paradigm we call zero-shot building control.
arXiv Detail & Related papers (2022-08-12T17:13:25Z)
- Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs [64.26714148634228]
Congestion control (CC) algorithms become extremely difficult to design.
It is currently not possible to deploy AI models on network devices due to their limited computational capabilities.
We build a computationally-light solution based on a recent reinforcement learning CC algorithm.
arXiv Detail & Related papers (2022-07-05T20:42:24Z)
- Deep Reinforcement Learning for Computational Fluid Dynamics on HPC Systems [17.10464381844892]
Reinforcement learning (RL) is highly suitable for devising control strategies in the context of dynamical systems.
Recent research results indicate that RL-augmented computational fluid dynamics (CFD) solvers can exceed the current state of the art.
We present Relexi as a scalable RL framework that bridges the gap between machine learning and modern CFD solvers on HPC systems.
arXiv Detail & Related papers (2022-05-13T08:21:18Z)
- ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning [141.58588761593955]
We present a library ElegantRL-podracer for cloud-native deep reinforcement learning.
It efficiently supports millions of cores to carry out massively parallel training at multiple levels.
At a low level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU cores in a single GPU.
arXiv Detail & Related papers (2021-12-11T06:31:21Z)
- Power Modeling for Effective Datacenter Planning and Compute Management [53.41102502425513]
We discuss two classes of statistical power models designed and validated to be accurate, simple, interpretable and applicable to all hardware configurations and workloads.
We demonstrate that the proposed statistical modeling techniques, while simple and scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE) for more than 95% of a diverse set of Power Distribution Units (more than 2000) using only 4 features.
arXiv Detail & Related papers (2021-03-22T21:22:51Z)
- A Framework for Energy and Carbon Footprint Analysis of Distributed and Federated Edge Learning [48.63610479916003]
This article breaks down and analyzes the main factors that influence the environmental footprint of distributed learning policies.
It models both vanilla and decentralized FL policies driven by consensus.
Results show that FL allows remarkable end-to-end energy savings (30%-40%) for wireless systems characterized by low bit/Joule efficiency.
arXiv Detail & Related papers (2021-03-18T16:04:42Z)
- Integrating Distributed Architectures in Highly Modular RL Libraries [4.297070083645049]
Most popular reinforcement learning libraries advocate for highly modular agent composability.
We propose a versatile approach that allows the definition of RL agents at different scales through independent reusable components.
arXiv Detail & Related papers (2020-07-06T10:22:07Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.