A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale
- URL: http://arxiv.org/abs/2410.05133v1
- Date: Mon, 7 Oct 2024 15:36:50 GMT
- Title: A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale
- Authors: Wesley Brewer, Matthias Maiterth, Vineet Kumar, Rafal Wojda, Sedrick Bouknight, Jesse Hines, Woong Shin, Scott Greenwood, David Grant, Wesley Williams, Feiyi Wang,
- Abstract summary: We present ExaDigiT, an open-source framework for developing comprehensive digital twins of liquid-cooled supercomputers.
It integrates three main modules: (1) a resource allocator and power simulator, (2) a transient thermo-fluidic cooling model, and (3) an augmented reality model of the supercomputer and central energy plant.
We envision the digital twin will be a key enabler for sustainable, energy-efficient supercomputing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ExaDigiT, an open-source framework for developing comprehensive digital twins of liquid-cooled supercomputers. It integrates three main modules: (1) a resource allocator and power simulator, (2) a transient thermo-fluidic cooling model, and (3) an augmented reality model of the supercomputer and central energy plant. The framework enables the study of "what-if" scenarios, system optimizations, and virtual prototyping of future systems. Using Frontier as a case study, we demonstrate the framework's capabilities by replaying six months of system telemetry for systematic verification and validation. Such a comprehensive analysis of a liquid-cooled exascale supercomputer is the first of its kind. ExaDigiT elucidates complex transient cooling system dynamics, runs synthetic or real workloads, and predicts energy losses due to rectification and voltage conversion. Throughout our paper, we present lessons learned to benefit HPC practitioners developing similar digital twins. We envision the digital twin will be a key enabler for sustainable, energy-efficient supercomputing.
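The abstract mentions predicting energy losses due to rectification and voltage conversion. As a rough illustration only, the sketch below chains two constant stage efficiencies to split a load's power draw into per-stage losses. It is not ExaDigiT's power model. The two-stage topology (AC-to-DC rectifier feeding a DC-DC converter), the function name `conversion_losses`, and any efficiency values passed to it are hypothetical assumptions.

```python
# Hypothetical sketch: per-stage power-conversion losses from a chain of
# constant efficiencies. Not the ExaDigiT implementation.

def conversion_losses(p_load_w: float, eta_rectifier: float, eta_dcdc: float) -> dict:
    """Estimate losses for an assumed rectifier -> DC-DC -> load chain.

    p_load_w:      power consumed by the IT load, in watts
    eta_rectifier: assumed AC-to-DC rectifier efficiency, in (0, 1]
    eta_dcdc:      assumed DC-DC voltage-conversion efficiency, in (0, 1]
    """
    if p_load_w < 0:
        raise ValueError("load power must be non-negative")
    if not (0 < eta_rectifier <= 1 and 0 < eta_dcdc <= 1):
        raise ValueError("efficiencies must be in (0, 1]")

    # Work backwards from the load: each stage must supply its output
    # power divided by its efficiency.
    p_dc_bus_w = p_load_w / eta_dcdc        # power the DC bus must deliver
    p_ac_in_w = p_dc_bus_w / eta_rectifier  # power drawn from the AC supply

    return {
        "p_ac_in_w": p_ac_in_w,
        "rectification_loss_w": p_ac_in_w - p_dc_bus_w,
        "conversion_loss_w": p_dc_bus_w - p_load_w,
        "total_loss_w": p_ac_in_w - p_load_w,
    }


if __name__ == "__main__":
    # Hypothetical efficiencies, chosen for illustration only.
    result = conversion_losses(1000.0, eta_rectifier=0.96, eta_dcdc=0.98)
    for key, value in result.items():
        print(f"{key}: {value:.1f}")
```

With these hypothetical values, a 1000 W load draws roughly 1063 W at the AC input, so about 63 W is lost across the two stages. Real converter efficiency varies with load, so a fuller model would replace the constants with load-dependent efficiency curves.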
Related papers
- Transforming Future Data Center Operations and Management via Physical AI
Data centers as mission-critical infrastructures are pivotal in powering the growth of artificial intelligence (AI) and the digital economy.
The evolution from Internet DC to AI DC has introduced new challenges in operating and managing data centers for improved business resilience and reduced total cost of ownership.
We propose and develop a novel Physical AI (PhyAI) framework for advancing DC operations and management.
(2025-04-07T12:09:22Z)
- DMWM: Dual-Mind World Model with Long-Term Imagination
We propose a novel dual-mind world model (DMWM) framework that integrates logical reasoning to enable imagination with logical consistency.
The proposed framework is evaluated on benchmark tasks that require long-term planning from the DMControl suite.
(2025-02-11T14:40:57Z)
- Predictive Digital Twin for Condition Monitoring Using Thermal Imaging
This paper explores the development and practical application of a predictive digital twin specifically designed for condition monitoring.
We employ advanced mathematical models and thermal imaging techniques to establish a robust digital twin framework.
We introduce the use of a human-machine interface that includes virtual reality, enhancing user interaction and system understanding.
(2024-11-08T11:23:57Z)
- Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver
Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models.
Recent advances in machine learning provide data-driven methods for developing digital twins.
We introduce a memristive neural ordinary differential equation solver for digital twins.
(2024-06-12T15:50:35Z)
- Hyper-Transformer for Amodal Completion
Amodal object completion is a complex task that involves predicting the invisible parts of an object based on visible segments and background information.
We introduce a novel framework named the Hyper-Transformer Amodal Network (H-TAN).
This framework utilizes a hyper transformer equipped with a dynamic convolution head to directly learn shape priors and accurately predict amodal masks.
(2024-05-30T11:11:54Z)
- From Digital Twins to Digital Twin Prototypes: Concepts, Formalization, and Applications
There is no consensual definition of what a digital twin is.
Our digital twin prototype (DTP) approach supports engineers during the development and automated testing of embedded software systems.
(2024-01-15T22:13:48Z)
- Enabling Automated Integration Testing of Smart Farming Applications via Digital Twin Prototypes
Industry 4.0 and smart farming are closely related, as many of the technologies used in smart farming are also used in Industry 4.0.
Digital twins have the potential for cost-effective software development of such applications.
We present a case study for employing our Digital Twin Prototype approach to automated testing of software.
(2023-11-09T21:24:12Z)
- Digital Twin and Artificial Intelligence Incorporated With Surrogate Modeling for Hybrid and Sustainable Energy Systems
Surrogate modeling has revolutionized computation across many branches of science and engineering.
Backed by Artificial Intelligence, a surrogate model can present highly accurate results with a significant reduction in computation time.
The digital twin is one promising technology for assessing applicability to energy systems.
(2022-09-30T20:14:16Z)
- Decentralized digital twins of complex dynamical systems
We introduce a decentralized digital twin (DDT) framework for dynamical systems.
We discuss the prospects of the DDT paradigm in computational science and engineering applications.
(2022-07-07T19:44:42Z)
- Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems
Multi-agent control is a central theme in Cyber-Physical Systems (CPS).
This paper presents a new transformer-based model that specializes in the global modeling of CPS.
With physical constraint inductive bias baked into its design, our DePT is ready to plug and play for a broad class of multi-agent systems.
(2021-10-29T17:20:53Z)
- Automatic digital twin data model generation of building energy systems from piping and instrumentation diagrams
We present an approach to recognize symbols and connections of P&ID from buildings in a completely automated way.
We apply algorithms for symbol recognition, line recognition and derivation of connections to the data sets.
The approach can be used in further processes like control generation, (distributed) model predictive control or fault detection.
(2021-08-31T15:09:39Z)
- A Probabilistic Graphical Model Foundation for Enabling Predictive Digital Twins at Scale
We create an abstraction of the asset-twin system as a set of coupled dynamical systems.
We demonstrate how the model is instantiated to enable a structural digital twin of an unmanned aerial vehicle.
(2020-12-10T17:33:59Z)
- Man, machine and work in a digital twin setup: a case study
A digital twin as a virtual counterpart of a physical human-robot assembly system is built as a front-runner for validation and control through design, build, and operation.
The forms of digital twins along the system life cycle, the building blocks, and potential advantages are presented.
(2020-06-15T20:54:43Z)