Power Modeling for Effective Datacenter Planning and Compute Management
- URL: http://arxiv.org/abs/2103.13308v1
- Date: Mon, 22 Mar 2021 21:22:51 GMT
- Title: Power Modeling for Effective Datacenter Planning and Compute Management
- Authors: Ana Radovanovic, Bokan Chen, Saurav Talukdar, Binz Roy, Alexandre
Duarte, and Mahya Shahbazi
- Abstract summary: We discuss two classes of statistical power models designed and validated to be accurate, simple, interpretable and applicable to all hardware configurations and workloads.
We demonstrate that the proposed statistical modeling techniques, while simple and scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE) for more than 95% of diverse Power Distribution Units (more than 2,000) using only 4 features.
- Score: 53.41102502425513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Datacenter power demand has been continuously growing and is the key driver
of its cost. An accurate mapping of compute resources (CPU, RAM, etc.) and
hardware types (servers, accelerators, etc.) to power consumption has emerged
as a critical requirement for major Web and cloud service providers. With the
global growth in datacenter capacity and associated power consumption, such
models are essential for important decisions around datacenter design and
operation. In this paper, we discuss two classes of statistical power models
designed and validated to be accurate, simple, interpretable and applicable to
all hardware configurations and workloads across hyperscale datacenters of the
Google fleet. To the best of our knowledge, this is the largest-scale power
modeling study of this kind, both in the scope of diverse datacenter planning
and real-time management use cases and in the variety of hardware
configurations and workload types used for modeling and validation. We
demonstrate that the proposed statistical modeling techniques, while simple and
scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE)
for more than 95% of the diverse Power Distribution Units (more than 2,000) using
only 4 features. This performance matches the reported accuracy of the previous
state-of-the-art methods, while using significantly fewer features and
covering a wider range of use cases.
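To make the kind of model the abstract describes more concrete, below is a minimal sketch of a per-PDU statistical power model: an ordinary least squares regression from a small set of resource-usage features to measured PDU power, evaluated with MAPE. The feature choices, synthetic data, and linear form are illustrative assumptions only; the abstract does not list the paper's actual 4 features or model coefficients.

```python
# Hypothetical sketch of a simple, interpretable per-PDU power model:
# linear regression from a few resource-usage features to measured power.
# All features and data below are synthetic assumptions for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)
n = 1000

# Assumed features per time step (illustrative, not the paper's actual 4):
# CPU utilization, memory usage, active machine count, cooling/fan-speed proxy.
X = np.column_stack([
    rng.uniform(0.2, 0.9, n),   # CPU utilization (fraction)
    rng.uniform(0.3, 0.8, n),   # memory usage (fraction)
    rng.integers(40, 60, n),    # active machine count
    rng.uniform(0.1, 1.0, n),   # cooling/fan-speed proxy
])

# Synthetic "measured" PDU power in kW: idle baseline plus utilization-driven
# terms and measurement noise.
y = (20 + 80 * X[:, 0] + 15 * X[:, 1] + 0.5 * X[:, 2] + 5 * X[:, 3]
     + rng.normal(0, 2, n))

model = LinearRegression().fit(X, y)            # one model per PDU
mape = mean_absolute_percentage_error(y, model.predict(X))
print(f"In-sample MAPE: {100 * mape:.2f}%")
```

In practice one such model would be fit per PDU on historical telemetry and validated out of sample, which is roughly the setting in which the abstract's sub-5% MAPE figure is reported; the synthetic in-sample evaluation above is only a placeholder for that workflow.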
Related papers
- Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z) - A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z) - PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability [2.6429542504022314]
PyDCM is a customizable Data Center Model implemented in Python.
The use of vectorized thermal calculations makes PyDCM about 30 times faster than current EnergyPlus modeling implementations.
arXiv Detail & Related papers (2023-10-05T21:24:54Z) - WattScope: Non-intrusive Application-level Power Disaggregation in
Datacenters [0.6086160084025234]
WattScope is a system for non-intrusively estimating the power consumption of individual applications.
WattScope adapts and extends a machine learning-based technique for disaggregating building power.
arXiv Detail & Related papers (2023-09-22T04:13:46Z) - Scaling Laws for Sparsely-Connected Foundation Models [70.41266138010657]
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets.
We identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data.
arXiv Detail & Related papers (2023-09-15T16:29:27Z) - Filling time-series gaps using image techniques: Multidimensional
context autoencoder approach for building energy data imputation [0.0]
Building energy prediction and management has become increasingly important in recent decades.
Energy data is often collected from multiple sources and can be incomplete or inconsistent.
This study compares PConv, convolutional neural networks (CNNs), and the weekly persistence method using one of the largest publicly available whole-building energy datasets.
arXiv Detail & Related papers (2023-07-12T05:46:37Z) - Bringing AI to the edge: A formal M&S specification to deploy effective
IoT architectures [0.0]
The Internet of Things is transforming our society, providing new services that improve the quality of life and resource management.
These applications are based on ubiquitous networks of multiple distributed devices, with limited computing resources and power.
New architectures such as fog computing are emerging to bring computing infrastructure closer to data sources.
arXiv Detail & Related papers (2023-05-11T21:29:58Z) - On Efficient Training of Large-Scale Deep Learning Models: A Literature
Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
With the increasing demands on computational capacity, a comprehensive summary of acceleration techniques for training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z) - How Much More Data Do I Need? Estimating Requirements for Downstream
Tasks [99.44608160188905]
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance?
Overestimating or underestimating data requirements incurs substantial costs that could be avoided with an adequate budget.
Using our guidelines, practitioners can accurately estimate data requirements of machine learning systems to gain savings in both development time and data acquisition costs.
arXiv Detail & Related papers (2022-07-04T21:16:05Z) - Artificial Intelligence (AI)-Centric Management of Resources in Modern
Distributed Computing Systems [22.550075095184514]
Cloud Data Centres (DCS) are large scale, complex, heterogeneous, and distributed across multiple networks and geographical boundaries.
The Internet of Things (IoT)-driven applications are producing a huge amount of data that requires real-time processing and fast response.
Existing Resource Management Systems (RMS) rely on either static or heuristic solutions, which are inadequate for such composite and dynamic systems.
arXiv Detail & Related papers (2020-06-09T06:54:07Z)