Precise Energy Consumption Measurements of Heterogeneous Artificial
Intelligence Workloads
- URL: http://arxiv.org/abs/2212.01698v1
- Date: Sat, 3 Dec 2022 21:40:55 GMT
- Title: Precise Energy Consumption Measurements of Heterogeneous Artificial
Intelligence Workloads
- Authors: René Caspart, Sebastian Ziegler, Arvid Weyrauch, Holger Obermaier,
Simon Raffeiner, Leon Pascal Schuhmacher, Jan Scholtyssek, Darya Trofimova,
Marco Nolden, Ines Reinartz, Fabian Isensee, Markus Götz, Charlotte Debus
- Abstract summary: We present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes.
One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rise of AI in recent years and the increase in complexity of the
models, the growing demand in computational resources is starting to pose a
significant challenge. The need for higher compute power is being met with
increasingly more potent accelerators and the use of large compute clusters.
However, the gain in prediction accuracy from large models trained on
distributed and accelerated systems comes at the price of a substantial
increase in energy demand, and researchers have started questioning the
environmental friendliness of such AI methods at scale. Consequently, energy
efficiency plays an important role for AI model developers and infrastructure
operators alike. The energy consumption of AI workloads depends on the model
implementation and the utilized hardware. Therefore, accurate measurements of
the power draw of AI workflows on different types of compute nodes are key to
algorithmic improvements and the design of future compute clusters and
hardware. To this end, we present measurements of the energy consumption of two
typical applications of deep learning models on different types of compute
nodes. Our results indicate that 1. deriving energy consumption directly from
runtime is not accurate; instead, the consumption of the compute node needs to
be considered with respect to its hardware composition; 2. neglecting
accelerator hardware on mixed nodes results in disproportionate energy
inefficiency; 3. energy consumption of model training and inference should be
considered separately - while training on GPUs outperforms all other node types
in both runtime and energy consumption, inference on CPU nodes can be
comparably efficient. One advantage of our approach is that the information on
energy consumption is available to all users of the supercomputer, enabling
easy transfer to other workloads and raising user awareness of energy
consumption.
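The first finding (that runtime alone is a poor proxy for energy) can be illustrated with a minimal sketch: integrate sampled power draw over time, e.g. from node-level meters or tools such as nvidia-smi, rather than multiplying runtime by a nominal TDP. All sample values and the TDP below are illustrative placeholders, not measurements from the paper.

```python
# Minimal sketch: energy from integrated power samples vs. runtime x TDP.
# Power samples are illustrative placeholders, not measurements from the paper.

def energy_joules(timestamps, power_watts):
    """Trapezoidal integration of power (W) over time (s) -> energy (J)."""
    return sum(
        0.5 * (power_watts[i] + power_watts[i + 1]) * (timestamps[i + 1] - timestamps[i])
        for i in range(len(timestamps) - 1)
    )

# One sample per second over a hypothetical 10-second training step.
t = list(range(11))
p = [120, 250, 300, 310, 305, 300, 310, 305, 150, 130, 120]  # watts

measured = energy_joules(t, p)   # what a node-level power meter would report
tdp_estimate = 10 * 350          # runtime (s) x nominal TDP (W)

print(f"integrated: {measured:.0f} J, runtime x TDP: {tdp_estimate} J")
```

The gap between the two numbers is the point: a runtime-based estimate ignores the node's composition and actual load profile.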
Related papers
- Towards Physical Plausibility in Neuroevolution Systems
The increasing use of Artificial Intelligence (AI) models, especially Deep Neural Networks (DNNs), drives up power consumption during both training and inference.
This work addresses the growing energy consumption problem in Machine Learning (ML).
Even a slight reduction in power usage can lead to significant energy savings, benefiting users, companies, and the environment.
arXiv Detail & Related papers (2024-01-31T10:54:34Z)
- Computation-efficient Deep Learning for Computer Vision: A Survey
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
- Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.
Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy.
We propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities.
arXiv Detail & Related papers (2023-04-17T02:12:30Z)
- EPAM: A Predictive Energy Model for Mobile AI
We introduce a comprehensive study of mobile AI applications considering different deep neural network (DNN) models and processing sources.
We measure the latency, energy consumption, and memory usage of all the models using four processing sources.
Our study highlights important insights, such as how mobile AI behaves in different applications (vision and non-vision) using CPU, GPU, and NNAPI.
arXiv Detail & Related papers (2023-03-02T09:11:23Z)
- Trends in Energy Estimates for Computing in AI/Machine Learning Accelerators, Supercomputers, and Compute-Intensive Applications
We examine the computational energy requirements of different systems driven by the geometrical scaling law.
We show that energy-efficiency gains due to geometrical scaling are slowing down.
At the application level, general-purpose AI-ML methods can be computationally energy intensive.
arXiv Detail & Related papers (2022-10-12T16:14:33Z)
- Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models
We investigate techniques that can be used to reduce the energy consumption of common NLP applications.
These techniques can lead to significant reductions in energy consumption when training language models or using them for inference.
arXiv Detail & Related papers (2022-05-19T16:03:55Z)
- Compute and Energy Consumption Trends in Deep Learning Inference
We study relevant models in the areas of computer vision and natural language processing.
For a sustained increase in performance, we see a much softer growth in energy consumption than previously anticipated.
arXiv Detail & Related papers (2021-09-12T09:40:18Z)
- Power Modeling for Effective Datacenter Planning and Compute Management
We discuss two classes of statistical power models designed and validated to be accurate, simple, interpretable and applicable to all hardware configurations and workloads.
We demonstrate that the proposed statistical modeling techniques, while simple and scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE) for more than 95% of over 2,000 diverse Power Distribution Units using only 4 features.
arXiv Detail & Related papers (2021-03-22T21:22:51Z)
- Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)
- Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems
An effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied.
A novel multi-agent meta-reinforcement learning (MAMRL) framework is proposed to solve the formulated problem.
Experimental results show that the proposed MAMRL model can reduce non-renewable energy usage by up to 11% and energy cost by 22.4%.
arXiv Detail & Related papers (2020-02-20T04:58:07Z)
- Improving Efficiency in Neural Network Accelerator Using Operands Hamming Distance Optimization
We show that the data-path energy is highly correlated with the bit flips when streaming the input operands into the arithmetic units.
We propose a post-training optimization algorithm and a Hamming-distance-aware training algorithm to co-optimize the accelerator and the network.
arXiv Detail & Related papers (2020-02-13T00:36:36Z)
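The bit-flip correlation described in the last entry can be sketched in a few lines: count the bits toggled between consecutive operands on a bus, then reorder the stream to reduce the total. The operand values and the greedy reordering below are toy illustrations, not the paper's actual algorithm.

```python
# Sketch of the bit-flip proxy: data-path energy correlates with the number
# of bits toggled between consecutive operands streamed into an arithmetic
# unit. Values and the greedy reordering are illustrative placeholders.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")

def total_bit_flips(stream):
    """Total bit toggles on a bus that carries the operands in order."""
    return sum(hamming(x, y) for x, y in zip(stream, stream[1:]))

operands = [0b1010, 0b0101, 0b1011, 0b0100]
print("flips in given order:", total_bit_flips(operands))

# Greedy nearest-neighbour reordering as a toy stand-in for a
# Hamming-distance-aware optimization.
def greedy_reorder(stream):
    remaining = list(stream)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda x: hamming(order[-1], x))
        remaining.remove(nxt)
        order.append(nxt)
    return order

reordered = greedy_reorder(operands)
print("flips after reordering:", total_bit_flips(reordered))
```

On this toy stream the reordering cuts the toggle count by more than half, which is the kind of data-path saving the abstract's co-optimization targets.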
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.