Understanding the Energy Consumption of HPC Scale Artificial
Intelligence
- URL: http://arxiv.org/abs/2212.00582v1
- Date: Mon, 14 Nov 2022 08:51:17 GMT
- Title: Understanding the Energy Consumption of HPC Scale Artificial
Intelligence
- Authors: Danilo Carastan dos Santos (DATAMOVE, UGA)
- Abstract summary: This paper contributes towards a better understanding of the energy consumption trade-offs of HPC scale Artificial Intelligence (AI), and more specifically Deep Learning (DL) algorithms.
We developed benchmark-tracker, a benchmark tool to evaluate the speed and energy consumption of DL algorithms in HPC environments.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper contributes towards a better understanding of the energy
consumption trade-offs of HPC scale Artificial Intelligence (AI), and more
specifically Deep Learning (DL) algorithms. For this task we developed
benchmark-tracker, a benchmark tool to evaluate the speed and energy consumption
of DL algorithms in HPC environments. We exploited hardware counters and Python
libraries to collect energy information through software, which enabled us to
instrument a known AI benchmark tool and to evaluate the energy consumption of
numerous DL algorithms and models. Through an experimental campaign, we show a
case example of the potential of benchmark-tracker to measure the computing
speed and the energy consumption of training and inference of DL algorithms,
and also its potential to help better understand the energy behavior of DL
algorithms in HPC platforms. This work is a step forward to better understand
the energy consumption of Deep Learning in HPC, and it also contributes a new
tool to help HPC DL developers better balance the HPC infrastructure in terms
of speed and energy consumption.
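The abstract describes collecting energy information through software by sampling power via hardware counters and Python libraries. A minimal sketch of the underlying idea is shown below: integrating periodic power readings over time to estimate energy. The function name and sampling scheme are illustrative assumptions, not taken from benchmark-tracker itself.

```python
def estimate_energy_joules(power_watts, timestamps_s):
    """Estimate energy (joules) from aligned power samples (watts) and
    timestamps (seconds) using the trapezoidal rule."""
    if len(power_watts) != len(timestamps_s) or len(power_watts) < 2:
        raise ValueError("need at least two aligned (power, time) samples")
    energy = 0.0
    for i in range(1, len(power_watts)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average power over the interval times its duration.
        energy += 0.5 * (power_watts[i] + power_watts[i - 1]) * dt
    return energy

# A constant 250 W draw sampled once per second for 10 s
# integrates to 250 W * 10 s = 2500 J.
power = [250.0] * 11
times = [float(t) for t in range(11)]
print(estimate_energy_joules(power, times))  # 2500.0
```

In practice the power samples would come from a hardware-counter interface (e.g. RAPL for CPUs or NVML for NVIDIA GPUs), polled while the training or inference workload runs.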
Related papers
- Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning [49.997801914237094]
We introduce the Fire-Flyer AI-HPC architecture, a synergistic hardware-software co-design framework, and its best practices.
For Deep Learning (DL) training, we deployed Fire-Flyer 2 with 10,000 PCIe A100 GPUs, achieving performance approximating the DGX-A100 while reducing costs by half and energy consumption by 40%.
Through our software stack, including HaiScale, 3FS, and HAI-Platform, we achieved substantial scalability by overlapping computation and communication.
arXiv Detail & Related papers (2024-08-26T10:11:56Z) - Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement
Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems.
We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics.
We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv Detail & Related papers (2023-12-01T01:30:49Z) - Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy
Measurement [11.37120215795946]
This paper introduces FECoM (Fine-grained Energy Consumption Meter), a framework for fine-grained Deep Learning energy consumption measurement.
FECoM addresses the challenges of measuring energy consumption at a fine-grained level by using static instrumentation and considering various factors, including computational load stability and temperature.
arXiv Detail & Related papers (2023-08-23T17:32:06Z) - Precise Energy Consumption Measurements of Heterogeneous Artificial
Intelligence Workloads [0.534434568021034]
We present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes.
One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer.
arXiv Detail & Related papers (2022-12-03T21:40:55Z) - Trends in Energy Estimates for Computing in AI/Machine Learning
Accelerators, Supercomputers, and Compute-Intensive Applications [3.2634122554914]
We examine the computational energy requirements of different systems driven by the geometrical scaling law.
We show that energy efficiency due to geometrical scaling is slowing down.
At the application level, general-purpose AI-ML methods can be computationally energy intensive.
arXiv Detail & Related papers (2022-10-12T16:14:33Z) - Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern
Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached high accuracy in such tasks, but their implementation on conventional embedded solutions is still computationally and energy intensive.
We propose a new benchmark for computing tactile pattern recognition at the edge through letters reading.
We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihi neuromorphic chip for efficient inference.
Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy efficient.
arXiv Detail & Related papers (2022-05-30T14:30:45Z) - Optimizing the Long-Term Average Reward for Continuing MDPs: A Technical
Report [117.23323653198297]
We have struck a balance between the information freshness experienced by users and the energy consumed by sensors.
We cast the corresponding status update procedure as a continuing Markov Decision Process (MDP).
To circumvent the curse of dimensionality, we have established a methodology for designing deep reinforcement learning (DRL) algorithms.
arXiv Detail & Related papers (2021-04-13T12:29:55Z) - Ps and Qs: Quantization-aware pruning for efficient low latency neural
network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z) - Learning Centric Power Allocation for Edge Intelligence [84.16832516799289]
Edge intelligence, which collects distributed data and performs machine learning at the edge, has been proposed.
This paper proposes a learning centric power allocation (LCPA) method, which allocates radio resources based on an empirical classification error model.
Experimental results show that the proposed LCPA algorithm significantly outperforms other power allocation algorithms.
arXiv Detail & Related papers (2020-07-21T07:02:07Z) - Catch Me If You Can: Using Power Analysis to Identify HPC Activity [0.35534933448684125]
We show how electrical power consumption data from an HPC platform can be used to identify what programs are executed.
We test our approach on an HPC rack at Lawrence Berkeley National Laboratory using a variety of scientific benchmarks.
arXiv Detail & Related papers (2020-05-06T20:57:41Z) - Learnergy: Energy-based Machine Learners [0.0]
Machine learning techniques have been broadly encouraged in the context of deep learning architectures.
An exciting algorithm denoted as the Restricted Boltzmann Machine relies on its energy-based and probabilistic nature to tackle the most diverse applications, such as classification, reconstruction, and generation of images and signals.
arXiv Detail & Related papers (2020-03-16T21:14:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.