Energy Consumption in Parallel Neural Network Training
- URL: http://arxiv.org/abs/2508.07706v1
- Date: Mon, 11 Aug 2025 07:34:04 GMT
- Title: Energy Consumption in Parallel Neural Network Training
- Authors: Philipp Huber, David Li, Juan Pedro Gutiérrez Hermosillo Muriedas, Deifilia Kieckhefen, Markus Götz, Achim Streit, Charlotte Debus
- Abstract summary: Scaling of data-parallel training was investigated for two models, ResNet50 and FourCastNet. We show that energy consumption scales approximately linearly with the consumed resources, i.e., GPU hours. Our results shed light on the complex interplay of scaling up neural network training and can inform future developments.
- Score: 1.609432596422649
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing demand for computational resources to train neural networks leads to a concerning growth in energy consumption. While parallelization has enabled scaling up model and dataset sizes and accelerated training, its impact on energy consumption is often overlooked. To close this research gap, we conducted scaling experiments for data-parallel training of two models, ResNet50 and FourCastNet, and evaluated the impact of parallelization parameters, i.e., GPU count, global batch size, and local batch size, on predictive performance, training time, and energy consumption. We show that energy consumption scales approximately linearly with the consumed resources, i.e., GPU hours; however, the respective scaling factor differs substantially between distinct model trainings and hardware, and is systematically influenced by the number of samples and gradient updates per GPU hour. Our results shed light on the complex interplay of scaling up neural network training and can inform future developments towards more sustainable AI research.
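To make the reported relationship concrete, here is a minimal sketch (not the paper's measurement pipeline) that integrates NVML power samples into energy during training and relates it to GPU-hours; the `train_step` callable, the once-per-step sampling, and the single-GPU setting are simplifying assumptions.

```python
import time
import pynvml

def train_with_energy_log(train_step, num_steps, gpu_index=0):
    """Run train_step() num_steps times, integrating sampled GPU power into energy."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    start = last = time.time()
    energy_joules = 0.0
    for _ in range(num_steps):
        train_step()                                             # user-supplied training step (placeholder)
        now = time.time()
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
        energy_joules += watts * (now - last)                    # rectangle-rule integration
        last = now
    pynvml.nvmlShutdown()
    gpu_hours = (last - start) / 3600.0                          # single GPU; multiply by GPU count for data-parallel runs
    return energy_joules / 3.6e6, gpu_hours                      # kWh, GPU-hours

# The abstract's finding corresponds to energy_kwh ≈ slope * gpu_hours for a
# fixed model and hardware, with the slope shifting with the number of samples
# and gradient updates processed per GPU-hour.
```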
Related papers
- Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression [53.39128997308138]
We introduce information capacity, a measure of model efficiency based on text compression performance.
Empirical evaluations on mainstream open-source models show that models of varying sizes within a series exhibit consistent information capacity.
A distinctive feature of information capacity is that it incorporates tokenizer efficiency, which affects both input and output token counts.
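As a rough, hypothetical proxy for such a compression-based efficiency score (the paper's exact definition is not reproduced here), one can convert a language model's average next-token cross-entropy into bits per character, so that tokenizer efficiency enters the measure:

```python
import math

def bits_per_character(avg_nats_per_token, tokens, characters):
    """Lower is better: fewer predicted bits per character of raw text.
    Dividing by characters (not tokens) folds tokenizer efficiency into the score."""
    total_bits = avg_nats_per_token * tokens / math.log(2)   # nats -> bits
    return total_bits / characters

# Example with made-up numbers: 1.2 nats/token over 10,000 tokens and 42,000 characters
print(bits_per_character(1.2, 10_000, 42_000))                # ≈ 0.41 bits/char
```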
arXiv Detail & Related papers (2025-11-11T10:07:32Z)
- Deep Lookup Network [76.66809324649154]
In many resource-limited edge devices, complicated operations can be calculated via lookup tables to reduce computational cost.
We introduce a generic and efficient lookup operation which can be used as a basic operation for the construction of neural networks.
By replacing computationally expensive multiplication operations with our lookup operations, we develop lookup networks for image classification, image super-resolution, and point cloud classification tasks.
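A minimal sketch of the lookup idea, assuming a fixed 4-bit quantization grid; the paper's actual operator is a learnable network building block, which is not reproduced here:

```python
import numpy as np

LEVELS = np.linspace(-1.0, 1.0, 16)                  # 4-bit quantization grid (assumption)
TABLE = LEVELS[:, None] * LEVELS[None, :]            # precomputed 16x16 product table

def lookup_mul(a, b):
    """Approximate elementwise a * b via table lookup on quantized operand indices."""
    ia = np.abs(a[..., None] - LEVELS).argmin(-1)    # nearest quantization level for a
    ib = np.abs(b[..., None] - LEVELS).argmin(-1)    # nearest quantization level for b
    return TABLE[ia, ib]

x = np.random.uniform(-1, 1, (4,))
w = np.random.uniform(-1, 1, (4,))
print(lookup_mul(x, w))                               # table-based approximation
print(x * w)                                          # exact product for comparison
```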
arXiv Detail & Related papers (2025-09-17T03:31:41Z)
- Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies [66.83950068218033]
Scaling Laws demonstrate that scaling model parameters and training data enhances learning performance.
Despite their potential to improve performance, the integration of scaling laws into deep reinforcement learning has not been fully realized.
This review addresses this gap by systematically analyzing scaling strategies in three dimensions: data, network, and training budget.
arXiv Detail & Related papers (2025-08-05T08:03:12Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Batching for Green AI -- An Exploratory Study on Inference [8.025202812165412]
We examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks.
We find that, in general, energy consumption rises at a much steeper pace than accuracy, and we question the necessity of this evolution.
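A hedged sketch of that kind of measurement, assuming an NVIDIA GPU, PyTorch, and the pynvml bindings; a single post-run power sample stands in for the continuous power logging a real study would use:

```python
import time
import torch
import pynvml

def profile_inference(model, make_batch, batch_sizes, device="cuda"):
    """Record wall time and a crude energy estimate per inference batch size."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    model = model.to(device).eval()
    results = {}
    with torch.no_grad():
        for bs in batch_sizes:
            x = make_batch(bs).to(device)             # make_batch is a user-supplied placeholder
            start = time.time()
            model(x)
            torch.cuda.synchronize()                   # wait for the GPU before stopping the clock
            elapsed = time.time() - start
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0   # milliwatts -> watts
            results[bs] = {"seconds": elapsed, "joules": watts * elapsed}
    pynvml.nvmlShutdown()
    return results
```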
arXiv Detail & Related papers (2023-07-21T08:55:23Z)
- Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training [26.438415753870917]
We propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training.
We demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
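The EAT objective itself is defined in the paper; the pattern below only illustrates the general idea of a gradient-based, energy-aware loss, using mean activation magnitude as a hypothetical differentiable energy proxy:

```python
import torch
import torch.nn.functional as F

def energy_aware_loss(logits, targets, activations, lam=1e-3):
    """Task loss plus a differentiable energy proxy term (illustrative only)."""
    task_loss = F.cross_entropy(logits, targets)
    energy_proxy = sum(a.abs().mean() for a in activations)   # encourages sparser, cheaper forward passes
    return task_loss + lam * energy_proxy
```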
arXiv Detail & Related papers (2023-07-01T15:44:01Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
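For orientation, a small spiking network in snnTorch looks roughly like the sketch below (standard CPU/GPU PyTorch; the IPU-specific optimizations of that release are not shown):

```python
import torch
import torch.nn as nn
import snntorch as snn

class TinySNN(nn.Module):
    def __init__(self, beta=0.9):
        super().__init__()
        self.fc1 = nn.Linear(784, 100)
        self.lif1 = snn.Leaky(beta=beta)   # leaky integrate-and-fire neuron layer
        self.fc2 = nn.Linear(100, 10)
        self.lif2 = snn.Leaky(beta=beta)

    def forward(self, x, num_steps=25):
        mem1 = self.lif1.init_leaky()      # reset membrane potentials
        mem2 = self.lif2.init_leaky()
        spikes = []
        for _ in range(num_steps):         # unroll over discrete time steps
            spk1, mem1 = self.lif1(self.fc1(x), mem1)
            spk2, mem2 = self.lif2(self.fc2(spk1), mem2)
            spikes.append(spk2)
        return torch.stack(spikes)         # [num_steps, batch, 10]
```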
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Energy Consumption of Neural Networks on NVIDIA Edge Boards: an Empirical Model [6.809944967863927]
Recently, there has been a trend of shifting the execution of deep learning inference tasks toward the edge of the network, closer to the user, to reduce latency and preserve data privacy.
In this work, we profile the energy consumption of inference tasks on several modern edge nodes.
From these measurements, we distill a simple, practical model that estimates the energy consumption of a given inference task on the considered boards.
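Illustration only, with made-up numbers: a plain least-squares fit of measured energy against a single workload feature such as the multiply-accumulate count, in the spirit of such a distilled model:

```python
import numpy as np

macs   = np.array([0.5e9, 1.2e9, 2.0e9, 4.1e9])     # hypothetical per-inference MAC counts
joules = np.array([0.08, 0.17, 0.27, 0.55])         # hypothetical measured energy per inference

A = np.vstack([macs, np.ones_like(macs)]).T          # design matrix for energy ≈ slope*MACs + intercept
(slope, intercept), *_ = np.linalg.lstsq(A, joules, rcond=None)
print(f"energy ≈ {slope:.2e} J/MAC * MACs + {intercept:.3f} J")
```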
arXiv Detail & Related papers (2022-10-04T14:12:59Z)
- Benchmarking Resource Usage for Efficient Distributed Deep Learning [10.869092085691687]
We conduct over 3,400 experiments training an array of deep networks representing various domains/tasks.
We fit power law models that describe how training time scales with available compute resources and energy constraints.
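A sketch of fitting such a power law with SciPy, on made-up data points; time ≈ a · GPUs^b, where b close to -1 would indicate near-linear speedup:

```python
import numpy as np
from scipy.optimize import curve_fit

gpus  = np.array([1, 2, 4, 8, 16])
hours = np.array([20.0, 10.8, 5.9, 3.4, 2.1])        # hypothetical training times

def power_law(x, a, b):
    return a * np.power(x, b)

(a, b), _ = curve_fit(power_law, gpus, hours, p0=(20.0, -1.0))
print(f"time ≈ {a:.1f} * gpus^{b:.2f} hours")
```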
arXiv Detail & Related papers (2022-01-28T21:24:15Z)
- Compute, Time and Energy Characterization of Encoder-Decoder Networks with Automatic Mixed Precision Training [6.761235154230549]
We show that it is possible to achieve a significant improvement in training time by leveraging mixed-precision training without sacrificing model performance.
We find that a 1549% increase in the number of trainable parameters for a network comes at a relatively smaller 63.22% increase in energy usage for a UNet with 4 encoding layers.
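The mixed-precision training referred to here follows the standard PyTorch automatic mixed precision pattern; the sketch below uses a toy linear model and synthetic data in place of the paper's UNet:

```python
import torch

model = torch.nn.Linear(32, 1).cuda()                 # toy model standing in for the UNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    inputs = torch.randn(64, 32, device="cuda")
    targets = torch.randn(64, 1, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                    # run the forward pass in mixed precision
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()                      # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```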
arXiv Detail & Related papers (2020-08-18T17:44:24Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
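To make the data-parallel terminology concrete (these are also the parallelization parameters varied in the main paper above), here is a minimal PyTorch DistributedDataParallel sketch, meant to be launched with torchrun; the model and data are placeholders:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                     # one process per GPU, launched via torchrun
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    local_batch = 32
    global_batch = local_batch * dist.get_world_size()  # one gradient update per global batch

    model = DDP(torch.nn.Linear(32, 10).cuda())
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(10):
        x = torch.randn(local_batch, 32, device="cuda")
        y = torch.randint(0, 10, (local_batch,), device="cuda")
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()                                  # gradients are all-reduced across ranks
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```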
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.