Energy Considerations for Large Pretrained Neural Networks
- URL: http://arxiv.org/abs/2506.01311v1
- Date: Mon, 02 Jun 2025 04:39:24 GMT
- Title: Energy Considerations for Large Pretrained Neural Networks
- Authors: Leo Mei, Mark Stamp
- Abstract summary: Complex neural network architectures require massive computational resources that consume substantial amounts of electricity. Previous work has primarily focused on compressing models while retaining comparable model performance. By quantifying the energy usage associated with both compressed and uncompressed models, we investigate compression as a means of reducing electricity consumption. We find that pruning and low-rank factorization offer no significant improvements with respect to energy usage or other related statistics, while steganographic capacity reduction provides major benefits in almost every case.
- Score: 1.3812010983144798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Increasingly complex neural network architectures have achieved phenomenal performance. However, these complex models require massive computational resources that consume substantial amounts of electricity, which highlights the potential environmental impact of such models. Previous studies have demonstrated that substantial redundancies exist in large pre-trained models. However, previous work has primarily focused on compressing models while retaining comparable model performance, and the direct impact on electricity consumption appears to have received relatively little attention. By quantifying the energy usage associated with both uncompressed and compressed models, we investigate compression as a means of reducing electricity consumption. We consider nine different pre-trained models, ranging in size from 8M parameters to 138M parameters. To establish a baseline, we first train each model without compression and record the electricity usage and time required during training, along with other relevant statistics. We then apply three compression techniques: Steganographic capacity reduction, pruning, and low-rank factorization. In each of the resulting cases, we again measure the electricity usage, training time, model accuracy, and so on. We find that pruning and low-rank factorization offer no significant improvements with respect to energy usage or other related statistics, while steganographic capacity reduction provides major benefits in almost every case. We discuss the significance of these findings.
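The abstract reports recording electricity usage and training time for each run but does not name a measurement tool. Below is a minimal sketch of one way such per-run energy figures could be collected, assuming an NVIDIA GPU and the `pynvml` bindings; the `GpuEnergyMeter` class and the `train_model` call are illustrative names, not taken from the paper.

```python
# A minimal sketch of measuring training-time electricity usage, assuming an
# NVIDIA GPU and the pynvml bindings; the paper does not specify its tooling.
import time
import threading
import pynvml


class GpuEnergyMeter:
    """Samples GPU power draw in a background thread and integrates it over time."""

    def __init__(self, device_index=0, interval_s=0.1):
        self.device_index = device_index
        self.interval_s = interval_s
        self.energy_joules = 0.0
        self._stop = threading.Event()
        self._thread = None

    def _sample(self, handle):
        last = time.time()
        while not self._stop.is_set():
            time.sleep(self.interval_s)
            now = time.time()
            # nvmlDeviceGetPowerUsage returns milliwatts.
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
            self.energy_joules += watts * (now - last)  # energy += P * dt
            last = now

    def __enter__(self):
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(self.device_index)
        self._thread = threading.Thread(target=self._sample, args=(handle,), daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        pynvml.nvmlShutdown()

    @property
    def energy_kwh(self):
        return self.energy_joules / 3.6e6  # 1 kWh = 3.6e6 J


# Hypothetical usage: wrap any training loop (train_model is a stand-in).
# with GpuEnergyMeter() as meter:
#     train_model()
# print(f"GPU energy: {meter.energy_kwh:.4f} kWh")
```

This captures GPU power only; accounting for CPU, memory, and cooling would require wall-plug metering or additional counters such as RAPL.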
Related papers
- Towards Physical Plausibility in Neuroevolution Systems [0.276240219662896]
The increasing use of Artificial Intelligence (AI) models, especially Deep Neural Networks (DNNs), is driving up power consumption during both training and inference.
This work addresses the growing energy consumption problem in Machine Learning (ML).
Even a slight reduction in power usage can lead to significant energy savings, benefiting users, companies, and the environment.
arXiv Detail & Related papers (2024-01-31T10:54:34Z)
- Reusing Pretrained Models by Multi-linear Operators for Efficient Training [65.64075958382034]
Training large models from scratch usually costs a substantial amount of resources.
Recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model.
We propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model.
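As a rough illustration of correlating every target weight with all pretrained weights, the toy sketch below initializes a larger linear layer from a smaller pretrained one via a single dense mapping matrix; the dimensions and initialization scale are made up, and the cited paper uses structured multi-linear operators rather than this dense map, which is only feasible for tiny layers.

```python
# Toy sketch: each weight of the larger target layer is a learnable linear
# combination of all weights of a smaller pretrained layer.
import torch

small = torch.nn.Linear(16, 16)   # stand-in for a pretrained small layer
large = torch.nn.Linear(32, 32)   # target layer to be initialized

w_small = small.weight.detach().flatten()  # (16 * 16,)
# Dense map from every pretrained weight to every target weight.
mapping = torch.nn.Parameter(
    torch.randn(large.weight.numel(), w_small.numel()) * 1e-2
)

# Initialize the large layer as a linear function of the small layer's weights.
with torch.no_grad():
    large.weight.copy_((mapping @ w_small).view_as(large.weight))
```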
arXiv Detail & Related papers (2023-10-16T06:16:47Z)
- Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z)
- Batching for Green AI -- An Exploratory Study on Inference [8.025202812165412]
We examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks.
We find that, in general, energy consumption rises at a much steeper pace than accuracy, and we question the necessity of this evolution.
arXiv Detail & Related papers (2023-07-21T08:55:23Z)
- How to use model architecture and training environment to estimate the energy consumption of DL training [5.190998244098203]
This study aims to leverage the relationship between energy consumption and two relevant design decisions in Deep Learning training: model architecture and training environment.
We study the training's power consumption behavior and propose four new energy estimation methods.
Our results show that selecting the proper model architecture and training environment can reduce energy consumption dramatically.
arXiv Detail & Related papers (2023-07-07T12:07:59Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence that pruning can induce bias, the relationship between neural network pruning and induced bias is not well understood.
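Since pruning is also one of the compression techniques evaluated in the main paper above, a minimal sketch of unstructured magnitude pruning is shown below, using PyTorch's built-in pruning utilities; the model and pruning amount are illustrative only.

```python
# Minimal sketch: zero out the 50% of weights with the smallest absolute value
# in each Linear layer of an illustrative model.
import torch
from torch import nn
from torch.nn.utils import prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the zeros permanent

linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
zeros = sum((m.weight == 0).sum().item() for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"overall weight sparsity: {zeros / total:.2%}")
```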
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
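A heavily simplified sketch of this idea follows, assuming a two-layer feed-forward block whose hidden units are split into contiguous expert slices and routed by a small linear scorer; the actual paper's expert construction and router training are more sophisticated.

```python
# Simplified sketch: partition an FFN's hidden units into experts and evaluate
# only the top-k experts selected by a router, per input vector.
import torch
from torch import nn

d_model, d_ff, n_experts, top_k = 64, 256, 8, 2
d_expert = d_ff // n_experts

# Stand-ins for a pretrained FFN: y = W2 @ relu(W1 @ x)
W1 = torch.randn(d_ff, d_model)
W2 = torch.randn(d_model, d_ff)

# Contiguous slices stand in for the clustering-based split in the paper.
experts = [(W1[i * d_expert:(i + 1) * d_expert],
            W2[:, i * d_expert:(i + 1) * d_expert]) for i in range(n_experts)]

router = nn.Linear(d_model, n_experts)  # predicts which experts will activate


def moe_ffn(x):
    scores = router(x)                            # (n_experts,)
    chosen = scores.topk(top_k).indices.tolist()  # evaluate only top-k experts
    out = torch.zeros(d_model)
    for i in chosen:
        w1, w2 = experts[i]
        out = out + w2 @ torch.relu(w1 @ x)
    return out


y = moe_ffn(torch.randn(d_model))
```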
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
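As a rough sketch of how such a reparameterisation can be implemented, the layer below computes its effective weight as w = theta * |theta|^(alpha - 1), which, to the best of my understanding of the cited paper, is the Powerpropagation form; alpha and the layer sizes here are illustrative.

```python
# Sketch of a Powerpropagation-style linear layer: gradient updates to theta
# are scaled by the weight's own magnitude, pushing small weights toward zero.
import torch
from torch import nn


class PowerpropLinear(nn.Module):
    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        nn.init.kaiming_uniform_(self.theta, a=5 ** 0.5)

    def effective_weight(self):
        # w = theta * |theta|^(alpha - 1); the chain rule multiplies theta's
        # gradient by alpha * |theta|^(alpha - 1).
        return self.theta * self.theta.abs().pow(self.alpha - 1)

    def forward(self, x):
        return nn.functional.linear(x, self.effective_weight(), self.bias)


layer = PowerpropLinear(128, 64)
out = layer(torch.randn(8, 128))
```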
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Power Modeling for Effective Datacenter Planning and Compute Management [53.41102502425513]
We discuss two classes of statistical power models designed and validated to be accurate, simple, interpretable and applicable to all hardware configurations and workloads.
We demonstrate that the proposed statistical modeling techniques, while simple and scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE) for more than 95% of the diverse Power Distribution Units (more than 2,000) using only 4 features.
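To make the reported metric concrete, the sketch below fits a simple linear power model on made-up features and scores it with Mean Absolute Percent Error (MAPE); the feature set and data are hypothetical, not the paper's.

```python
# Sketch: fit a simple, interpretable linear power model on four hypothetical
# features (e.g., CPU util, memory util, fan speed, ambient temp) and compute MAPE.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 4))
true_coef = np.array([300.0, 120.0, 60.0, 20.0])
power_watts = 150.0 + X @ true_coef + rng.normal(0.0, 10.0, size=500)

# Ordinary least squares with an intercept term.
X1 = np.hstack([np.ones((500, 1)), X])
coef, *_ = np.linalg.lstsq(X1, power_watts, rcond=None)
pred = X1 @ coef

mape = np.mean(np.abs((power_watts - pred) / power_watts)) * 100.0
print(f"MAPE: {mape:.2f}%")
```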
arXiv Detail & Related papers (2021-03-22T21:22:51Z)
- Towards Practical Lipreading with Distilled and Efficient Models [57.41253104365274]
Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000, to 88.5% and 46.6% respectively, using self-distillation.
arXiv Detail & Related papers (2020-07-13T16:56:27Z)