A Transistor Operations Model for Deep Learning Energy Consumption Scaling
- URL: http://arxiv.org/abs/2205.15062v1
- Date: Mon, 30 May 2022 12:42:33 GMT
- Title: A Transistor Operations Model for Deep Learning Energy Consumption Scaling
- Authors: Chen Li, Antonios Tsourdos, Weisi Guo
- Abstract summary: Deep Learning (DL) has transformed the automation of a wide range of industries and finds increasing ubiquity in society.
The increasing complexity of DL models and their widespread adoption have led to energy consumption doubling every 3-4 months.
Current FLOPs- and MACs-based methods consider only linear operations.
We develop a bottom-level Transistor Operations (TOs) method to expose the role of activation functions and neural network structure in energy consumption scaling with DL model configuration.
- Score: 14.856688747814912
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Learning (DL) has transformed the automation of a wide range of
industries and finds increasing ubiquity in society. The increasing complexity
of DL models and their widespread adoption have led to energy consumption
doubling every 3-4 months. Currently, the relationship between DL model
configuration and energy consumption is not well established. Current FLOPs-
and MACs-based methods consider only linear operations. In this paper, we
develop a bottom-level Transistor Operations (TOs) method to expose the role of
activation functions and neural network structure in energy consumption scaling
with DL model configuration. TOs allow us to uncover the role played by
non-linear operations (e.g. division/root operations performed by activation
functions and batch normalisation). As such, our proposed TOs model provides
developers with a hardware-agnostic index for how energy consumption scales
with model settings. To validate our work, we analyse the TOs energy scaling of
a feed-forward DNN model set and achieve 98.2% - 99.97% precision in
estimating its energy consumption. We believe this work can be extended to any
DL model.
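To make the scaling behaviour concrete, below is a minimal, hardware-agnostic sketch (in Python) of a TOs-style estimator for a feed-forward DNN. It is not the paper's implementation: the per-primitive transistor-operation costs and the per-activation primitive counts are placeholder assumptions chosen for illustration; only the overall structure (counting MACs together with the division, exponential and root work of activations and batch normalisation) follows the abstract.

    # Sketch of a TOs-style counter: weight linear (MAC) and non-linear primitives
    # by an assumed per-primitive transistor-operation cost.
    TO_COST = {"add": 1.0, "mul": 4.0, "div": 20.0, "exp": 30.0, "sqrt": 20.0, "cmp": 1.0}  # placeholder values

    # Assumed per-neuron primitive counts for common activations (for illustration only).
    ACTIVATION_OPS = {
        "relu":    {"cmp": 1},
        "sigmoid": {"exp": 1, "add": 1, "div": 1},
        "tanh":    {"exp": 2, "add": 2, "div": 1},
    }

    def layer_transistor_ops(in_features, out_features, activation="relu", batch_norm=False):
        """Approximate transistor operations of one dense layer for a single input sample."""
        ops = {"mul": in_features * out_features,           # MACs: one multiply ...
               "add": in_features * out_features}           # ... and one accumulate each
        for prim, n in ACTIVATION_OPS[activation].items():  # non-linear activation work
            ops[prim] = ops.get(prim, 0) + n * out_features
        if batch_norm:                                      # (x - mean) / sqrt(var + eps), then scale and shift
            for prim, n in {"add": 3, "mul": 1, "div": 1, "sqrt": 1}.items():
                ops[prim] = ops.get(prim, 0) + n * out_features
        return sum(TO_COST[p] * n for p, n in ops.items())

    # Example: total TOs scale with width, depth, activation choice and normalisation.
    layers = [(784, 512, "sigmoid", True), (512, 256, "relu", True), (256, 10, "sigmoid", False)]
    print(f"Estimated TOs per forward pass: {sum(layer_transistor_ops(*cfg) for cfg in layers):.3e}")

Under a model of this shape, switching ReLU to sigmoid or adding batch normalisation changes the estimate even when the MAC count stays fixed, which is precisely the effect that FLOPs- or MACs-only counting misses.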
Related papers
- Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning [65.31677646659895]
This paper focuses on the concept of task-specific directions (TSDs), which are critical for transitioning large models from pretrained states to task-specific enhancements in PEFT.
We introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process, thereby enhancing model performance on targeted tasks.
arXiv Detail & Related papers (2024-09-02T08:10:51Z)
- Automated Deep Learning for Load Forecasting [0.34952465649465553]
This paper explains why and how we used Automated Deep Learning (AutoDL) to find high-performing Deep Neural Networks (DNNs) for load forecasting.
We end up creating an AutoDL framework called EnergyDragon by extending the DRAGON package and applying it to load forecasting.
We demonstrate on the French load signal that EnergyDragon can find original DNNs that outperform state-of-the-art load forecasting methods.
arXiv Detail & Related papers (2024-05-14T07:51:55Z)
- Energy Efficient Deep Multi-Label ON/OFF Classification of Low Frequency Metered Home Appliances [0.16777183511743468]
Non-intrusive load monitoring (NILM) is the process of obtaining appliance-level data from a single metering point.
We introduce a novel DL model aimed at enhanced multi-label classification of NILM with improved computation and energy efficiency.
Compared to the state of the art, the proposed model reduces energy consumption by more than 23%.
arXiv Detail & Related papers (2023-07-18T13:23:23Z)
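As a rough illustration of the multi-label ON/OFF formulation described in the entry above (not the paper's actual architecture, which the summary does not specify), the sketch below maps an aggregate-power window to independent per-appliance ON/OFF probabilities; the window length, appliance count and layer widths are assumptions.

    import torch
    import torch.nn as nn

    window_len, n_appliances = 600, 5         # assumed low-frequency window length and appliance count
    model = nn.Sequential(
        nn.Linear(window_len, 128), nn.ReLU(),
        nn.Linear(128, n_appliances),         # one logit per appliance
    )
    criterion = nn.BCEWithLogitsLoss()        # multi-label: each appliance is an independent ON/OFF label

    aggregate = torch.randn(32, window_len)               # batch of aggregate power windows
    labels = torch.randint(0, 2, (32, n_appliances)).float()
    loss = criterion(model(aggregate), labels)
    on_off = torch.sigmoid(model(aggregate)) > 0.5        # predicted ON/OFF states per appliance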
- Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training [26.438415753870917]
We propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training.
We demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
arXiv Detail & Related papers (2023-07-01T15:44:01Z)
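The summary above describes EAT only as a gradient-based algorithm, so the sketch below shows the general energy-aware-training pattern rather than EAT itself: a differentiable energy proxy (here an L1 penalty on hidden activations, a stand-in assumption) is added to the task loss so the optimiser trades accuracy against estimated energy.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    criterion = nn.CrossEntropyLoss()
    optimiser = torch.optim.SGD(model.parameters(), lr=1e-2)
    energy_weight = 1e-4                      # assumed trade-off between accuracy and the energy proxy

    def training_step(x, y):
        hidden = model[1](model[0](x))        # hidden-layer activations
        logits = model[2](hidden)
        energy_proxy = hidden.abs().mean()    # sparser activations as a crude proxy for less switching work
        loss = criterion(logits, y) + energy_weight * energy_proxy
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()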
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to their large parameter capacity, but this capacity also incurs a huge computation cost.
We explore accelerating large-model inference via conditional computation based on the sparse-activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
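A generic sketch of the MoEfication idea follows: partition an FFN's hidden neurons into expert groups and, per token, compute only the top-k groups. The contiguous neuron split and the learned router below are simplifying assumptions; the paper itself constructs experts and routing from the trained model.

    import torch
    import torch.nn as nn

    d_model, d_ff, n_experts, top_k = 512, 2048, 8, 2     # assumed sizes
    w_in, w_out = nn.Linear(d_model, d_ff), nn.Linear(d_ff, d_model)
    router = nn.Linear(d_model, n_experts)                # predicts which expert blocks matter per token
    expert_size = d_ff // n_experts

    def moefied_ffn(x):                                   # x: (batch, d_model)
        top = router(x).topk(top_k, dim=-1).indices       # conditional computation: pick top-k experts
        out = torch.zeros_like(x)
        for b in range(x.shape[0]):
            for e in top[b].tolist():
                lo, hi = e * expert_size, (e + 1) * expert_size
                h = torch.relu(x[b] @ w_in.weight[lo:hi].T + w_in.bias[lo:hi])
                out[b] += h @ w_out.weight[:, lo:hi].T    # only the selected neuron blocks are computed
        return out + w_out.bias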
- Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation [91.05073136215886]
"Actor-Learner Distillation" transfers learning progress from a large capacity learner model to a small capacity actor model.
We demonstrate in several challenging memory environments that using Actor-Learner Distillation recovers the clear sample-efficiency gains of the transformer learner model.
arXiv Detail & Related papers (2021-04-04T17:56:34Z)
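As a generic sketch of actor-learner distillation (model sizes and the plain KL objective are assumptions, not the paper's exact setup): a small, fast actor policy is trained to match the action distribution of a large learner policy, so experience can be collected cheaply while retaining the learner's sample-efficiency gains.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    obs_dim, n_actions = 64, 6                # assumed observation and action sizes
    learner = nn.Sequential(nn.Linear(obs_dim, 1024), nn.ReLU(), nn.Linear(1024, n_actions))  # large policy
    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))        # small policy
    optimiser = torch.optim.Adam(actor.parameters(), lr=1e-3)

    def distillation_step(obs):               # obs: (batch, obs_dim), collected by the actor
        with torch.no_grad():
            teacher_logp = F.log_softmax(learner(obs), dim=-1)
        student_logp = F.log_softmax(actor(obs), dim=-1)
        loss = F.kl_div(student_logp, teacher_logp, log_target=True, reduction="batchmean")  # KL(teacher || student)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        return loss.item()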
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines such as libfuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
- Energy-Based Processes for Exchangeable Data [109.04978766553612]
We introduce Energy-Based Processes (EBPs) to extend energy-based models to exchangeable data.
A key advantage of EBPs is the ability to express more flexible distributions over sets without restricting their cardinality.
We develop an efficient training procedure for EBPs that demonstrates state-of-the-art performance on a variety of tasks.
arXiv Detail & Related papers (2020-03-17T04:26:02Z)