Attention Mechanism with Energy-Friendly Operations
- URL: http://arxiv.org/abs/2204.13353v1
- Date: Thu, 28 Apr 2022 08:50:09 GMT
- Title: Attention Mechanism with Energy-Friendly Operations
- Authors: Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo
Zhang, Boxing Chen, Lidia S. Chao
- Abstract summary: We rethink the attention mechanism from the perspective of energy consumption.
We build a novel attention model by replacing multiplications with either selective operations or additions.
Empirical results on three machine translation tasks demonstrate that the proposed model achieves comparable accuracy.
- Score: 61.58748425876866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The attention mechanism has become the dominant module in natural language
processing models. It is computationally intensive and depends on massive
power-hungry multiplications. In this paper, we rethink variants of the attention
mechanism from the perspective of energy consumption. After reaching the conclusion
that the energy costs of several energy-friendly operations are far lower than
those of their multiplication counterparts, we build a novel attention model by
replacing multiplications with either selective operations or additions.
Empirical results on three machine translation tasks demonstrate that the
proposed model, compared with the vanilla one, achieves comparable accuracy while
saving 99% and 66% of the energy during alignment calculation and the whole
attention procedure, respectively. Code is available at: https://github.com/NLP2CT/E-Att.
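A minimal sketch of the core idea, assuming an addition-only alignment score (negative L1 distance between queries and keys) in place of the dot product; the exact selective and additive operations used in E-Att are defined in the paper and repository, so the scoring function here is only an illustrative stand-in:

```python
import numpy as np

def additive_alignment_attention(Q, K, V):
    """Attention whose alignment step avoids multiplications.

    Scores are negative L1 distances between queries and keys
    (subtractions, absolute values and additions only); the softmax and
    the value aggregation are kept as in vanilla attention.  Illustrative
    stand-in, not the exact E-Att formulation.
    """
    # scores[i, j] = -sum_d |Q[i, d] - K[j, d]|  (no multiplications)
    scores = -np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # aggregation still uses multiplications

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
print(additive_alignment_attention(Q, K, V).shape)  # (5, 8)
```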
Related papers
- FAST: Factorizable Attention for Speeding up Transformers [1.3637227185793512]
We present a linearly scaled attention mechanism that maintains the full representation of the attention matrix without compromising on sparsification.
Results indicate that our attention mechanism has a robust performance and holds significant promise for diverse applications where self-attention is used.
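The factorization FAST uses is specific to that paper; as a generic illustration of attention that scales linearly in sequence length, the hedged sketch below uses a simple feature-map (kernel) factorization so the n-by-n attention matrix is never materialized:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """O(n) attention via a feature map phi: softmax(Q K^T) V is replaced by
    phi(Q) (phi(K)^T V) normalized by phi(Q) (phi(K)^T 1).  Generic kernel
    trick for illustration, not the factorization proposed in FAST."""
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                    # (d, d_v) summary of keys and values
    z = Qp @ Kp.sum(axis=0)          # per-query normalizer
    return (Qp @ kv) / z[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (6, 4)
```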
arXiv Detail & Related papers (2024-02-12T18:59:39Z)
- The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers [0.0]
We replace the dot-product and Softmax-based attention with an alternative mechanism involving addition and ReLU activation only.
This side-steps the expansion to double precision often required by matrix multiplication and avoids costly Softmax evaluations.
It can enable more efficient execution and support larger quantized Transformer models on resource-constrained hardware or alternative arithmetic systems like homomorphic encryption.
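One way to picture such a mechanism is the hedged sketch below, where pairwise scores are built from additions only and a ReLU replaces the softmax; the Inhibitor's actual formulation and normalization differ, so treat this purely as an illustration:

```python
import numpy as np

def relu_additive_attention(Q, K, V, shift=10.0):
    """Scores from negative Manhattan distances (additions only), ReLU in
    place of softmax, and a cheap row-sum normalization.  `shift` is a
    hypothetical offset controlling how many keys survive the ReLU; the
    exact mechanism is defined in the Inhibitor paper."""
    scores = shift - np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)
    weights = np.maximum(scores, 0.0)                   # ReLU, no softmax
    norm = weights.sum(axis=-1, keepdims=True) + 1e-9   # avoid divide-by-zero
    return (weights / norm) @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
print(relu_additive_attention(Q, K, V).shape)  # (5, 8)
```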
arXiv Detail & Related papers (2023-10-03T13:34:21Z)
- On Feature Diversity in Energy-based Models [98.78384185493624]
An energy-based model (EBM) is typically formed of inner-model(s) that learn a combination of the different features to generate an energy mapping for each input configuration.
We extend the probably approximately correct (PAC) theory of EBMs and analyze the effect of redundancy reduction on the performance of EBMs.
arXiv Detail & Related papers (2023-06-02T12:30:42Z)
- Energy Transformer [64.22957136952725]
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function.
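The ET energy function itself is given in the paper; as a toy illustration of the associative-memory ingredient (a state descending an energy, with an attention-like update appearing as the gradient step), one might sketch:

```python
import numpy as np

def hopfield_energy(x, M, beta=1.0):
    """Toy associative-memory energy, not the ET energy function:
    E(x) = 0.5 * x.x - (1/beta) * logsumexp(beta * M @ x)."""
    z = beta * M @ x
    return 0.5 * x @ x - (np.log(np.exp(z - z.max()).sum()) + z.max()) / beta

def descend(x, M, beta=1.0, steps=20, lr=0.2):
    """Gradient descent on the energy; the gradient step
    x <- x - lr * (x - M.T @ softmax(beta * M @ x)) is attention-like."""
    for _ in range(steps):
        z = beta * M @ x
        p = np.exp(z - z.max())
        p /= p.sum()
        x = x - lr * (x - M.T @ p)
    return x

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 8))            # four stored patterns
x = M[0] + 0.3 * rng.standard_normal(8)    # noisy query near pattern 0
x_star = descend(x, M)
# energy is typically lower after descent
print(hopfield_energy(x, M), hopfield_energy(x_star, M))
```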
arXiv Detail & Related papers (2023-02-14T18:51:22Z)
- How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers [59.57128476584361]
We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones.
We find that without any input-dependent attention, all models achieve competitive performance.
We show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success.
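The probe can be illustrated with a sketch like the one below, which swaps the input-dependent softmax(Q K^T) matrix for a fixed, input-independent one; here the constant matrix defaults to uniform averaging, whereas PAPA derives its constant matrices from the pretrained models, so this is only schematic:

```python
import numpy as np

def attention_with_constant_matrix(V, A_const=None):
    """Replace the input-dependent attention matrix with a constant one.
    A_const defaults to uniform averaging; PAPA instead uses constant
    matrices derived from the pretrained model's attention patterns."""
    n = V.shape[0]
    if A_const is None:
        A_const = np.full((n, n), 1.0 / n)   # fixed, input-independent weights
    return A_const @ V

rng = np.random.default_rng(2)
V = rng.standard_normal((5, 8))
print(attention_with_constant_matrix(V).shape)  # (5, 8)
```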
arXiv Detail & Related papers (2022-11-07T12:37:54Z)
- A Transistor Operations Model for Deep Learning Energy Consumption Scaling [14.856688747814912]
Deep Learning (DL) has transformed the automation of a wide range of industries and finds increasing ubiquity in society.
The increasing complexity of DL models and their widespread adoption have led to energy consumption doubling every 3-4 months.
Current FLOPs- and MACs-based methods consider only the linear operations.
We develop a bottom-level Transistor Operations (TOs) method to expose the role of activation functions and neural network structure in energy consumption scaling with DL model configuration.
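A toy version of such bottom-up accounting is sketched below; the per-operation transistor costs are placeholders invented for illustration, whereas the paper derives its coefficients from transistor-level analysis:

```python
# Toy transistor-operations (TO) estimate for one dense layer plus activation.
# The per-operation costs below are illustrative placeholders, not the
# coefficients derived in the paper.
COST = {"mult": 1000, "add": 150, "relu": 50, "exp_approx": 4000}

def dense_layer_tos(n_in, n_out, activation="relu"):
    macs = n_in * n_out                         # multiply-accumulate count
    tos = macs * (COST["mult"] + COST["add"])   # linear part (what FLOPs/MACs capture)
    act_cost = COST["relu"] if activation == "relu" else COST["exp_approx"]
    tos += n_out * act_cost                     # activations are not free
    return tos

print(dense_layer_tos(512, 512, "relu"))
```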
arXiv Detail & Related papers (2022-05-30T12:42:33Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output during the loss are essential choices to improve performance.
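As a schematic of the second point, the sketch below contrasts a loss computed only at masked positions with one computed over every output position; the concrete objectives studied in the paper are defined there:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Per-token cross entropy from raw logits (n, vocab) and labels (n,)."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(labels)), labels]

def masked_positions_loss(logits, labels, mask):
    return cross_entropy(logits, labels)[mask].mean()   # MASK positions only

def whole_output_loss(logits, labels):
    return cross_entropy(logits, labels).mean()         # every position

rng = np.random.default_rng(3)
logits = rng.standard_normal((10, 32))
labels = rng.integers(0, 32, size=10)
mask = np.zeros(10, dtype=bool)
mask[[2, 5, 7]] = True
print(masked_positions_loss(logits, labels, mask), whole_output_loss(logits, labels))
```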
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
- Attention that does not Explain Away [54.42960937271612]
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.
A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances.
We propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect.
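One plausible instantiation of double normalization is sketched below (normalize the score matrix over queries first, then over keys, so no key is entirely explained away); the paper's exact scheme and its guarantees are given there:

```python
import numpy as np

def doubly_normalized_attention(Q, K, V):
    """Normalize attention scores along both axes: a softmax over queries
    (columns) followed by a renormalization over keys (rows).  A sketch of
    one possible doubly-normalized scheme, not necessarily the paper's."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    col = np.exp(scores - scores.max(axis=0, keepdims=True))
    col /= col.sum(axis=0, keepdims=True)          # normalize over queries
    row = col / col.sum(axis=-1, keepdims=True)    # then over keys
    return row @ V

rng = np.random.default_rng(4)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
print(doubly_normalized_attention(Q, K, V).shape)  # (5, 8)
```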
arXiv Detail & Related papers (2020-09-29T21:05:39Z)
- Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention [7.967230034960396]
We evaluate whether various active-memory mechanisms could replace self-attention in a Transformer.
Experiments suggest that active-memory alone achieves comparable results to the self-attention mechanism for language modelling.
For some specific algorithmic tasks, active-memory mechanisms alone outperform both self-attention and a combination of the two.
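A minimal sketch of the comparison, with a depthwise 1D convolution over the sequence standing in for an active-memory block next to plain single-head self-attention; the paper's active-memory variants are richer than this:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Plain single-head self-attention for reference."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def conv_active_memory(X, kernel):
    """Depthwise 1D convolution over the sequence axis as a stand-in for a
    convolution-based active-memory block (the paper's variants are richer)."""
    n, _ = X.shape
    k = kernel.shape[0]
    pad = np.pad(X, ((k // 2, k // 2), (0, 0)))
    return np.stack([(pad[i:i + k] * kernel[:, None]).sum(axis=0) for i in range(n)])

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 4))
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
kernel = rng.standard_normal(3)
print(self_attention(X, Wq, Wk, Wv).shape, conv_active_memory(X, kernel).shape)
```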
arXiv Detail & Related papers (2019-12-27T02:01:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.