Key-Value Transformer
- URL: http://arxiv.org/abs/2305.19129v1
- Date: Sun, 28 May 2023 20:26:06 GMT
- Title: Key-Value Transformer
- Authors: Ali Borji
- Abstract summary: The key-value formulation (KV) produces symmetric attention maps; an asymmetric variant incorporates a 2D positional encoding into the attention matrix.
Experiments encompass three task types -- synthetics (such as reversing or sorting a list), vision (MNIST or CIFAR classification), and NLP.
- Score: 47.64219291655723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have emerged as the prevailing standard solution for various AI
tasks, including computer vision and natural language processing. The widely
adopted Query, Key, and Value formulation (QKV) has played a significant role
in this. Nevertheless, no research has examined the essentiality of these three
components for transformer performance. Therefore, we conducted an evaluation
of the key-value formulation (KV), which generates symmetric attention maps,
along with an asymmetric version that incorporates a 2D positional encoding
into the attention matrix. Remarkably, this transformer requires fewer
parameters and computation than the original one. Through experiments
encompassing three task types -- synthetics (such as reversing or sorting a
list), vision (MNIST or CIFAR classification), and NLP (character generation
and translation) -- we discovered that the KV transformer occasionally
outperforms the QKV transformer. However, it also exhibits instances of
underperformance compared to QKV, making it challenging to draw a definitive
conclusion. Nonetheless, we consider the reported results to be encouraging and
anticipate that they may pave the way for more efficient transformers in the
future.
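The paper's code is not reproduced here, so the following is a minimal sketch of one plausible reading of the KV formulation described in the abstract: the query projection is dropped, the pre-softmax attention matrix is K K^T (hence symmetric), and the asymmetric variant adds a 2D positional term to the scores. All names, shapes, and the form of the positional bias are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def kv_attention(x, w_k, w_v, pos_bias=None):
    """Single-head key-value (KV) attention sketch (assumed interpretation).

    x        : (batch, seq, d_model) input tokens
    w_k, w_v : (d_model, d_head) key and value projections (no query projection)
    pos_bias : optional (seq, seq) 2D positional term added to the scores;
               without it the pre-softmax attention map K K^T is symmetric.
    """
    k = x @ w_k                                             # (batch, seq, d_head)
    v = x @ w_v                                             # (batch, seq, d_head)
    scores = k @ k.transpose(-2, -1) / k.shape[-1] ** 0.5   # symmetric in (i, j)
    if pos_bias is not None:                                # asymmetric variant
        scores = scores + pos_bias
    return F.softmax(scores, dim=-1) @ v                    # (batch, seq, d_head)
```

Compared with standard QKV attention, this drops one of the three input projections, which is consistent with the abstract's claim of fewer parameters and less computation.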
Related papers
- DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
This insight, which can be adapted to various attention-related models, suggests that the current Transformer architecture has the potential for further evolution.
arXiv Detail & Related papers (2024-10-07T07:21:49Z)
- Quaternion Orthogonal Transformer for Facial Expression Recognition in the Wild [3.2898396463438995]
We develop a quaternion vision transformer (Q-ViT) for feature classification.
Experimental results on three in-the-wild FER datasets show that the proposed QOT outperforms several state-of-the-art models.
arXiv Detail & Related papers (2023-03-14T12:07:48Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- DoT: An efficient Double Transformer for NLP tasks with tables [3.0079490585515343]
DoT is a double transformer model that decomposes the problem into two sub-tasks.
We show that, at the cost of a small drop in accuracy, DoT improves training and inference time by at least 50%.
arXiv Detail & Related papers (2021-06-01T13:33:53Z)
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
- Transformer on a Diet [81.09119185568296]
Transformer has been widely used thanks to its ability to capture sequence information in an efficient way.
Recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness.
We explore three carefully designed light Transformer architectures to determine whether a Transformer with less computation can still produce competitive results.
arXiv Detail & Related papers (2020-02-14T18:41:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.