Couplformer: Rethinking Vision Transformer with Coupling Attention Map
- URL: http://arxiv.org/abs/2112.05425v1
- Date: Fri, 10 Dec 2021 10:05:35 GMT
- Title: Couplformer: Rethinking Vision Transformer with Coupling Attention Map
- Authors: Hai Lan, Xihao Wang, Xian Wei
- Abstract summary: The Transformer model has demonstrated outstanding performance in the computer vision domain.
We propose a novel memory-efficient attention mechanism named Couplformer, which decouples the attention map into two sub-matrices.
Experiments show that the Couplformer reduces memory consumption by 28% compared with the regular Transformer.
- Score: 7.789667260916264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the development of the self-attention mechanism, the Transformer model
has demonstrated outstanding performance in the computer vision domain. However,
the massive computation required by the full attention mechanism places a heavy
burden on memory consumption, and this memory limitation in turn restricts further
improvement of the Transformer model. To remedy this problem, we propose a novel
memory-efficient attention mechanism named Couplformer, which decouples the
attention map into two sub-matrices and generates the alignment scores from
spatial information. Image classification tasks at several scales are used to
evaluate the effectiveness of our model. Experiments show that on the ImageNet-1k
classification task, the Couplformer reduces memory consumption by 28% compared
with the regular Transformer while meeting the accuracy requirements, and
outperforms it by 0.92% in Top-1 accuracy when occupying the same memory
footprint. As a result, the Couplformer can serve as an efficient backbone for
visual tasks and provides a novel perspective on the attention mechanism for
researchers.
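The abstract does not spell out how the attention map is decoupled into two sub-matrices. One plausible reading, purely for illustration, is a factorization along the two spatial axes of the token grid; the PyTorch sketch below follows that assumption, and the module name `CoupledAxialAttention`, its shapes, and the row-then-column ordering are hypothetical rather than the authors' implementation.

```python
# Minimal sketch of a "coupled" attention factorization (an assumption, not the
# authors' official code): instead of one (H*W) x (H*W) attention map, attention
# is computed along the row axis and the column axis of the feature map, and the
# two smaller maps are coupled by applying them in sequence.
import torch
import torch.nn as nn


class CoupledAxialAttention(nn.Module):
    """Toy factorized attention over a 2D token grid of shape (B, H, W, C)."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def _attend(self, q, k, v):
        # q, k, v: (..., N, C); scaled dot-product attention over the N tokens.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        return attn @ v

    def forward(self, x):
        # x: (B, H, W, C)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)

        # First sub-matrix: each row attends within itself, giving H small
        # W x W attention maps instead of one (H*W) x (H*W) map.
        x_row = self._attend(q, k, v)                               # (B, H, W, C)

        # Second sub-matrix: column-wise attention applied to the row-attended
        # features, coupling the two factors.
        q_c, k_c, v_c = (t.transpose(1, 2) for t in (q, k, x_row))  # (B, W, H, C)
        x_col = self._attend(q_c, k_c, v_c).transpose(1, 2)         # (B, H, W, C)

        return self.proj(x_col)


# Hypothetical usage: a 14x14 grid of 384-dimensional tokens.
tokens = torch.randn(2, 14, 14, 384)
out = CoupledAxialAttention(384)(tokens)   # -> torch.Size([2, 14, 14, 384])
```

Under this reading, per-image attention storage drops from one (H·W)×(H·W) map to H maps of size W×W plus W maps of size H×H, which is the kind of saving a factorized attention map can offer; whether this matches the paper's exact factorization is not stated in the abstract.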
Related papers
- Dual Path Transformer with Partition Attention [26.718318398951933]
We present a novel attention mechanism, called dual attention, which is both efficient and effective.
We evaluate the effectiveness of our model on several computer vision tasks, including image classification on ImageNet, object detection on COCO, and semantic segmentation on Cityscapes.
The proposed DualFormer-XS achieves 81.5% top-1 accuracy on ImageNet, outperforming the recent state-of-the-art MPViT-XS by 0.6% top-1 accuracy with much higher throughput.
arXiv Detail & Related papers (2023-05-24T06:17:53Z)
- CageViT: Convolutional Activation Guided Efficient Vision Transformer [90.69578999760206]
This paper presents an efficient vision Transformer, called CageViT, that is guided by convolutional activation to reduce computation.
Our CageViT, unlike current Transformers, utilizes a new encoder to handle the rearranged tokens.
Experimental results demonstrate that the proposed CageViT outperforms the most recent state-of-the-art backbones by a large margin in terms of efficiency.
arXiv Detail & Related papers (2023-05-17T03:19:18Z)
- AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems [10.585040856070941]
We introduce a novel embedding technique to find semantically similar inputs to identify computation similarity.
We enable 22% inference-latency reduction on average (up to 68%) with negligible loss in inference accuracy.
arXiv Detail & Related papers (2023-01-23T04:24:26Z)
- Vicinity Vision Transformer [53.43198716947792]
We present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.
Our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.
arXiv Detail & Related papers (2022-06-21T17:33:53Z)
- Visualizing and Understanding Patch Interactions in Vision Transformer [96.70401478061076]
Vision Transformer (ViT) has become a leading tool in various computer vision tasks.
We propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches for vision transformer.
arXiv Detail & Related papers (2022-03-11T13:48:11Z)
- Learned Queries for Efficient Local Attention [11.123272845092611]
The self-attention mechanism in vision transformers suffers from high latency and inefficient memory utilization.
We propose a new shift-invariant local attention layer, called query and attend (QnA), that aggregates the input locally in an overlapping manner.
We show improvements in speed and memory complexity while achieving comparable accuracy with state-of-the-art models.
arXiv Detail & Related papers (2021-12-21T18:52:33Z)
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than 2x improvement on efficiency compared to state-of-the-art vision transformers with only 0.8% drop of accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- Patch Slimming for Efficient Vision Transformers [107.21146699082819]
We study the efficiency problem for visual transformers by excavating redundant calculation in given networks.
We present a novel patch slimming approach that discards useless patches in a top-down paradigm.
Experimental results on benchmark datasets demonstrate that the proposed method can significantly reduce the computational costs of vision transformers.
arXiv Detail & Related papers (2021-06-05T09:46:00Z)
- Attention that does not Explain Away [54.42960937271612]
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.
A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances.
We propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect (a toy sketch of one possible double normalization appears after this list).
arXiv Detail & Related papers (2020-09-29T21:05:39Z)
- Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention [7.967230034960396]
We evaluate whether various active-memory mechanisms could replace self-attention in a Transformer.
Experiments suggest that active-memory alone achieves comparable results to the self-attention mechanism for language modelling.
For some specific algorithmic tasks, active-memory mechanisms alone outperform both self-attention and a combination of the two.
arXiv Detail & Related papers (2019-12-27T02:01:13Z)
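For the doubly-normalized attention mentioned above, the sketch below normalizes the raw attention scores over the query axis and then over the key axis. This is only one possible reading of "doubly normalized"; the cited paper's exact formulation may differ, and the function name and epsilon handling are assumptions for illustration.

```python
# Toy doubly-normalized attention: an assumed reading of "doubly normalized",
# not necessarily the exact scheme of the cited paper.
import torch


def doubly_normalized_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, tokens, dim) tensors."""
    scores = torch.exp(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5)  # (B, N, N)
    scores = scores / (scores.sum(dim=-2, keepdim=True) + eps)        # normalize over queries
    attn = scores / (scores.sum(dim=-1, keepdim=True) + eps)          # then over keys
    return attn @ v                                                   # (B, N, dim)
```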
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.