Convolution-enhanced Evolving Attention Networks
- URL: http://arxiv.org/abs/2212.08330v2
- Date: Fri, 28 Apr 2023 06:44:48 GMT
- Title: Convolution-enhanced Evolving Attention Networks
- Authors: Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang,
Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong
- Abstract summary: Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly.
This is the first work that explicitly models the layer-wise evolution of attention maps.
- Score: 41.684265133316096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention-based neural networks, such as Transformers, have become ubiquitous
in numerous applications, including computer vision, natural language
processing, and time-series analysis. In all kinds of attention networks, the
attention maps are crucial as they encode semantic dependencies between input
tokens. However, most existing attention networks perform modeling or reasoning
based on representations, wherein the attention maps of different layers are
learned separately without explicit interactions. In this paper, we propose a
novel and generic evolving attention mechanism, which directly models the
evolution of inter-token relationships through a chain of residual
convolutional modules. The major motivations are twofold. On the one hand, the
attention maps in different layers share transferable knowledge, thus adding a
residual connection can facilitate the information flow of inter-token
relationships across layers. On the other hand, there is naturally an
evolutionary trend among attention maps at different abstraction levels, so it
is beneficial to exploit a dedicated convolution-based module to capture this
process. Equipped with the proposed mechanism, the convolution-enhanced
evolving attention networks achieve superior performance in various
applications, including time-series representation, natural language
understanding, machine translation, and image classification. Especially on
time-series representation tasks, Evolving Attention-enhanced Dilated
Convolutional (EA-DC-) Transformer outperforms state-of-the-art models
significantly, achieving an average of 17% improvement compared to the best
SOTA. To the best of our knowledge, this is the first work that explicitly
models the layer-wise evolution of attention maps. Our implementation is
available at https://github.com/pkuyym/EvolvingAttention.
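To make the mechanism concrete, below is a minimal sketch (ours, not the authors' released implementation linked above) of an attention layer that evolves the previous layer's attention map through a residual convolution, treating attention heads as image channels. The 3x3 kernel and the fixed mixing weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvolvingAttention(nn.Module):
    """Multi-head self-attention whose logits are mixed with a convolved,
    residually-updated copy of the previous layer's attention map."""

    def __init__(self, d_model, num_heads, alpha=0.5):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.d_head = num_heads, d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Treat the (heads, seq, seq) attention tensor as an image with one
        # channel per head; a small conv models the map-to-map evolution.
        self.evolve = nn.Conv2d(num_heads, num_heads, kernel_size=3, padding=1)
        self.alpha = alpha  # illustrative fixed mixing weight; could be learned

    def forward(self, x, prev_attn=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.d_head).transpose(1, 2)
        logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (B, H, T, T)
        if prev_attn is not None:
            # Residual convolutional update of the previous attention map.
            evolved = prev_attn + self.evolve(prev_attn)
            logits = self.alpha * logits + (1 - self.alpha) * evolved
        attn = F.softmax(logits, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y), attn  # thread `attn` into the next layer
```

Stacking such layers and feeding each layer's returned attention map into the next realizes the chained, residual evolution of inter-token relationships described in the abstract.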
Related papers
- DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
The novel insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution.
arXiv Detail & Related papers (2024-10-07T07:21:49Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module (a minimal sketch of this idea follows the list below).
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Assessing the Impact of Attention and Self-Attention Mechanisms on the Classification of Skin Lesions [0.0]
We focus on two forms of attention mechanisms: attention modules and self-attention.
Attention modules reweight the features of each layer's input tensor (a minimal sketch of this idea also follows the list below).
Self-Attention, originally proposed in Natural Language Processing, makes it possible to relate all the items in an input sequence.
arXiv Detail & Related papers (2021-12-23T18:02:48Z)
- Relational Self-Attention: What's Missing in Attention for Video Understanding [52.38780998425556]
We introduce a relational feature transform, dubbed relational self-attention (RSA).
Our experiments and ablation studies show that the RSA network substantially outperforms convolution and self-attention counterparts.
arXiv Detail & Related papers (2021-11-02T15:36:11Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attention layers have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- GAttANet: Global attention agreement for convolutional neural networks [0.0]
Transformer attention architectures, similar to those developed for natural language processing, have recently proven effective in vision as well.
Here, we report experiments with a simple such attention system that can improve the performance of standard convolutional networks.
We demonstrate the usefulness of this brain-inspired Global Attention Agreement network for various convolutional backbones.
arXiv Detail & Related papers (2021-04-12T15:45:10Z)
- Evolving Attention with Residual Convolutions [29.305149185821882]
We propose a novel mechanism based on evolving attention to improve the performance of transformers.
The proposed attention mechanism achieves significant performance improvement over various state-of-the-art models for multiple tasks.
arXiv Detail & Related papers (2021-02-20T15:24:06Z)
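As noted in the DIA entry above, one way to share a self-attention module across layers and couple it with a long short-term memory module is sketched below. This is a loose illustration of the sharing idea, not the DIA paper's exact design; the single channel width and the sigmoid gating are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SharedRecurrentAttention(nn.Module):
    """A single attention module shared by every layer of a backbone.
    An LSTM cell carries hidden state across layers, so the channel
    gates applied at layer l depend on those computed at layers < l."""

    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # per-channel summary
        self.cell = nn.LSTMCell(channels, channels)  # one cell, reused at every layer
        self.state = None

    def reset(self):
        # Call once at the start of each backbone forward pass.
        self.state = None

    def forward(self, x):
        b, c, _, _ = x.shape
        pooled = self.pool(x).view(b, c)
        self.state = self.cell(pooled, self.state)   # recurrence over depth
        gate = torch.sigmoid(self.state[0]).view(b, c, 1, 1)
        return x * gate

# Usage sketch: one instance applied after several same-width stages.
# attn = SharedRecurrentAttention(64); attn.reset()
# x = attn(stage1(x)); x = attn(stage2(x)); x = attn(stage3(x))
```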
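And as referenced in the skin-lesion entry, an attention module that reweights a layer's input tensor can be written in the squeeze-and-excitation style (one common variant among several that the paper assesses):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style module: pool each channel to a scalar,
    pass the pooled vector through a bottleneck, and use the result to
    reweight the input feature map channel by channel."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # (B,C,H,W) -> (B,C,1,1)
            nn.Conv2d(channels, channels // reduction, 1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # excite
            nn.Sigmoid(),                                   # weights in (0,1)
        )

    def forward(self, x):
        return x * self.gate(x)

# Example: reweight a batch of 64-channel feature maps.
weighted = ChannelAttention(64)(torch.randn(8, 64, 32, 32))
```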
This list is automatically generated from the titles and abstracts of the papers on this site.