Exploring Self-attention for Image Recognition
- URL: http://arxiv.org/abs/2004.13621v1
- Date: Tue, 28 Apr 2020 16:01:48 GMT
- Title: Exploring Self-attention for Image Recognition
- Authors: Hengshuang Zhao, Jiaya Jia, Vladlen Koltun
- Abstract summary: We consider two forms of self-attention for image recognition.
One is pairwise self-attention, which generalizes standard dot-product attention.
The other is patchwise self-attention, which is strictly more powerful than convolution.
- Score: 151.12000247183636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that self-attention can serve as a basic building block
for image recognition models. We explore variations of self-attention and
assess their effectiveness for image recognition. We consider two forms of
self-attention. One is pairwise self-attention, which generalizes standard
dot-product attention and is fundamentally a set operator. The other is
patchwise self-attention, which is strictly more powerful than convolution. Our
pairwise self-attention networks match or outperform their convolutional
counterparts, and the patchwise models substantially outperform the
convolutional baselines. We also conduct experiments that probe the robustness
of learned representations and conclude that self-attention networks may have
significant benefits in terms of robustness and generalization.
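The abstract's pairwise form can be illustrated with a toy sketch. The code below is not the paper's implementation (the paper uses learned relation and mapping functions over local footprints); it is a minimal scaled dot-product version, the "standard dot-product attention" that pairwise self-attention generalizes, written to show the set-operator property: permuting the input vectors permutes the output identically. The projection matrices and seeds are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_self_attention(x):
    """Toy pairwise self-attention over N feature vectors of dim C, shape (N, C).

    Attention weights come from pairwise query-key dot products, so the
    operator is permutation-equivariant (a set operator), unlike convolution,
    which depends on fixed spatial offsets. Fixed seed keeps the sketch
    deterministic; real models learn Wq, Wk, Wv.
    """
    N, C = x.shape
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(C))  # (N, N) pairwise weights
    return attn @ v                       # aggregated features, shape (N, C)

x = np.random.default_rng(1).standard_normal((5, 8))
y = pairwise_self_attention(x)
```

Permuting the rows of `x` permutes the rows of the output in the same way, which is the sense in which pairwise attention acts on a set rather than on a spatial grid.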
Related papers
- Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification [19.957957963417414]
We propose a dual cross-attention learning (DCAL) algorithm to coordinate with self-attention learning.
First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions.
Second, we propose pair-wise cross-attention (PWCA) to establish the interactions between image pairs.
arXiv Detail & Related papers (2022-05-04T16:14:26Z)
- Understanding The Robustness in Vision Transformers [140.1090560977082]
Self-attention may promote robustness through improved mid-level representations.
We propose a family of fully attentional networks (FANs) that strengthen this capability.
Our model achieves state-of-the-art results: 87.1% accuracy on ImageNet-1k and 35.8% mCE on ImageNet-C with 76.8M parameters.
arXiv Detail & Related papers (2022-04-26T17:16:32Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any model with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Visualizing the attention maps of a pre-trained model is one direct way to understand the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural network.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.