Attention Mechanisms in Computer Vision: A Survey
- URL: http://arxiv.org/abs/2111.07624v1
- Date: Mon, 15 Nov 2021 09:18:40 GMT
- Title: Attention Mechanisms in Computer Vision: A Survey
- Authors: Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao
Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng,
Shi-Min Hu
- Abstract summary: We provide a comprehensive review of various attention mechanisms in computer vision.
We categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention.
We suggest future directions for attention mechanism research.
- Score: 75.6074182122423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can naturally and effectively find salient regions in complex scenes.
Motivated by this observation, attention mechanisms were introduced into
computer vision with the aim of imitating this aspect of the human visual
system. Such an attention mechanism can be regarded as a dynamic weight
adjustment process based on features of the input image. Attention mechanisms
have achieved great success in many visual tasks, including image
classification, object detection, semantic segmentation, video understanding,
image generation, 3D vision, multi-modal tasks and self-supervised learning. In
this survey, we provide a comprehensive review of various attention mechanisms
in computer vision and categorize them according to approach, such as channel
attention, spatial attention, temporal attention and branch attention; a
related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is
dedicated to collecting related work. We also suggest future directions for
attention mechanism research.
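To make the abstract's "dynamic weight adjustment" view concrete, below is a minimal sketch of one of the surveyed categories, channel attention, in the Squeeze-and-Excitation style. This is an illustration, not code from the survey; the module name and reduction ratio are our assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Minimal SE-style channel attention: rescales each channel by a
    gate computed from globally pooled features (dynamic weighting)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.gate = nn.Sequential(            # excitation: per-channel weight in [0, 1]
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # input-dependent channel reweighting

x = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(x).shape)          # torch.Size([2, 64, 32, 32])
```

Spatial, temporal, and branch attention follow the same pattern but compute the weights over different axes of the feature tensor.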
Related papers
- Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights [5.798431829723857]
This paper provides a comprehensive exploration of techniques and insights for designing attention mechanisms in Vision Transformer (ViT) networks.
We present a systematic taxonomy of the various attention mechanisms within ViTs and the redesigned approaches they employ.
The analysis covers the novelty, strengths, and weaknesses of the different proposed strategies, together with an in-depth evaluation.
arXiv Detail & Related papers (2024-03-28T23:31:59Z)
- Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work [48.69845068325126]
Local mechanisms are designed to advance computer vision.
They not only focus on target parts to learn discriminative local representations, but also process information selectively to improve efficiency.
In this survey, we provide a systematic review of local mechanisms for various computer vision tasks and approaches.
arXiv Detail & Related papers (2023-06-02T22:05:52Z)
- Self-attention in Vision Transformers Performs Perceptual Grouping, Not Attention [11.789983276366986]
We show that attention mechanisms in vision transformers exhibit effects similar to those known in human visual attention.
Our results suggest that self-attention modules group figures in the stimuli based on similarity in visual features such as color.
In a singleton detection experiment, we studied whether these models exhibit effects similar to those of the feed-forward visual salience mechanisms found in human visual attention.
arXiv Detail & Related papers (2023-03-02T19:18:11Z)
- BI AVAN: Brain inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition between attended and neglected objects to identify and locate, in an unsupervised manner, the visual objects in a movie frame that the human brain focuses on.
arXiv Detail & Related papers (2022-10-27T22:20:36Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding into the multi-head self-attention layers, letting the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
- GAMR: A Guided Attention Model for (visual) Reasoning [7.919213739992465]
Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes.
We present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR).
GAMR posits that the brain solves complex visual reasoning problems dynamically via sequences of attention shifts to select and route task-relevant visual information into memory.
arXiv Detail & Related papers (2022-06-10T07:52:06Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any model with self-attention, including pre-trained ones, to the proposed alignment attention; a minimal sketch of the idea appears after this list.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks; a minimal sketch of the counterfactual comparison appears after this list.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
- Understanding top-down attention using task-oriented ablation design [0.22940141855172028]
Top-down attention allows neural networks, both artificial and biological, to focus on the information most relevant for a given task.
We aim to understand when and how top-down attention helps, using a computational experiment based on a general framework called task-oriented ablation design.
We compare the performance of two neural networks, one with top-down attention and one without.
arXiv Detail & Related papers (2021-06-08T21:01:47Z)
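For the alignment attention entry above, here is a minimal single-head sketch of the idea: self-attention that also returns a penalty encouraging the query and key distributions to match. The simple moment-matching penalty is our illustrative stand-in, and all names are assumptions; the paper itself uses a more sophisticated distribution-matching objective.

```python
import torch
import torch.nn as nn

class AlignedSelfAttention(nn.Module):
    """Single-head self-attention that also returns an alignment penalty
    pushing the query and key populations toward the same distribution."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor):
        q, k, v = self.q(x), self.k(x), self.v(x)              # each (B, N, D)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = attn @ v                                         # (B, N, D)
        # Illustrative alignment penalty: match first and second moments
        # of the query and key distributions over the token dimension.
        align = ((q.mean(dim=1) - k.mean(dim=1)) ** 2).mean() + \
                ((q.var(dim=1) - k.var(dim=1)) ** 2).mean()
        return out, align

x = torch.randn(2, 16, 64)                 # 2 sequences of 16 tokens
out, align = AlignedSelfAttention(64)(x)
print(out.shape, float(align))             # torch.Size([2, 16, 64]) and a scalar
```

During training, `align` would be added to the task loss as a regularizer.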
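For the counterfactual attention learning entry, the sketch below shows one way to measure the effect of attention on prediction: compare outputs under the learned attention map with outputs under a random counterfactual map. This is a hedged reconstruction from the summary, not the paper's exact procedure; all names and shapes are assumptions.

```python
import torch
import torch.nn as nn

def counterfactual_attention_effect(feats: torch.Tensor,
                                    attn: torch.Tensor,
                                    classifier: nn.Module) -> torch.Tensor:
    """Difference between predictions under the learned attention map and
    under a random (counterfactual) map; this estimates how much the
    learned attention actually contributes to the prediction.
    feats: (B, C, H, W) features, attn: (B, 1, H, W) weights in [0, 1]."""
    pooled_fact = (feats * attn).flatten(2).mean(-1)         # (B, C) attended pooling
    rand_attn = torch.rand_like(attn)                        # counterfactual attention
    pooled_cf = (feats * rand_attn).flatten(2).mean(-1)
    return classifier(pooled_fact) - classifier(pooled_cf)   # (B, num_classes)

B, C = 2, 64
effect = counterfactual_attention_effect(torch.randn(B, C, 8, 8),
                                         torch.rand(B, 1, 8, 8),
                                         nn.Linear(C, 10))
print(effect.shape)  # torch.Size([2, 10])
```

One way to use this signal is to apply the usual classification loss to `effect`, so the learned attention is rewarded only insofar as it outperforms random attention.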