Explicitly Modeled Attention Maps for Image Classification
- URL: http://arxiv.org/abs/2006.07872v2
- Date: Thu, 18 Mar 2021 14:18:57 GMT
- Title: Explicitly Modeled Attention Maps for Image Classification
- Authors: Andong Tan, Duc Tam Nguyen, Maximilian Dax, Matthias Nießner, Thomas Brox
- Abstract summary: Self-attention networks have shown remarkable progress in computer vision tasks such as image classification.
We propose a novel self-attention module with explicitly modeled attention-maps using only a single learnable parameter for low computational overhead.
Our method achieves an accuracy improvement of up to 2.2% over ResNet baselines on ImageNet ILSVRC.
- Score: 35.72763148637619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-attention networks have shown remarkable progress in computer vision
tasks such as image classification. The main benefit of the self-attention
mechanism is the ability to capture long-range feature interactions in
attention-maps. However, the computation of attention-maps requires learnable
key, query, and positional encodings, which are often unintuitive to use and
computationally expensive. To mitigate this problem, we propose a novel
self-attention module with explicitly modeled attention-maps using only a
single learnable parameter for low computational overhead. The design of
explicitly modeled attention-maps using a geometric prior is based on the
observation that the spatial context for a given pixel within an image is
mostly dominated by its neighbors, while more distant pixels have a minor
contribution. Concretely, the attention-maps are parametrized via simple
functions (e.g., Gaussian kernel) with a learnable radius, which is modeled
independently of the input content. Our evaluation shows that our method
achieves an accuracy improvement of up to 2.2% over ResNet baselines on
ImageNet ILSVRC and outperforms other self-attention methods such as
AA-ResNet152 in accuracy by 0.9% with 6.4% fewer parameters and 6.7% fewer
GFLOPs. This result empirically indicates the value of incorporating a
geometric prior into the self-attention mechanism when applied to image
classification.
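To make the mechanism concrete, below is a minimal PyTorch sketch of an attention-map parametrized by a Gaussian kernel over pixel distances, with the radius as the single learnable parameter, as the abstract describes. The module name, the softmax normalization, and the initialization are illustrative assumptions; this is a reconstruction from the abstract, not the authors' released code.

```python
import torch
import torch.nn as nn

class GaussianAttention2d(nn.Module):
    """Sketch: an explicitly modeled attention-map parametrized by a
    Gaussian kernel over pixel-to-pixel distance, with the radius as the
    single learnable parameter. Hypothetical reconstruction from the
    abstract, not the authors' code."""

    def __init__(self, init_radius: float = 2.0):
        super().__init__()
        # The single learnable parameter: the (log of the) Gaussian radius.
        self.log_radius = nn.Parameter(torch.tensor(init_radius).log())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Coordinates of all spatial positions (N = h * w).
        ys, xs = torch.meshgrid(
            torch.arange(h, device=x.device, dtype=x.dtype),
            torch.arange(w, device=x.device, dtype=x.dtype),
            indexing="ij",
        )
        pos = torch.stack([ys.flatten(), xs.flatten()], dim=1)   # (N, 2)
        # Pairwise squared distances between positions.
        d2 = (pos[:, None, :] - pos[None, :, :]).pow(2).sum(-1)  # (N, N)
        # Content-independent attention-map: a row-normalized Gaussian
        # kernel; nearby pixels dominate, distant ones contribute little.
        sigma = self.log_radius.exp()
        attn = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)     # (N, N)
        # Aggregate features with the explicitly modeled map (no keys,
        # queries, or positional encodings are learned).
        v = x.flatten(2)                                          # (B, C, N)
        return (v @ attn.transpose(0, 1)).view(b, c, h, w)
```

Because the map depends only on pixel positions and the radius, it can be computed once per input resolution and cached, which is consistent with the low-overhead claim; the O(N^2) map itself is the main memory cost of this naive sketch.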
Related papers
- Vision Eagle Attention: A New Lens for Advancing Image Classification [0.8158530638728501]
I introduce Vision Eagle Attention, a novel attention mechanism that enhances visual feature extraction using convolutional spatial attention.
The model applies convolution to capture local spatial features and generates an attention map that selectively emphasizes the most informative regions of the image.
I have integrated Vision Eagle Attention into a lightweight ResNet-18 architecture, demonstrating that this combination results in an efficient and powerful model.
arXiv Detail & Related papers (2024-11-15T20:21:59Z)
- Interaction-aware Joint Attention Estimation Using People Attributes [6.8603181780291065]
This paper proposes joint attention estimation in a single image.
For the interaction modeling, we propose a novel Transformer-based attention network to encode joint attention as low-dimensional features.
Our method outperforms SOTA methods quantitatively in comparative experiments.
arXiv Detail & Related papers (2023-08-10T06:55:51Z)
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z)
- Attend and Guide (AG-Net): A Keypoints-driven Attention-based Deep Network for Image Recognition [13.230646408771868]
We propose an end-to-end CNN model that learns meaningful features linking fine-grained changes using our novel attention mechanism.
It captures the spatial structure of images by identifying semantic regions (SRs) and their spatial distributions, which proves to be the key to modelling subtle changes in images.
The framework is evaluated on six diverse benchmark datasets.
arXiv Detail & Related papers (2021-10-23T09:43:36Z)
- CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z)
- Coordinate Attention for Efficient Mobile Network Design [96.40415345942186]
We propose a novel attention mechanism for mobile networks by embedding positional information into channel attention.
Unlike channel attention, which transforms a feature tensor into a single feature vector via 2D global pooling, coordinate attention factorizes channel attention into two 1D feature encoding processes.
Coordinate attention benefits ImageNet classification and performs better on downstream tasks such as object detection and semantic segmentation (see the sketch after this list).
arXiv Detail & Related papers (2021-03-04T09:18:02Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
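For the coordinate-attention entry above, here is a minimal PyTorch sketch of the factorization it describes: pooling along height and width separately so positional information survives into the channel attention. The reduction ratio, the BatchNorm/ReLU choices, and the module names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention: instead of one global 2D pooling,
    pool along height and width separately so the channel attention keeps
    positional information. Details are assumptions for illustration."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # -> (B, C, 1, W)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Two 1D encodings instead of a single globally pooled vector.
        xh = self.pool_h(x)                           # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)       # (B, C, W, 1)
        y = self.shared(torch.cat([xh, xw], dim=2))   # (B, mid, H+W, 1)
        yh, yw = torch.split(y, [h, w], dim=2)
        # Direction-wise attention weights, one per row and one per column.
        ah = torch.sigmoid(self.attn_h(yh))                      # (B, C, H, 1)
        aw = torch.sigmoid(self.attn_w(yw.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * ah * aw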