Information Bottleneck Approach to Spatial Attention Learning
- URL: http://arxiv.org/abs/2108.03418v1
- Date: Sat, 7 Aug 2021 10:35:32 GMT
- Title: Information Bottleneck Approach to Spatial Attention Learning
- Authors: Qiuxia Lai and Yu Li and Ailing Zeng and Minhao Liu and Hanqiu Sun and
Qiang Xu
- Abstract summary: The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information to reach visual awareness for perceiving natural scenes.
This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy.
We propose an IB-inspired spatial attention module for deep neural networks (DNNs) built for visual recognition.
- Score: 21.083618550304703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The selective visual attention mechanism in the human visual system (HVS)
restricts the amount of information to reach visual awareness for perceiving
natural scenes, allowing near real-time information processing with limited
computational capacity [Koch and Ullman, 1987]. This kind of selectivity acts
as an 'Information Bottleneck (IB)', which seeks a trade-off between
information compression and predictive accuracy. However, such information
constraints are rarely explored in the attention mechanism for deep neural
networks (DNNs). In this paper, we propose an IB-inspired spatial attention
module for DNN structures built for visual recognition. The module takes as
input an intermediate representation of the input image, and outputs a
variational 2D attention map that minimizes the mutual information (MI) between
the attention-modulated representation and the input, while maximizing the MI
between the attention-modulated representation and the task label. To further
restrict the information bypassed by the attention map, we quantize the
continuous attention scores to a set of learnable anchor values during
training. Extensive experiments show that the proposed IB-inspired spatial
attention mechanism can yield attention maps that neatly highlight the regions
of interest while suppressing backgrounds, and bootstrap standard DNN
structures for visual recognition tasks (e.g., image classification,
fine-grained recognition, cross-domain classification). The attention maps are
interpretable for the decision making of the DNNs as verified in the
experiments. Our code is available at https://github.com/ashleylqx/AIB.git.
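The anchor-value quantization described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (see the linked repository for that): the anchor set, map shape, and nearest-anchor assignment rule here are illustrative assumptions, and the straight-through gradient trick the paper would need for training is only noted in a comment.

```python
import numpy as np

def quantize_to_anchors(att, anchors):
    """Map each continuous attention score to its nearest anchor value.

    att: (H, W) attention map with scores in [0, 1].
    anchors: (K,) array of anchor values (learnable in the actual model).
    Returns an (H, W) map whose entries all come from `anchors`.
    """
    # Distance from every score to every anchor: shape (H, W, K).
    d = np.abs(att[..., None] - anchors[None, None, :])
    idx = d.argmin(axis=-1)  # index of the nearest anchor per location
    # In training, gradients would flow via a straight-through estimator;
    # this forward pass just performs the hard assignment.
    return anchors[idx]

rng = np.random.default_rng(0)
att = rng.random((4, 4))             # stand-in for a sampled 2D attention map
anchors = np.array([0.0, 0.5, 1.0])  # hypothetical anchor set
q = quantize_to_anchors(att, anchors)
```

Restricting the map to a small set of anchor values caps how finely the attention map can encode the input, which is how the quantization further tightens the bottleneck.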
Related papers
- Spatial-Temporal Attention Network for Open-Set Fine-Grained Image Recognition [14.450381668547259]
A vision transformer with spatial self-attention alone cannot learn accurate attention maps for distinguishing different categories of fine-grained images.
We propose a spatial-temporal attention network for learning fine-grained feature representations, called STAN.
The proposed STAN-OSFGR outperforms 9 state-of-the-art open-set recognition methods significantly in most cases.
arXiv Detail & Related papers (2022-11-25T07:46:42Z)
- Where to Look: A Unified Attention Model for Visual Recognition with Reinforcement Learning [5.247711598719703]
We propose to unify the top-down and bottom-up attention together for recurrent visual attention.
Our model exploits the image pyramids and Q-learning to select regions of interests in the top-down attention mechanism.
We train our model in an end-to-end reinforcement learning framework, and evaluate our method on visual classification tasks.
arXiv Detail & Related papers (2021-11-13T18:44:50Z)
- Learning to ignore: rethinking attention in CNNs [87.01305532842878]
We propose to reformulate the attention mechanism in CNNs to learn to ignore instead of learning to attend.
Specifically, we propose to explicitly learn irrelevant information in the scene and suppress it in the produced representation.
arXiv Detail & Related papers (2021-11-10T13:47:37Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Coordinate Attention for Efficient Mobile Network Design [96.40415345942186]
We propose a novel attention mechanism for mobile networks by embedding positional information into channel attention.
Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes.
Our coordinate attention is beneficial to ImageNet classification and behaves better in down-stream tasks, such as object detection and semantic segmentation.
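The factorization described above can be sketched in a few lines of NumPy. This only shows the two 1D pooling steps that replace 2D global pooling; the 1D convolutions and sigmoid gating that the actual coordinate attention module applies afterwards are omitted, and the tensor shapes are illustrative assumptions.

```python
import numpy as np

def coordinate_pools(x):
    """Factorize 2D global pooling into two direction-aware 1D encodings.

    x: (C, H, W) feature map. Instead of collapsing it to a single (C,)
    vector via 2D global average pooling, return per-row averages (C, H)
    and per-column averages (C, W), preserving position along each axis.
    """
    pool_h = x.mean(axis=2)  # average over W: one value per (channel, row)
    pool_w = x.mean(axis=1)  # average over H: one value per (channel, column)
    return pool_h, pool_w

x = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy (C=2, H=3, W=4) map
pool_h, pool_w = coordinate_pools(x)
```

Keeping one encoding per row and one per column is what lets the attention weights retain positional information that a single globally pooled vector discards.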
arXiv Detail & Related papers (2021-03-04T09:18:02Z)
- Rotate to Attend: Convolutional Triplet Attention Module [21.228370317693244]
We present triplet attention, a novel method for computing attention weights using a three-branch structure.
Our method is simple as well as efficient and can be easily plugged into classic backbone networks as an add-on module.
We demonstrate the effectiveness of our method on various challenging tasks including image classification on ImageNet-1k and object detection on MSCOCO and PASCAL VOC datasets.
arXiv Detail & Related papers (2020-10-06T21:31:00Z)
- Neural encoding with visual attention [17.020869686284165]
We propose a novel approach to neural encoding by including a trainable soft-attention module.
We find that attention locations estimated by the model on independent data agree well with the corresponding eye fixation patterns.
arXiv Detail & Related papers (2020-10-01T16:04:21Z)
- Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from a noisy HSI to its clean counterpart.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods in both quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.