Where to Look: A Unified Attention Model for Visual Recognition with
Reinforcement Learning
- URL: http://arxiv.org/abs/2111.07169v1
- Date: Sat, 13 Nov 2021 18:44:50 GMT
- Title: Where to Look: A Unified Attention Model for Visual Recognition with
Reinforcement Learning
- Authors: Gang Chen
- Abstract summary: We propose to unify top-down and bottom-up attention for recurrent visual attention.
Our model exploits image pyramids and Q-learning to select regions of interest in the top-down attention mechanism.
We train our model in an end-to-end reinforcement learning framework, and evaluate our method on visual classification tasks.
- Score: 5.247711598719703
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The idea of using recurrent neural networks for visual attention has
gained popularity in the computer vision community. Although the recurrent
attention model (RAM) enlarges its scope by taking glimpses with larger patch
sizes, doing so may result in high variance and instability. For example, a
Gaussian policy with high variance is needed to explore objects of interest in
a large image, which can cause randomized search and unstable learning. In this
paper, we propose to unify top-down and bottom-up attention for recurrent
visual attention. Our model exploits image pyramids and Q-learning to select
regions of interest in the top-down attention mechanism, which in turn guide
the policy search in the bottom-up approach. In addition, we add two further
constraints on the bottom-up recurrent neural networks for better exploration.
We train our model in an end-to-end reinforcement learning framework and
evaluate it on visual classification tasks. Experimentally, our method
outperforms convolutional neural network (CNN) baselines and bottom-up
recurrent attention models on visual classification tasks.
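The abstract describes the mechanism but gives no code, so below is a minimal sketch of the idea as stated: a top-down Q-network scores candidate regions of a coarse pyramid level, and the selected region seeds a low-variance bottom-up glimpse policy. All module names, kernel sizes, and the grid layout are illustrative assumptions, not the authors' released implementation.

```python
# Sketch only: top-down Q-learning over image-pyramid regions seeding a
# bottom-up Gaussian glimpse policy, per the abstract. Hypothetical code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownQNet(nn.Module):
    """Scores a coarse pyramid level: one Q-value per candidate region."""
    def __init__(self, n_regions=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.q_head = nn.Linear(16 * 4 * 4, n_regions)

    def forward(self, coarse_image):
        return self.q_head(self.features(coarse_image))  # (B, n_regions)

def pyramid(image, n_levels=3):
    """Downsample by 2x per level to form an image pyramid."""
    return [F.avg_pool2d(image, 2 ** i) if i else image for i in range(n_levels)]

def region_center(idx, grid=4):
    """Map a region index on a grid x grid layout to a center in [-1, 1]."""
    r, c = idx // grid, idx % grid
    step = 2.0 / grid
    return torch.tensor([-1 + step * (c + 0.5), -1 + step * (r + 0.5)])

image = torch.rand(1, 3, 128, 128)
levels = pyramid(image)
q_net = TopDownQNet(n_regions=16)
q_values = q_net(levels[-1])            # top-down: score coarse regions
best = q_values.argmax(dim=1).item()    # greedy action (epsilon-greedy in training)
mu = region_center(best)                # seeds the bottom-up glimpse policy
loc = torch.normal(mu, 0.1)             # low-variance local exploration around mu
print(best, loc)
```

The point of the top-down step, as the abstract argues, is that selecting from a small discrete set of pyramid regions replaces high-variance Gaussian exploration over the whole image, so the bottom-up policy only needs local, low-variance search.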
Related papers
- Entity-Conditioned Question Generation for Robust Attention Distribution
in Neural Information Retrieval [51.53892300802014]
We show that supervised neural information retrieval models are prone to learning sparse attention patterns over passage tokens.
Using a novel targeted synthetic data generation method, we teach neural IR to attend more uniformly and robustly to all entities in a given passage.
arXiv Detail & Related papers (2022-04-24T22:36:48Z) - Visual Attention Network [90.0753726786985]
We propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention.
We also introduce a novel neural network based on LKA, namely the Visual Attention Network (VAN).
In extensive experiments, VAN outperforms state-of-the-art vision transformers and convolutional neural networks by a large margin (a sketch of the LKA idea appears after this list).
arXiv Detail & Related papers (2022-02-20T06:35:18Z) - Learning to ignore: rethinking attention in CNNs [87.01305532842878]
We propose to reformulate the attention mechanism in CNNs to learn to ignore instead of learning to attend.
Specifically, we propose to explicitly learn irrelevant information in the scene and suppress it in the produced representation (a sketch of this inverted gating appears after this list).
arXiv Detail & Related papers (2021-11-10T13:47:37Z) - An Attention Module for Convolutional Neural Networks [5.333582981327498]
We propose an attention module for convolutional neural networks based on an AW-convolution.
Experiments on several datasets for image classification and object detection tasks show the effectiveness of our proposed attention module.
arXiv Detail & Related papers (2021-08-18T15:36:18Z) - Deep Reinforcement Learning Models Predict Visual Responses in the
Brain: A Preliminary Result [1.0323063834827415]
We use reinforcement learning to train neural network models to play a 3D computer game.
We find that these reinforcement learning models achieve higher neural response prediction accuracy scores in the early visual areas.
In contrast, the supervised neural network models yield better neural response predictions in the higher visual areas.
arXiv Detail & Related papers (2021-06-18T13:10:06Z) - Variational Structured Attention Networks for Deep Visual Representation
Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and interaction of the attention maps within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z) - Unlocking Pixels for Reinforcement Learning via Implicit Attention [61.666538764049854]
We make use of new efficient attention algorithms, recently shown to be highly effective for Transformers.
This allows our attention-based controllers to scale to larger visual inputs, and facilitate the use of smaller patches.
In addition, we propose a new efficient algorithm approximating softmax attention with what we call hybrid random features (a sketch of random-feature attention appears after this list).
arXiv Detail & Related papers (2021-02-08T17:00:26Z) - Playing to distraction: towards a robust training of CNN classifiers
through visual explanation techniques [1.2321022105220707]
We present a novel and robust training scheme that integrates visual explanation techniques in the learning process.
In particular, we work on the challenging EgoFoodPlaces dataset, achieving state-of-the-art results with a lower level of complexity.
arXiv Detail & Related papers (2020-12-28T10:24:32Z) - Neural encoding with visual attention [17.020869686284165]
We propose a novel approach to neural encoding by including a trainable soft-attention module.
We find that attention locations estimated by the model on independent data agree well with the corresponding eye fixation patterns.
arXiv Detail & Related papers (2020-10-01T16:04:21Z) - Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural network.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)