TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs
- URL: http://arxiv.org/abs/2111.13470v1
- Date: Fri, 26 Nov 2021 12:35:17 GMT
- Title: TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs
- Authors: Shantanu Jaiswal, Basura Fernando, Cheston Tan
- Abstract summary: We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
- Score: 18.24779045808196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention modules for Convolutional Neural Networks (CNNs) are an effective
method to enhance performance of networks on multiple computer-vision tasks.
While many works focus on building more effective modules through appropriate
modelling of channel-, spatial- and self-attention, they primarily operate in a
feedforward manner. Consequently, the attention mechanism strongly depends on
the representational capacity of a single input feature activation, and can
benefit from incorporation of semantically richer higher-level activations that
can specify "what and where to look" through top-down information flow. Such
feedback connections are also prevalent in the primate visual cortex and
recognized by neuroscientists as a key component in primate visual attention.
Accordingly, in this work, we propose a lightweight top-down (TD) attention
module that iteratively generates a "visual searchlight" to perform top-down
channel and spatial modulation of its inputs and consequently outputs more
selective feature activations at each computation step. Our experiments
indicate that integrating TD in CNNs enhances their performance on ImageNet-1k
classification and outperforms prominent attention modules while being more
parameter and memory efficient. Further, our models are more robust to changes
in input resolution during inference and learn to "shift attention" by
localizing individual objects or features at each computation step without any
explicit supervision. This capability results in a 5% improvement for ResNet50 on
weakly-supervised object localization besides improvements in fine-grained and
multi-label classification.
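The abstract does not give implementation details, but the mechanism it describes can be made concrete. Below is a minimal, assumption-laden PyTorch sketch of iterative top-down channel and spatial modulation: a recurrent "searchlight" vector is updated from pooled higher-level context and used to gate the input per channel ("what") and per location ("where") at each computation step. All names (TDAttention, n_steps, the GRU-based update) are illustrative choices, not the authors' architecture.

```python
# Hedged sketch of iterative top-down channel + spatial modulation, loosely
# following the abstract's description; not the authors' implementation.
import torch
import torch.nn as nn


class TDAttention(nn.Module):  # hypothetical name
    def __init__(self, channels: int, n_steps: int = 3):
        super().__init__()
        self.n_steps = n_steps
        # Recurrent update of the "visual searchlight" from pooled features.
        self.searchlight = nn.GRUCell(channels, channels)
        # Per-channel gate derived from the searchlight ("what to look for").
        self.channel_gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        # Per-location gate over the modulated features ("where to look").
        self.spatial_gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        h = x.new_zeros(b, c)  # searchlight state
        out = x
        for _ in range(self.n_steps):
            pooled = out.mean(dim=(2, 3))    # global context from current output
            h = self.searchlight(pooled, h)  # top-down recurrent update
            ch = self.channel_gate(h).view(b, c, 1, 1)
            out = x * ch                     # channel modulation
            out = out * self.spatial_gate(out)  # spatial modulation
        return out


# Usage: modulate a ResNet stage's activations.
feats = torch.randn(2, 256, 14, 14)
print(TDAttention(256)(feats).shape)  # torch.Size([2, 256, 14, 14])
```

Running several modulation steps is what lets such a module "shift attention" across objects, since each step re-pools the already-gated features before updating the searchlight.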
Related papers
- ELA: Efficient Local Attention for Deep Convolutional Neural Networks [15.976475674061287]
This paper introduces an Efficient Local Attention (ELA) method that achieves substantial performance improvements with a simple structure.
To achieve this, the method incorporates 1D convolution and Group Normalization feature enhancement techniques.
ELA can be seamlessly integrated into deep CNNs such as ResNet, MobileNet, and DeepLab.
arXiv Detail & Related papers (2024-03-02T08:06:18Z)
- DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
We present the Deformable Attention Transformer (DAT++), an efficient and effective vision backbone for visual recognition.
DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
arXiv Detail & Related papers (2023-09-04T08:26:47Z)
- Influencer Detection with Dynamic Graph Neural Networks [56.1837101824783]
We investigate different dynamic Graph Neural Networks (GNNs) configurations for influencer detection.
We show that using deep multi-head attention in GNNs and encoding temporal attributes significantly improves performance.
arXiv Detail & Related papers (2022-11-15T13:00:25Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Activating More Pixels in Image Super-Resolution Transformer [53.87533738125943]
Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution.
We propose a novel Hybrid Attention Transformer (HAT) to activate more input pixels for better reconstruction.
Our overall method significantly outperforms the state-of-the-art methods by more than 1 dB.
arXiv Detail & Related papers (2022-05-09T17:36:58Z)
- An Attention Module for Convolutional Neural Networks [5.333582981327498]
We propose an attention module for convolutional neural networks by developing an AW-convolution.
Experiments on several datasets for image classification and object detection tasks show the effectiveness of our proposed attention module.
arXiv Detail & Related papers (2021-08-18T15:36:18Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images [9.398340832493457]
We propose a Linear Attention Mechanism (LAM) to address the quadratic cost of dot-product attention on large feature maps; a generic sketch of this idea follows after this list.
LAM is approximately equivalent to dot-product attention while being substantially more computationally efficient.
We design a Multi-stage Attention ResU-Net for semantic segmentation from fine-resolution remote sensing images.
arXiv Detail & Related papers (2020-11-29T07:24:21Z)
- Efficient Attention Network: Accelerate Attention by Searching Where to Plug [11.616720452770322]
We propose a framework called Efficient Attention Network (EAN) to improve the efficiency of existing attention modules.
In EAN, we leverage the sharing mechanism to share the attention module within the backbone and search where to connect the shared attention module via reinforcement learning.
Experiments on widely-used benchmarks and popular attention networks show the effectiveness of EAN.
arXiv Detail & Related papers (2020-11-28T03:31:08Z)
- Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
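As an aside on the LAM entry above: linear attention mechanisms in general avoid the quadratic cost of softmax dot-product attention by applying a positive feature map to queries and keys and reassociating the matrix product. The sketch below shows this generic construction; it is not the LAM paper's exact formulation, and the feature map (ELU + 1), function name, and shapes are illustrative assumptions.

```python
# Hedged sketch of generic linear (kernelized) attention; cost is O(n) in
# the number of positions n, versus O(n^2) for softmax dot-product attention.
import torch
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, n_positions, dim) tensors."""
    phi = lambda t: F.elu(t) + 1.0  # positive feature map (illustrative choice)
    q, k = phi(q), phi(k)
    # Reassociate: (q @ k^T) @ v  ->  q @ (k^T @ v), never forming the n x n map.
    kv = torch.einsum("bnd,bne->bde", k, v)              # (batch, dim, dim)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)


# Usage: attention over 4096 positions (e.g. a 64x64 feature map flattened).
q, k, v = (torch.randn(2, 4096, 64) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([2, 4096, 64])
```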