Variational Structured Attention Networks for Deep Visual Representation
Learning
- URL: http://arxiv.org/abs/2103.03510v1
- Date: Fri, 5 Mar 2021 07:37:24 GMT
- Title: Variational Structured Attention Networks for Deep Visual Representation
Learning
- Authors: Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding,
Elisa Ricci
- Abstract summary: We propose a unified deep framework to jointly learn both spatial attention maps and channel attention vectors in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
- Score: 49.80498066480928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks have enabled major progress in addressing
pixel-level prediction tasks such as semantic segmentation, depth estimation,
surface normal prediction, and so on, benefiting from their powerful
capabilities in visual representation learning. Typically, state-of-the-art
models integrate attention mechanisms for improved deep feature
representations. Recently, some works have demonstrated the significance of
learning and combining both spatial- and channel-wise attentions for deep
feature refinement. In this paper, we aim at effectively boosting previous
approaches and propose a unified deep framework to jointly learn both spatial
attention maps and channel attention vectors in a principled manner so as to
structure the resulting attention tensors and model interactions between these
two types of attentions. Specifically, we integrate the estimation and the
interaction of the attentions within a probabilistic representation learning
framework, leading to Variational STructured Attention networks (VISTA-Net). We
implement the inference rules within the neural network, thus allowing for
end-to-end learning of the probabilistic and the CNN front-end parameters. As
demonstrated by our extensive empirical evaluation on six large-scale datasets
for dense visual prediction, VISTA-Net outperforms the state-of-the-art in
multiple continuous and discrete prediction tasks, thus confirming the benefit
of the proposed approach in joint structured spatial-channel attention
estimation for deep representation learning. The code is available at
https://github.com/ygjwd12345/VISTA-Net.
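The core idea of the abstract, jointly structuring a spatial attention map and a channel attention vector into one attention tensor, can be illustrated with a minimal NumPy sketch. This is not VISTA-Net's variational inference procedure: the pooling choices, the rank-1 outer-product factorization, and the function names are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_attention(feat):
    """feat: (C, H, W) feature map.
    Builds a channel attention vector a (over C) and a spatial
    attention map m (over H x W), then structures them into a
    rank-1 attention tensor A[c, h, w] = a[c] * m[h, w] used to
    reweight the features. Returns (reweighted, a, m)."""
    C, H, W = feat.shape
    # Channel attention: softmax over per-channel average activations.
    a = softmax(feat.mean(axis=(1, 2)))                       # (C,)
    # Spatial attention: softmax over per-position average activations.
    m = softmax(feat.mean(axis=0).reshape(-1)).reshape(H, W)  # (H, W)
    # Outer product couples the two attentions into one tensor.
    A = a[:, None, None] * m[None, :, :]                      # (C, H, W)
    return feat * A, a, m

rng = np.random.default_rng(0)
out, a, m = structured_attention(rng.normal(size=(4, 8, 8)))
```

The outer product is the simplest way to model the spatial-channel interaction the abstract refers to; the paper's probabilistic framework learns this coupling rather than fixing it.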
Related papers
- Influencer Detection with Dynamic Graph Neural Networks [56.1837101824783]
We investigate different dynamic Graph Neural Networks (GNNs) configurations for influencer detection.
We show that using deep multi-head attention in GNN and encoding temporal attributes significantly improves performance.
arXiv Detail & Related papers (2022-11-15T13:00:25Z) - SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained
Image Categorization [24.286426387100423]
We propose a method that captures subtle changes by aggregating context-aware features from the most relevant image regions.
Our approach is inspired by recent advances in self-attention and graph neural networks (GNNs).
It outperforms the state-of-the-art approaches by a significant margin in recognition accuracy.
arXiv Detail & Related papers (2022-09-05T19:43:15Z) - Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
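The idea of encouraging key and query distributions to match within a head can be sketched as an auxiliary penalty. This is a hedged toy version, not the paper's actual formulation: treating the softmax-normalized mean query and mean key as two distributions and penalizing their symmetric KL divergence is an assumption made here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def alignment_penalty(Q, K, eps=1e-8):
    """Q, K: (n_tokens, d) query/key matrices for one attention head.
    Normalizes the mean query and mean key into distributions over
    the d feature dimensions and returns their symmetric KL
    divergence, usable as an auxiliary alignment regularizer."""
    p = softmax(Q.mean(axis=0))
    q = softmax(K.mean(axis=0))
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return 0.5 * (kl_pq + kl_qp)

rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 32))
K = rng.normal(size=(16, 32))
loss = alignment_penalty(Q, K)  # non-negative; 0 when distributions match
```

Such a penalty can be added to any self-attention model's training loss without changing its architecture, which matches the entry's claim that pre-trained models can be converted.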
arXiv Detail & Related papers (2021-10-25T00:54:57Z) - X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task
Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
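Cross-task knowledge distillation of this kind typically adds a loss that pulls the student's features toward the frozen teacher's. A minimal sketch, assuming a hypothetical learned linear projection `W` mapping depth-student features into the semantic teacher's feature space (the actual X-Distill transfer mechanism may differ):

```python
import numpy as np

def distill_loss(student_feat, teacher_feat, W):
    """student_feat: (n, d_s) depth-student features.
    teacher_feat: (n, d_t) frozen semantic-teacher features.
    W: (d_s, d_t) hypothetical projection into the teacher space.
    Returns the mean squared distance after projection."""
    proj = student_feat @ W
    return np.mean((proj - teacher_feat) ** 2)

rng = np.random.default_rng(0)
s = rng.normal(size=(10, 16))   # student features
W = rng.normal(size=(16, 8))    # illustrative projection
t = s @ W                       # teacher features the student matches exactly
```

During training this term is added to the self-supervised depth loss, so the teacher is only needed at training time, not at inference.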
arXiv Detail & Related papers (2021-10-24T19:47:14Z) - Semantic Reinforced Attention Learning for Visual Place Recognition [15.84086970453363]
Large-scale visual place recognition (VPR) is inherently challenging because not all visual cues in the image are beneficial to the task.
We propose a novel Semantic Reinforced Attention Learning Network (SRALNet), in which the inferred attention can benefit from both semantic priors and data-driven fine-tuning.
Experiments demonstrate that our method outperforms state-of-the-art techniques on city-scale VPR benchmark datasets.
arXiv Detail & Related papers (2021-08-19T02:14:36Z) - Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z) - Probabilistic Graph Attention Network with Conditional Kernels for
Pixel-Wise Prediction [158.88345945211185]
We present a novel approach that advances the state of the art on pixel-level prediction in a fundamental aspect, i.e. structured multi-scale features learning and fusion.
We propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.
arXiv Detail & Related papers (2021-01-08T04:14:29Z) - Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.