Alignment Attention by Matching Key and Query Distributions
- URL: http://arxiv.org/abs/2110.12567v1
- Date: Mon, 25 Oct 2021 00:54:57 GMT
- Title: Alignment Attention by Matching Key and Query Distributions
- Authors: Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou
- Abstract summary: This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
- Score: 48.93793773929006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The neural attention mechanism has been incorporated into deep neural
networks to achieve state-of-the-art performance in various domains. Most such
models use multi-head self-attention which is appealing for the ability to
attend to information from different perspectives. This paper introduces
alignment attention that explicitly encourages self-attention to match the
distributions of the key and query within each head. The resulting alignment
attention networks can be optimized as an unsupervised regularization in the
existing attention framework. It is simple to convert any models with
self-attention, including pre-trained ones, to the proposed alignment
attention. On a variety of language understanding tasks, we show the
effectiveness of our method in accuracy, uncertainty estimation, generalization
across domains, and robustness to adversarial attacks. We further demonstrate
the general applicability of our approach on graph attention and visual
question answering, showing the great potential of incorporating our alignment
method into various attention-related tasks.
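To make the core idea concrete, the following is a minimal NumPy sketch of a per-head penalty that measures the mismatch between the key and query distributions. Note this is an illustration only: an RBF-kernel MMD is used here as a stand-in distribution-matching objective, and may differ from the alignment objective the paper actually optimizes; all function and variable names are illustrative.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Squared maximum mean discrepancy between sample sets
    x (n, d) and y (m, d), using an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def alignment_penalty(Q, K, sigma=1.0):
    """Sum over heads of the MMD between query and key samples.
    Q, K: (num_heads, seq_len, head_dim). Adding this term to the
    training loss encourages each head's keys and queries to follow
    matching distributions."""
    return sum(rbf_mmd2(Q[h], K[h], sigma) for h in range(Q.shape[0]))

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8, 4))
K = rng.normal(size=(2, 8, 4)) + 2.0  # keys shifted away from queries
print(alignment_penalty(Q, K))  # larger when the distributions differ
```

In practice such a penalty would be computed from the projected keys and queries inside each attention layer and added to the task loss as an unsupervised regularizer, which is consistent with the abstract's description of optimizing alignment within the existing attention framework.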
Related papers
- Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms [0.5994412766684842]
We identify misalignments between the attention and the signal amplitude in the existing multi-head self-attention.
We propose to use a Focus-Attention (FA) mechanism and a Calibration-Attention (CA) mechanism in combination with the multi-head self-attention.
By employing the CA mechanism, the network can modulate the information flow by assigning different weights to each attention head and improve the utilization of surrounding contexts.
arXiv Detail & Related papers (2022-08-21T08:04:22Z)
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference [68.12511526813991]
We provide a novel understanding of multi-head attention from a Bayesian perspective.
We propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention.
Experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity.
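The summary above describes encouraging diversity among heads by making them repel one another. As a hedged sketch of one plausible form of such a term (an SVGD-style kernel similarity between flattened per-head parameters; the paper's actual non-parametric formulation may differ, and all names here are illustrative):

```python
import numpy as np

def repulsive_penalty(heads, sigma=1.0):
    """Sum of pairwise RBF-kernel similarities between attention heads.
    Minimizing this term pushes the heads' parameters apart,
    encouraging more diverse learned features.
    heads: (num_heads, param_dim) flattened per-head parameters."""
    H = heads.shape[0]
    total = 0.0
    for i in range(H):
        for j in range(i + 1, H):
            d2 = ((heads[i] - heads[j]) ** 2).sum()
            total += np.exp(-d2 / (2 * sigma ** 2))
    return total

rng = np.random.default_rng(1)
close = rng.normal(size=(4, 16)) * 0.01   # nearly identical heads
spread = rng.normal(size=(4, 16)) * 3.0   # well-separated heads
print(repulsive_penalty(close), repulsive_penalty(spread))
```

Near-identical heads incur a high penalty while well-separated heads incur almost none, so adding this term to the training loss rewards head diversity.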
arXiv Detail & Related papers (2020-09-20T06:32:23Z)
- Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.