Bayesian Attention Belief Networks
- URL: http://arxiv.org/abs/2106.05251v1
- Date: Wed, 9 Jun 2021 17:46:22 GMT
- Title: Bayesian Attention Belief Networks
- Authors: Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou
- Abstract summary: Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions.
We show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
- Score: 59.183311769616466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention-based neural networks have achieved state-of-the-art results on a
wide range of tasks. Most such models use deterministic attention while
stochastic attention is less explored due to the optimization difficulties or
complicated model design. This paper introduces Bayesian attention belief
networks, which construct a decoder network by modeling unnormalized attention
weights with a hierarchy of gamma distributions, and an encoder network by
stacking Weibull distributions with a deterministic-upward-stochastic-downward
structure to approximate the posterior. The resulting auto-encoding networks
can be optimized in a differentiable way with a variational lower bound. It is
simple to convert any models with deterministic attention, including pretrained
ones, to the proposed Bayesian attention belief networks. On a variety of
language understanding tasks, we show that our method outperforms deterministic
attention and state-of-the-art stochastic attention in accuracy, uncertainty
estimation, generalization across domains, and robustness to adversarial
attacks. We further demonstrate the general applicability of our method on
neural machine translation and visual question answering, showing great
potential of incorporating our method into various attention-related tasks.
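As a concrete illustration of the mechanism the abstract describes, here is a minimal single-level sketch in PyTorch: a reparameterizable Weibull variational posterior over unnormalized attention weights, a gamma prior, sampled weights normalized into attention, and a Monte Carlo estimate of the KL term in the variational lower bound. The class name, the fixed Weibull concentration, and the collapse of the paper's gamma hierarchy to a single level are simplifying assumptions, not the authors' implementation.

```python
import torch
from torch.distributions import Gamma, Weibull

class BayesianAttentionSketch(torch.nn.Module):
    """Minimal sketch (not the authors' code): Weibull variational posterior
    over unnormalized attention weights, a single gamma prior (the paper uses
    a hierarchy of gammas), and a Monte Carlo estimate of the KL term."""

    def __init__(self, concentration=10.0, prior_alpha=1.0, prior_beta=1.0):
        super().__init__()
        self.k = torch.tensor(concentration)  # Weibull shape: larger = less noisy
        self.prior = Gamma(torch.tensor(prior_alpha), torch.tensor(prior_beta))

    def forward(self, query, key, value):
        d = query.size(-1)
        logits = query @ key.transpose(-2, -1) / d ** 0.5
        # Deterministic attention would be softmax(logits). Instead, treat
        # exp(logits) as the mean of a Weibull posterior over unnormalized weights.
        mean = logits.exp().clamp(1e-6, 1e6)
        scale = mean / torch.exp(torch.lgamma(1 + 1 / self.k))  # E[W] = scale * Gamma(1 + 1/k)
        q_dist = Weibull(scale, self.k)
        w = q_dist.rsample().clamp_min(1e-6)   # reparameterized, so gradients flow
        attn = w / w.sum(-1, keepdim=True)     # normalize samples into attention weights
        kl = (q_dist.log_prob(w) - self.prior.log_prob(w)).mean()  # MC KL estimate
        return attn @ value, kl
```

In training, the task loss would be combined with the returned KL term, e.g. loss = task_loss + kl_weight * kl, with the weight annealed as is usual for variational objectives. Because rsample is used, gradients flow through the stochastic attention weights, which is what makes the networks optimizable in a differentiable way.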
Related papers
- GASE: Graph Attention Sampling with Edges Fusion for Solving Vehicle Routing Problems [6.084414764415137]
We propose an adaptive Graph Attention Sampling with Edges Fusion (GASE) framework to solve vehicle routing problems.
Our proposed model outperforms existing methods by 2.08%-6.23% and shows stronger generalization ability.
arXiv Detail & Related papers (2024-05-21T03:33:07Z)
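The GASE summary above gives only the high-level idea; as a hedged sketch of graph attention with edge fusion (not the GASE architecture, whose sampling and fusion details are not in this summary), one attention layer whose logits incorporate edge features such as inter-node distances might look like this, with all names illustrative:

```python
import torch
import torch.nn.functional as F

class EdgeFusedAttentionLayer(torch.nn.Module):
    """Illustrative only: attention logits fused with edge features
    (e.g., travel distances in a vehicle-routing graph). Not GASE itself."""
    def __init__(self, node_dim, edge_dim, hidden):
        super().__init__()
        self.q = torch.nn.Linear(node_dim, hidden)
        self.k = torch.nn.Linear(node_dim, hidden)
        self.v = torch.nn.Linear(node_dim, hidden)
        self.e = torch.nn.Linear(edge_dim, 1)  # edge features enter the logits

    def forward(self, x, edge_feats, adj):
        # x: (n, node_dim); edge_feats: (n, n, edge_dim); adj: (n, n) 0/1 mask,
        # assumed to give every node at least one neighbor.
        logits = self.q(x) @ self.k(x).T / self.q.out_features ** 0.5
        logits = logits + self.e(edge_feats).squeeze(-1)      # fuse edge information
        logits = logits.masked_fill(adj == 0, float("-inf"))  # attend only along edges
        return F.softmax(logits, dim=-1) @ self.v(x)
```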
- Invariant Causal Mechanisms through Distribution Matching [86.07327840293894]
In this work we provide a causal perspective and a new algorithm for learning invariant representations.
Empirically we show that this algorithm works well on a diverse set of tasks and in particular we observe state-of-the-art performance on domain generalization.
arXiv Detail & Related papers (2022-06-23T12:06:54Z)
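The entry above does not state the algorithm, so as a generic stand-in for learning invariant representations by distribution matching, here is a CORAL-style penalty that aligns second-order feature statistics across two domains; it illustrates the objective family, not the paper's method:

```python
import torch

def coral_penalty(h_src: torch.Tensor, h_tgt: torch.Tensor) -> torch.Tensor:
    """CORAL-style stand-in: penalize the gap between feature covariances
    computed in two different domains (illustrative, not the paper's algorithm)."""
    def cov(h):
        h = h - h.mean(0, keepdim=True)
        return h.T @ h / (h.size(0) - 1)
    d = h_src.size(1)
    return ((cov(h_src) - cov(h_tgt)) ** 2).sum() / (4 * d ** 2)
```

Added to the task loss, such a term pushes the encoder toward representations whose distribution is the same across training domains.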
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
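For the alignment-attention entry above, the summary says the key and query distributions are matched within each head but does not name the divergence; one simple illustrative choice is a kernel MMD penalty between a head's key and query vectors (an assumption, not necessarily the paper's objective):

```python
import torch

def key_query_mmd(keys: torch.Tensor, queries: torch.Tensor, bandwidth: float = 1.0):
    """Squared RBF-kernel MMD between the empirical distributions of one
    head's keys and queries; a small value means aligned distributions."""
    def rbf(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bandwidth ** 2))
    return (rbf(keys, keys).mean() + rbf(queries, queries).mean()
            - 2 * rbf(keys, queries).mean())
```

Summed over heads and added to the task loss, this explicitly encourages self-attention to match the two distributions, which is the property the entry highlights.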
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
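The entry above describes jointly learned spatial and channel attention; its probabilistic inference rules are not reproduced in the summary, so the following is a purely structural sketch of a module that gates CNN features with both a spatial map and channel weights (deterministic, with illustrative names):

```python
import torch

class JointSpatialChannelAttention(torch.nn.Module):
    """Structural sketch only: a spatial attention map and channel attention
    weights jointly gate a feature tensor. The paper's variational inference
    machinery is omitted."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = torch.nn.Conv2d(channels, 1, kernel_size=1)
        self.channel = torch.nn.Linear(channels, channels)

    def forward(self, x):                      # x: (B, C, H, W)
        s = torch.sigmoid(self.spatial(x))     # (B, 1, H, W) spatial map
        c = torch.sigmoid(self.channel(x.mean(dim=(2, 3))))  # (B, C) channel weights
        return x * s * c[:, :, None, None]     # jointly gated features
```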
- Bayesian Attention Modules [65.52970388117923]
We propose a scalable version of attention that is easy to implement and optimize.
Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
arXiv Detail & Related papers (2020-10-20T20:30:55Z)
- Gaussian Constrained Attention Network for Scene Text Recognition [16.485898019983797]
We argue that the existing attention mechanism faces the problem of attention diffusion, in which the model may not focus on a certain character area.
We propose a 2D attention-based method integrated with a novel Gaussian Constrained Refinement Module.
In this way, the attention weights become more concentrated and the attention-based recognition network achieves better performance.
arXiv Detail & Related papers (2020-10-19T01:55:30Z)
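To make the concentration idea in the entry above concrete, here is a hedged sketch that multiplies a 2D attention map by a Gaussian centered at the map's expected location and renormalizes; the paper's refinement module predicts the Gaussian parameters, whereas sigma is fixed here:

```python
import torch

def gaussian_concentrate(attn: torch.Tensor, sigma: float = 2.0) -> torch.Tensor:
    """Sketch: refocus a (H, W) attention map (assumed to sum to 1) with a
    Gaussian mask at its expected location, countering attention diffusion."""
    H, W = attn.shape
    ys = torch.arange(H, dtype=attn.dtype)
    xs = torch.arange(W, dtype=attn.dtype)
    cy = (attn.sum(dim=1) * ys).sum()  # expected row index under the map
    cx = (attn.sum(dim=0) * xs).sum()  # expected column index
    mask = torch.exp(-((ys[:, None] - cy) ** 2 + (xs[None, :] - cx) ** 2)
                     / (2 * sigma ** 2))
    refined = attn * mask
    return refined / refined.sum()     # renormalize to a distribution
```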
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
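The deep-ensemble claim in the entry above corresponds to a simple procedure: average the predictive distributions of independently trained networks, treating each as a sample from a different basin (mode) of the weight posterior. A minimal sketch:

```python
import torch

def ensemble_predict(models, x):
    """Approximate Bayesian marginalization with a deep ensemble: each
    independently trained model stands in for one posterior mode, and their
    softmax outputs are averaged into an approximate posterior predictive."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)
```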
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.