Discrete Variational Attention Models for Language Generation
- URL: http://arxiv.org/abs/2004.09764v4
- Date: Wed, 16 Jun 2021 06:35:02 GMT
- Title: Discrete Variational Attention Models for Language Generation
- Authors: Xianghong Fang and Haoli Bai and Zenglin Xu and Michael Lyu and Irwin
King
- Abstract summary: We propose a discrete variational attention model with a categorical distribution over the attention mechanism, owing to the discrete nature of language.
Thanks to the property of discreteness, the training of our proposed approach does not suffer from posterior collapse.
- Score: 51.88612022940496
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Variational autoencoders have been widely applied for natural language
generation; however, there are two long-standing problems: information
under-representation and posterior collapse. The former arises from the fact
that only the last hidden state from the encoder is transformed to the latent
space, which is insufficient to summarize data. The latter comes as a result of
the imbalanced scale between the reconstruction loss and the KL divergence in
the objective function. To tackle these issues, in this paper we propose the
discrete variational attention model with a categorical distribution over the
attention mechanism, owing to the discrete nature of language. Our approach is
combined with an auto-regressive prior to capture the sequential dependency
from observations, which can enhance the latent space for language generation.
Moreover, thanks to the property of discreteness, the training of our proposed
approach does not suffer from posterior collapse. Furthermore, we carefully
analyze the superiority of discrete latent space over the continuous space with
the common Gaussian distribution. Extensive experiments on language generation
demonstrate the advantages of our proposed approach over state-of-the-art
counterparts.
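As a rough illustration of the categorical-attention idea described above, the sketch below implements a discrete attention latent with a Gumbel-softmax relaxation in PyTorch. The layer name, dimensions, relaxation, and the uniform prior used for the KL term are illustrative assumptions (the paper pairs its categorical latents with an auto-regressive prior and may train them differently), so this is a sketch rather than the authors' implementation.

```python
# Minimal sketch (not the paper's code): a categorical attention latent with a
# Gumbel-softmax relaxation. The uniform prior below is a placeholder for the
# auto-regressive prior described in the abstract.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVariationalAttention(nn.Module):
    def __init__(self, hidden_dim=256, num_codes=64, code_dim=256, tau=1.0):
        super().__init__()
        self.tau = tau                                      # relaxation temperature
        self.to_logits = nn.Linear(hidden_dim, num_codes)   # posterior logits for q(z_t | x)
        self.codebook = nn.Embedding(num_codes, code_dim)   # embeddings the decoder attends to

    def forward(self, enc_states):
        # enc_states: (batch, seq_len, hidden_dim) from any sequence encoder
        logits = self.to_logits(enc_states)                    # (B, T, K)
        z = F.gumbel_softmax(logits, tau=self.tau, hard=True)  # one-hot codes, straight-through
        attended = z @ self.codebook.weight                    # (B, T, code_dim) latent summary

        # KL(q(z_t | x) || Uniform(K)) averaged over batch and positions;
        # a stand-in for the KL against the paper's auto-regressive prior.
        K = logits.size(-1)
        log_q = F.log_softmax(logits, dim=-1)
        kl = (log_q.exp() * (log_q + math.log(K))).sum(-1).mean()
        return attended, kl

# Usage with random features standing in for encoder states.
layer = DiscreteVariationalAttention()
attended, kl = layer(torch.randn(2, 10, 256))
print(attended.shape, kl.item())
```

With hard=True the forward pass uses one-hot codes while gradients flow through the soft relaxation, which is one common way to train categorical latents end to end.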
Related papers
- Variational excess risk bound for general state space models [0.0]
We consider variational autoencoders (VAE) for general state space models.
We consider a backward factorization of the variational distributions to analyze the excess risk associated with VAE.
arXiv Detail & Related papers (2023-12-15T08:41:07Z)
- Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks [10.880057430629126]
Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation.
In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features.
We introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language Autoencoder (AE) in order to deliver latent spaces with better separability properties.
arXiv Detail & Related papers (2023-05-02T18:27:13Z)
- Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation [0.0]
Non-autoregressive neural machine translation (NAT) offers a substantial translation speed-up compared to autoregressive neural machine translation (AT).
Latent variable modeling has emerged as a promising approach to bridge this quality gap.
arXiv Detail & Related papers (2023-05-02T15:33:09Z)
- Improving Variational Autoencoders with Density Gap-based Regularization [16.770753948524167]
Variational autoencoders (VAEs) are a powerful unsupervised learning framework in NLP for latent representation learning and latent-directed generation.
In practice, optimizing the ELBO often leads the posterior distributions of all samples to converge to the same degenerate local optimum, known as posterior collapse or KL vanishing.
We introduce new training objectives to tackle both problems through a novel regularization based on the probabilistic density gap between the aggregated posterior distribution and the prior distribution.
arXiv Detail & Related papers (2022-11-01T08:17:10Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling [53.38382932162732]
Variational autoencoders (VAEs) have been widely applied for text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- Generative Text Modeling through Short Run Inference [47.73892773331617]
The present work proposes short-run dynamics for inference: it initializes the latent variable from its prior distribution and then runs a small number of Langevin dynamics steps guided by its posterior distribution (a generic form of one such update step is sketched after this list).
We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse.
arXiv Detail & Related papers (2021-05-27T09:14:35Z)
- APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.
An Adversarial Poincare Variational Autoencoder (APo-VAE) is presented, where both the prior and variational posterior of latent variables are defined over a Poincare ball via wrapped normal distributions.
Experiments on language modeling and dialog-response generation tasks demonstrate the effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
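For the short-run-inference entry above, one Langevin update step for the latent variable z typically takes the following standard form (a generic formulation, not necessarily the exact one used in that paper):

```latex
% Short-run Langevin inference: start from z_0 ~ p(z) and iterate a small, fixed
% number of steps with step size s; \epsilon_k is fresh Gaussian noise each step.
z_{k+1} = z_k + \frac{s^2}{2}\,\nabla_{z}\log p_\theta(x, z_k) + s\,\epsilon_k,
\qquad \epsilon_k \sim \mathcal{N}(0, I).
```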
This list is automatically generated from the titles and abstracts of the papers on this site.