Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in
Transformer-Based Variational AutoEncoder for Diverse Text Generation
- URL: http://arxiv.org/abs/2210.12409v2
- Date: Wed, 26 Oct 2022 12:51:57 GMT
- Title: Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in
Transformer-Based Variational AutoEncoder for Diverse Text Generation
- Authors: Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie
- Abstract summary: Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
- Score: 85.5379146125199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Variational Auto-Encoder (VAE) has been widely adopted in text
generation. Among its many variants, the recurrent VAE learns token-wise latent
variables, each conditioned on the preceding ones, which captured sequential
variability well in the RNN era. However, it is unclear how to incorporate such
recurrent dynamics into the recently dominant Transformer, given its parallelism. In this
work, we propose TRACE, a Transformer-based recurrent VAE structure. TRACE
imposes recurrence on segment-wise latent variables with arbitrarily separated
text segments and constructs the posterior distribution with residual
parameterization. In addition, we design an acceleration method that approximates
idempotent matrices, which allows parallelism while maintaining the conditional
dependence among latent variables. We demonstrate that TRACE can enhance the
entanglement between each segment and the preceding latent variables, and we
derive a non-zero lower bound on the KL term, providing a theoretical guarantee
of generation diversity. Experiments on two unconditional generation tasks and
one conditional generation task show that TRACE achieves significantly improved
diversity while maintaining satisfactory generation quality.
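To make the core idea concrete, here is a minimal PyTorch-style sketch of segment-wise recurrent latent variables with a residual parameterization of the posterior, written from the abstract alone; all module, function, and variable names are illustrative assumptions rather than the authors' implementation, and the sequential loop below ignores the paper's acceleration method.

```python
# Minimal, illustrative sketch (not the authors' code): segment-wise recurrent
# latent variables for a Transformer-based VAE, with a residual
# parameterization of the posterior q(z_t | z_{t-1}, segment_t).
import torch
import torch.nn as nn


class RecurrentSegmentLatent(nn.Module):
    def __init__(self, hidden_dim: int, latent_dim: int):
        super().__init__()
        # Conditional prior p(z_t | z_{t-1}), a diagonal Gaussian.
        self.prior_net = nn.Linear(latent_dim, 2 * latent_dim)
        # Residual posterior: a data-dependent shift of the prior parameters.
        self.post_net = nn.Linear(hidden_dim + latent_dim, 2 * latent_dim)
        self.latent_dim = latent_dim

    def forward(self, segment_reprs: torch.Tensor):
        """segment_reprs: (batch, num_segments, hidden_dim) pooled segment encodings."""
        batch, num_segments, _ = segment_reprs.shape
        z_prev = segment_reprs.new_zeros(batch, self.latent_dim)
        zs, kls = [], []
        for t in range(num_segments):  # sequential here; the paper parallelizes this step
            prior_mu, prior_logvar = self.prior_net(z_prev).chunk(2, dim=-1)
            delta_mu, delta_logvar = self.post_net(
                torch.cat([segment_reprs[:, t], z_prev], dim=-1)
            ).chunk(2, dim=-1)
            post_mu = prior_mu + delta_mu              # residual parameterization
            post_logvar = prior_logvar + delta_logvar
            # Reparameterization trick: sample z_t ~ q(z_t | z_{t-1}, segment_t).
            z_t = post_mu + torch.randn_like(post_mu) * (0.5 * post_logvar).exp()
            # KL(q || p) between two diagonal Gaussians, summed over dimensions.
            kl = 0.5 * (
                (post_logvar - prior_logvar).exp()
                + (post_mu - prior_mu).pow(2) / prior_logvar.exp()
                - 1.0
                + prior_logvar
                - post_logvar
            ).sum(-1)
            zs.append(z_t)
            kls.append(kl)
            z_prev = z_t
        # (batch, num_segments, latent_dim) latents and (batch, num_segments) KL terms.
        return torch.stack(zs, dim=1), torch.stack(kls, dim=1)
```

The residual form keeps each posterior anchored to the conditional prior computed from the previous latent while still letting the segment encoding shift it, which is one common way to keep the KL term informative instead of collapsing toward a fixed standard-normal prior.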
Related papers
- Protect Before Generate: Error Correcting Codes within Discrete Deep Generative Models [3.053842954605396]
We introduce a novel method that enhances variational inference in discrete latent variable models.
We leverage Error Correcting Codes (ECCs) to introduce redundancy in the latent representations.
This redundancy is then exploited by the variational posterior to yield more accurate estimates.
arXiv Detail & Related papers (2024-10-10T11:59:58Z)
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables [20.66743177460193]
We combine the merits of both continuous and discrete latent variables and propose a Hybrid Latent Variable (HLV) method.
HLV constrains the global semantics of responses through discrete latent variables and enriches responses with continuous latent variables.
In addition, we propose the Conditional Hybrid Variational Transformer (CHVT) to construct and utilize HLV with Transformers for dialogue generation.
arXiv Detail & Related papers (2022-12-02T12:48:01Z)
- Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation [85.5379146125199]
We propose a novel variational Transformer framework to overcome the KL vanishing problem.
We show that our method can be regarded as entangling latent variables to avoid posterior information decrease through layers.
arXiv Detail & Related papers (2022-07-13T11:27:46Z)
- Interpretable Latent Variables in Deep State Space Models [4.884336328409872]
We introduce a new version of deep state-space models (DSSMs) that combines a recurrent neural network with a state-space framework to forecast time series data.
The model estimates the observed series as functions of latent variables that evolve non-linearly through time.
arXiv Detail & Related papers (2022-03-03T23:10:58Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling discourse-level diversity with a global latent variable (see the sketch after this list); and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
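For the Variational Transformer entry above, a rough sketch of its first variant (a single discourse-level latent variable conditioning a Transformer decoder) may help; the class and argument names here are assumptions for illustration, not the original implementation.

```python
# Illustrative sketch only (names are assumptions, not the paper's code):
# a single global, discourse-level latent variable conditioning a Transformer
# decoder, corresponding to the first VT variant described above.
import torch
import torch.nn as nn


class GlobalLatentDecoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512, latent_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.latent_proj = nn.Linear(latent_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tgt_tokens: torch.Tensor, memory: torch.Tensor, z: torch.Tensor):
        """tgt_tokens: (B, T) token ids; memory: (B, S, d_model); z: (B, latent_dim)."""
        # Inject the global latent by adding its projection to every target position.
        x = self.embed(tgt_tokens) + self.latent_proj(z).unsqueeze(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(
            tgt_tokens.size(1)
        ).to(tgt_tokens.device)
        h = self.decoder(x, memory, tgt_mask=causal_mask)
        return self.out(h)  # next-token logits conditioned on the global latent
```

In a CVAE-style setup, z would be sampled from an encoder-side posterior with the reparameterization trick during training and drawn from the prior at generation time, which is where the response diversity comes from.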
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.