Variational Transformers for Diverse Response Generation
- URL: http://arxiv.org/abs/2003.12738v1
- Date: Sat, 28 Mar 2020 07:48:02 GMT
- Title: Variational Transformers for Diverse Response Generation
- Authors: Zhaojiang Lin, Genta Indra Winata, Peng Xu, Zihan Liu, Pascale Fung
- Abstract summary: The Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
- Score: 71.53159402053392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the great promise of Transformers in many sequence modeling tasks
(e.g., machine translation), their deterministic nature hinders them from
generalizing to high entropy tasks such as dialogue response generation.
Previous work proposes to capture the variability of dialogue responses with a
recurrent neural network (RNN)-based conditional variational autoencoder
(CVAE). However, the autoregressive computation of the RNN limits the training
efficiency. Therefore, we propose the Variational Transformer (VT), a
variational self-attentive feed-forward sequence model. The VT combines the
parallelizability and global receptive field of the Transformer with the
variational nature of the CVAE by incorporating stochastic latent variables
into Transformers. We explore two types of the VT: 1) modeling the
discourse-level diversity with a global latent variable; and 2) augmenting the
Transformer decoder with a sequence of fine-grained latent variables. Then, the
proposed models are evaluated on three conversational datasets with both
automatic metrics and human evaluation. The experimental results show that our
models improve standard Transformers and other baselines in terms of diversity,
semantic relevance, and human judgment.
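As a rough illustration of the first variant, the sketch below (PyTorch-style, not the authors' released code) conditions a standard Transformer decoder on a single global Gaussian latent variable sampled with the reparameterization trick, in the manner of a CVAE. All module and parameter names (GlobalLatentVT, latent_dim, the mean-pooled recognition network, and so on) are illustrative assumptions, and training details such as teacher-forcing shifts and KL annealing are omitted.

```python
# Minimal sketch of a Transformer decoder conditioned on one global latent
# variable (CVAE-style). Names and sizes are illustrative, not the paper's code.
import torch
import torch.nn as nn


class GlobalLatentVT(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4, latent_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        # Prior network sees the context only; the recognition (posterior)
        # network additionally sees the gold response, as in a CVAE.
        self.prior = nn.Linear(d_model, 2 * latent_dim)
        self.posterior = nn.Linear(2 * d_model, 2 * latent_dim)
        self.latent_to_memory = nn.Linear(latent_dim, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    @staticmethod
    def reparameterize(mu, logvar):
        # z = mu + sigma * eps keeps the sampling step differentiable.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, context_ids, response_ids):
        ctx = self.encoder(self.embed(context_ids))            # (B, T_ctx, d_model)
        resp = self.encoder(self.embed(response_ids))          # recognition path
        ctx_vec, resp_vec = ctx.mean(dim=1), resp.mean(dim=1)  # crude pooling

        mu_p, logvar_p = self.prior(ctx_vec).chunk(2, dim=-1)
        mu_q, logvar_q = self.posterior(
            torch.cat([ctx_vec, resp_vec], dim=-1)).chunk(2, dim=-1)
        z = self.reparameterize(mu_q, logvar_q)                # posterior sample

        # Expose z to the decoder as an extra "memory" position it can attend to.
        memory = torch.cat([self.latent_to_memory(z).unsqueeze(1), ctx], dim=1)
        tgt = self.embed(response_ids)
        T = tgt.size(1)
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        logits = self.out(self.decoder(tgt, memory, tgt_mask=causal_mask))

        # KL(q || p) between diagonal Gaussians, added to the reconstruction loss.
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp() - 1).sum(-1)
        return logits, kl.mean()


# Toy usage: batch of 2, context length 10, response length 7, vocab of 1000.
model = GlobalLatentVT(vocab_size=1000)
logits, kl = model(torch.randint(0, 1000, (2, 10)), torch.randint(0, 1000, (2, 7)))
```

The second variant described in the abstract would instead draw a sequence of fine-grained latent variables, one per decoding position, rather than a single global z; the sketch above keeps only the global case for brevity.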
Related papers
- Investigating Recurrent Transformers with Dynamic Halt [64.862738244735]
We study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism.
We propose and investigate novel ways to extend and combine the methods.
arXiv Detail & Related papers (2024-02-01T19:47:31Z)
- GIVT: Generative Infinite-Vocabulary Transformers [18.55070896912795]
We introduce Generative Infinite-Vocabulary Transformers (GIVT) which generate vector sequences with real-valued entries.
Inspired by the image-generation paradigm of VQ-GAN and MaskGIT, we use GIVT to model the unquantized real-valued latent sequences of a $\beta$-VAE.
In class-conditional image generation GIVT outperforms VQ-GAN as well as MaskGIT, and achieves performance competitive with recent latent diffusion models.
arXiv Detail & Related papers (2023-12-04T18:48:02Z)
- Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables [20.66743177460193]
We combine the merits of both continuous and discrete latent variables and propose a Hybrid Latent Variable (HLV) method.
HLV constrains the global semantics of responses through discrete latent variables and enriches responses with continuous latent variables.
In addition, we propose the Conditional Hybrid Variational Transformer (CHVT) to construct and utilize HLV with Transformers for dialogue generation.
arXiv Detail & Related papers (2022-12-02T12:48:01Z)
- Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z)
- Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation.
We propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales with shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.