Improving Variational Autoencoder for Text Modelling with Timestep-Wise
Regularisation
- URL: http://arxiv.org/abs/2011.01136v2
- Date: Tue, 3 Nov 2020 15:20:25 GMT
- Title: Improving Variational Autoencoder for Text Modelling with Timestep-Wise
Regularisation
- Authors: Ruizhe Li, Xiao Li, Guanyi Chen, Chenghua Lin
- Abstract summary: The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences.
However, an issue known as posterior collapse (or KL loss vanishing) happens when the VAE is used in text modelling.
We propose a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which effectively avoids posterior collapse and can be applied to any RNN-based VAE model.
- Score: 18.296350505386997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Variational Autoencoder (VAE) is a popular and powerful model applied to
text modelling to generate diverse sentences. However, an issue known as
posterior collapse (or KL loss vanishing) happens when the VAE is used in text
modelling, where the approximate posterior collapses to the prior, the model
ignores the latent variables entirely, and generation degrades to that of a plain
language model. This issue is particularly prevalent
when RNN-based VAE models are employed for text modelling. In this paper, we
propose a simple, generic architecture called Timestep-Wise Regularisation VAE
(TWR-VAE), which can effectively avoid posterior collapse and can be applied to
any RNN-based VAE model. The effectiveness and versatility of our model are
demonstrated in different tasks, including language modelling and dialogue
response generation.
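The following is a minimal, hypothetical sketch of the timestep-wise regularisation idea described above: the posterior is parameterised and KL-regularised at every timestep of the RNN encoder rather than only at its final state, which makes the KL term much harder to drive to zero. The layer sizes, the GRU encoder/decoder, and conditioning the decoder on the final timestep's latent sample are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the authors' code) of timestep-wise KL regularisation
# for an RNN-based VAE; sizes and architecture choices are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimestepWiseVAE(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.latent_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)                          # (batch, seq_len, embed_dim)
        enc_states, _ = self.encoder(x)                 # hidden state at every timestep

        # Posterior parameters at every timestep, not only the last one.
        mu, logvar = self.to_mu(enc_states), self.to_logvar(enc_states)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

        # KL(q(z_t | x_{<=t}) || N(0, I)) averaged over timesteps: every step of
        # the encoder is regularised, not just its final state.
        kl_per_step = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        kl_loss = kl_per_step.mean()

        # Decode conditioned on the latent sample from the final timestep
        # (an assumption made for this sketch).
        h0 = torch.tanh(self.latent_to_hidden(z[:, -1])).unsqueeze(0)
        dec_states, _ = self.decoder(x, h0)
        return self.out(dec_states), kl_loss


# Usage: reconstruction loss plus the timestep-wise KL forms the training objective.
model = TimestepWiseVAE(vocab_size=10000)
tokens = torch.randint(0, 10000, (4, 20))
logits, kl_loss = model(tokens)
recon = F.cross_entropy(logits[:, :-1].reshape(-1, 10000), tokens[:, 1:].reshape(-1))
loss = recon + kl_loss
```

In a standard RNN VAE only the final encoder state is KL-regularised, so a strong decoder can learn to ignore z; spreading the regularisation over timesteps is the mechanism the abstract refers to.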
Related papers
- PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model [37.2192243883707]
We propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation to generate fluent text.
Results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text.
arXiv Detail & Related papers (2023-06-05T01:36:39Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that improves inference efficiency and reduces latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves up to a 2.12x speedup with minimal degradation in generation quality.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- DiffusER: Discrete Diffusion via Edit-based Reconstruction [88.62707047517914]
DiffusER is an edit-based generative model for text based on denoising diffusion models.
It can rival autoregressive models on several tasks spanning machine translation, summarization, and style transfer.
It can also perform other varieties of generation that standard autoregressive models are not well-suited for.
arXiv Detail & Related papers (2022-10-30T16:55:23Z)
- Evaluation of HTR models without Ground Truth Material [2.4792948967354236]
The evaluation of Handwritten Text Recognition (HTR) models during their development is straightforward.
But the evaluation process becomes tricky as soon as we switch from development to application.
We show that lexicon-based evaluation can compete with evaluation based on ground truth.
arXiv Detail & Related papers (2022-01-17T01:26:09Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling [53.38382932162732]
Variational autoencoders (VAEs) have been widely applied for text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- Generative Text Modeling through Short Run Inference [47.73892773331617]
The present work proposes short run dynamics for inference: the latent variable is initialised from its prior distribution and then updated with a small number of Langevin dynamics steps guided by its posterior distribution (a sketch of this procedure follows the list below).
We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse.
arXiv Detail & Related papers (2021-05-27T09:14:35Z)
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
When paired with a strong auto-regressive decoder, VAEs tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
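As referenced in the Short Run Inference entry above, here is a hedged, self-contained sketch of the procedure that entry describes: the latent variable is initialised from the prior and refined with a few Langevin steps that follow the gradient of log p(x, z). The ToyGenerator, step count, and step size below are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch of short run Langevin inference: z starts at the prior
# and takes a few gradient-plus-noise steps toward the posterior.
import torch
import torch.nn as nn


class ToyGenerator(nn.Module):
    """Toy Gaussian generator p(x | z) = N(x; MLP(z), I), used only for illustration."""
    def __init__(self, latent_dim=32, data_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 128), nn.Tanh(), nn.Linear(128, data_dim))

    def log_likelihood(self, x, z):
        return -0.5 * ((x - self.net(z)) ** 2).sum(dim=1)


def short_run_inference(x, generator, n_steps=20, step_size=0.1, latent_dim=32):
    """Return an approximate posterior sample of z for observations x."""
    z = torch.randn(x.size(0), latent_dim)          # initialise from the prior N(0, I)
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        # log p(x, z) = log p(x | z) + log p(z), with a standard normal prior on z.
        log_joint = generator.log_likelihood(x, z) - 0.5 * (z ** 2).sum(dim=1)
        grad = torch.autograd.grad(log_joint.sum(), z)[0]
        # Langevin update: gradient step plus Gaussian noise.
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()


gen = ToyGenerator()
x = torch.randn(8, 64)
z_posterior = short_run_inference(x, gen)
```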