Generative Text Modeling through Short Run Inference
- URL: http://arxiv.org/abs/2106.02513v2
- Date: Tue, 8 Jun 2021 09:55:55 GMT
- Title: Generative Text Modeling through Short Run Inference
- Authors: Bo Pang, Erik Nijkamp, Tian Han, Ying Nian Wu
- Abstract summary: The present work proposes a short run dynamics for inference. It is initialized from the prior distribution of the latent variable and then runs a small number of Langevin dynamics steps guided by its posterior distribution.
We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse.
- Score: 47.73892773331617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent variable models for text, when trained successfully, accurately model
the data distribution and capture global semantic and syntactic features of
sentences. The prominent approach to train such models is variational
autoencoders (VAE). It is nevertheless challenging to train and often results
in a trivial local optimum where the latent variable is ignored and its
posterior collapses into the prior, an issue known as posterior collapse.
Various techniques have been proposed to mitigate this issue. Most of them
focus on improving the inference model to yield latent codes of higher quality.
The present work proposes a short run dynamics for inference. It is initialized
from the prior distribution of the latent variable and then runs a small number
(e.g., 20) of Langevin dynamics steps guided by its posterior distribution. The
major advantage of our method is that it does not require a separate inference
model or assume simple geometry of the posterior distribution, thus rendering
an automatic, natural and flexible inference engine. We show that the models
trained with short run dynamics more accurately model the data, compared to
strong language model and VAE baselines, and exhibit no sign of posterior
collapse. Analyses of the latent space show that interpolation in the latent
space is able to generate coherent sentences with smooth transition and
demonstrate improved classification over strong baselines with latent features
from unsupervised pretraining. These results together expose a well-structured
latent space of our generative model.
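To make the inference procedure concrete, below is a minimal sketch (not the authors' code) of short-run Langevin inference: the latent code is initialized from the N(0, I) prior and updated for a small number of Langevin steps guided by the unnormalized posterior log p(x | z) + log p(z). The decoder architecture, dimensions, step size, and number of steps are illustrative assumptions.
```python
# Minimal sketch of short-run Langevin inference for a latent-variable
# text model. ToyDecoder, step_size, and n_steps are hypothetical choices
# for illustration, not the paper's actual implementation.
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Maps a latent code z to the parameters of p(x | z)."""

    def __init__(self, z_dim=32, vocab=1000, hidden=128, seq_len=16):
        super().__init__()
        self.seq_len, self.vocab = seq_len, vocab
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, seq_len * vocab),
        )

    def log_likelihood(self, x, z):
        # x: (batch, seq_len) token ids; z: (batch, z_dim)
        logits = self.net(z).view(-1, self.seq_len, self.vocab)
        logp = torch.log_softmax(logits, dim=-1)
        return logp.gather(-1, x.unsqueeze(-1)).squeeze(-1).sum(-1)  # (batch,)


def short_run_inference(decoder, x, z_dim=32, n_steps=20, step_size=0.3):
    """Initialize z from the N(0, I) prior, then run a few Langevin steps
    guided by the unnormalized posterior log p(x | z) + log p(z)."""
    z = torch.randn(x.size(0), z_dim)  # z_0 ~ prior
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        log_post = decoder.log_likelihood(x, z) - 0.5 * z.pow(2).sum(-1)
        grad = torch.autograd.grad(log_post.sum(), z)[0]
        # Langevin update: gradient ascent on the log posterior plus noise
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()


if __name__ == "__main__":
    decoder = ToyDecoder()
    x = torch.randint(0, 1000, (4, 16))  # toy batch of token ids
    z = short_run_inference(decoder, x)
    print(z.shape)  # torch.Size([4, 32])
```
Because the Langevin chain replaces a learned inference network, no separate encoder is trained; the gradient of the decoder's log-likelihood supplies the guidance toward the posterior.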
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
arXiv Detail & Related papers (2024-05-31T16:18:46Z)
- Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders [22.77397537980102]
We show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior.
We present preliminary results on low-dimensional synthetic data that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines.
arXiv Detail & Related papers (2024-03-13T20:16:21Z)
- PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model [37.2192243883707]
We propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation to generate fluent text.
Results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text.
arXiv Detail & Related papers (2023-06-05T01:36:39Z)
- Learning Sparse Latent Representations for Generator Model [7.467412443287767]
We present a new unsupervised learning method to enforce sparsity on the latent space for the generator model.
Our model consists of only one top-down generator network that maps the latent variable to the observed data.
arXiv Detail & Related papers (2022-09-20T18:58:24Z)
- A Sparsity-promoting Dictionary Model for Variational Autoencoders [16.61511959679188]
Structuring the latent space in deep generative models is important to yield more expressive models and interpretable representations.
We propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model.
arXiv Detail & Related papers (2022-03-29T17:13:11Z)
- Discrete Auto-regressive Variational Attention Models for Text Modeling [53.38382932162732]
Variational autoencoders (VAEs) have been widely applied for text modeling.
They are troubled by two challenges: information underrepresentation and posterior collapse.
We propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges.
arXiv Detail & Related papers (2021-06-16T06:36:26Z)
- Preventing Posterior Collapse with Levenshtein Variational Autoencoder [61.30283661804425]
We propose to replace the evidence lower bound (ELBO) with a new objective which is simple to optimize and prevents posterior collapse.
We show that Levenshtein VAE produces more informative latent representations than alternative approaches to preventing posterior collapse.
arXiv Detail & Related papers (2020-04-30T13:27:26Z)
- Discrete Variational Attention Models for Language Generation [51.88612022940496]
We propose a discrete variational attention model with categorical distribution over the attention mechanism owing to the discrete nature in languages.
Thanks to the property of discreteness, the training of our proposed approach does not suffer from posterior collapse.
arXiv Detail & Related papers (2020-04-21T05:49:04Z)