Dior-CVAE: Pre-trained Language Models and Diffusion Priors for
Variational Dialog Generation
- URL: http://arxiv.org/abs/2305.15025v2
- Date: Mon, 23 Oct 2023 18:51:10 GMT
- Title: Dior-CVAE: Pre-trained Language Models and Diffusion Priors for
Variational Dialog Generation
- Authors: Tianyu Yang and Thy Thy Tran and Iryna Gurevych
- Abstract summary: Dior-CVAE is a hierarchical conditional variational autoencoder (CVAE) with diffusion priors that addresses the restrictive Gaussian prior assumption and posterior collapse in variational dialog models.
We employ a diffusion model to increase the complexity of the prior distribution and its compatibility with the distributions produced by a PLM.
Experiments across two commonly used open-domain dialog datasets show that our method can generate more diverse responses without large-scale dialog pre-training.
- Score: 70.2283756542824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current variational dialog models have employed pre-trained language models
(PLMs) to parameterize the likelihood and posterior distributions. However, the
Gaussian assumption made on the prior distribution is incompatible with these
distributions, thus restricting the diversity of generated responses. These
models also suffer from posterior collapse, i.e., the decoder tends to ignore
latent variables and directly access information captured in the encoder
through the cross-attention mechanism. In this work, we propose Dior-CVAE, a
hierarchical conditional variational autoencoder (CVAE) with diffusion priors
to address these challenges. We employ a diffusion model to increase the
complexity of the prior distribution and its compatibility with the
distributions produced by a PLM. Also, we propose memory dropout to the
cross-attention mechanism, which actively encourages the use of latent
variables for response generation. Overall, experiments across two commonly
used open-domain dialog datasets show that our method can generate more diverse
responses without large-scale dialog pre-training. Code is available at
https://github.com/UKPLab/dior-cvae.
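The memory dropout mentioned in the abstract can be pictured as randomly hiding encoder memory slots from the decoder's cross-attention, so that the decoder has to fall back on the latent variable. Below is a minimal, hypothetical PyTorch sketch of that idea; the function name, tensor shapes, and the choice to drop whole positions rather than individual features are assumptions, not the authors' implementation.

```python
import torch

def memory_dropout(encoder_states: torch.Tensor,
                   attention_mask: torch.Tensor,
                   p: float = 0.1,
                   training: bool = True):
    """Randomly hide encoder memory slots from cross-attention (hypothetical sketch).

    encoder_states: (batch, src_len, hidden) memory attended to by the decoder.
    attention_mask: (batch, src_len) boolean mask, True = position may be attended.
    Dropping whole memory positions pushes the decoder to rely on the latent
    variable z instead of copying everything from the encoder.
    """
    if not training or p == 0.0:
        return encoder_states, attention_mask
    # One keep/drop decision per memory position.
    drop = torch.rand(encoder_states.shape[:2], device=encoder_states.device) < p
    # Zero the dropped vectors and hide them from the attention mask.
    encoder_states = encoder_states.masked_fill(drop.unsqueeze(-1), 0.0)
    attention_mask = attention_mask & ~drop
    return encoder_states, attention_mask
```

A higher drop rate `p` pushes more of the generation signal through the latent path, at the cost of weaker grounding in the dialog context; in the paper's setting this would apply during training only.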
Related papers
- DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space [7.131920232495329]
In real-life conversations, content is diverse, giving rise to the one-to-many problem that calls for diverse generation.
Previous studies attempted to introduce discrete or Gaussian-based continuous latent variables to address the one-to-many problem.
We propose DiffusionDialog, a novel approach that enhances the diversity of dialogue generation with the help of a diffusion model (see the sketch after this entry).
arXiv Detail & Related papers (2024-04-10T05:56:46Z)
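Both Dior-CVAE above and DiffusionDialog place a diffusion model over a dialog latent variable instead of a fixed Gaussian prior. The sketch below illustrates, in hedged form, how a response latent could be drawn by DDPM-style ancestral sampling conditioned on a context embedding; the `denoiser` network, the linear noise schedule, and the conditioning interface are assumptions, not taken from either paper.

```python
import torch

@torch.no_grad()
def sample_latent_prior(denoiser, context_emb, z_dim: int, num_steps: int = 50):
    """Hypothetical DDPM-style ancestral sampling of a dialog latent variable.

    `denoiser(z_t, t, context_emb)` is assumed to predict the noise added at
    step t, conditioned on an embedding of the dialog context.
    """
    device = context_emb.device
    betas = torch.linspace(1e-4, 0.02, num_steps, device=device)   # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    z = torch.randn(context_emb.size(0), z_dim, device=device)     # start from pure noise
    for t in reversed(range(num_steps)):
        eps = denoiser(z, torch.full((z.size(0),), t, device=device), context_emb)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps) / torch.sqrt(alphas[t])                # posterior mean
        if t > 0:
            z = z + torch.sqrt(betas[t]) * torch.randn_like(z)      # add sampling noise
    return z  # used in place of a Gaussian prior sample
```

The resulting `z` would then be handed to the decoder exactly where a Gaussian prior sample would otherwise go.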
- Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding [90.77521413857448]
Deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations.
We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs); one possible reading of the idea is sketched after this entry.
EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding.
Experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks.
arXiv Detail & Related papers (2024-02-29T10:08:57Z)
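The EDDPM summary above describes replacing the fixed Gaussian noising-denoising of standard diffusion with parameterized encoding-decoding. One possible reading, sketched below purely for illustration, runs Gaussian diffusion on a learned encoding and trains a decoder alongside the denoiser; the loss weighting, architectures, and even this decomposition are assumptions rather than the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def eddpm_style_step(encoder, decoder, denoiser, x, alpha_bars):
    """One hedged training step where Gaussian diffusion acts on a *learned*
    encoding and a decoder maps the clean encoding back to data. This is only
    one possible reading of "parameterized encoding-decoding", not the paper's
    exact objective.
    """
    z0 = encoder(x)                                          # learned encoding of the input
    t = torch.randint(0, alpha_bars.size(0), (x.size(0),))
    a = alpha_bars[t].view(-1, *([1] * (z0.dim() - 1)))      # broadcast alpha_bar_t over z0
    noise = torch.randn_like(z0)
    z_t = torch.sqrt(a) * z0 + torch.sqrt(1.0 - a) * noise   # Gaussian noising in latent space

    denoise_loss = F.mse_loss(denoiser(z_t, t), noise)       # standard epsilon-prediction loss
    recon_loss = F.mse_loss(decoder(z0), x)                  # ties the encoding back to the data
    return denoise_loss + recon_loss
```

Here the encoder and decoder are trained jointly with the denoiser; the actual EDDPM formulation may couple them quite differently.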
- Diffusion models for probabilistic programming [56.47577824219207]
Diffusion Model Variational Inference (DMVI) is a novel method for automated approximate inference in probabilistic programming languages (PPLs).
DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model.
arXiv Detail & Related papers (2023-11-01T12:17:05Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Symmetric Equilibrium Learning of VAEs [56.56929742714685]
We view variational autoencoders (VAEs) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa.
We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling.
arXiv Detail & Related papers (2023-07-19T10:27:34Z)
- LafitE: Latent Diffusion Model with Feature Editing for Unsupervised Multi-class Anomaly Detection [12.596635603629725]
We develop a unified model to detect anomalies from objects belonging to multiple classes when only normal data is accessible.
We first explore the generative-based approach and investigate latent diffusion models for reconstruction.
We introduce a feature editing strategy that modifies the input feature space of the diffusion model to further alleviate "identity shortcuts".
arXiv Detail & Related papers (2023-07-16T14:41:22Z)
- Variational Diffusion Auto-encoder: Latent Space Extraction from Pre-trained Diffusion Models [0.0]
Variational Auto-Encoders (VAEs) face challenges with the quality of generated images, often presenting noticeable blurriness.
This issue stems from the unrealistic assumption that approximates the conditional data distribution, $p(\mathbf{x} \mid \mathbf{z})$, as an isotropic Gaussian.
We illustrate how one can extract a latent space from a pre-existing diffusion model by optimizing an encoder to maximize the marginal data log-likelihood.
arXiv Detail & Related papers (2023-04-24T14:44:47Z)
- Generative Text Modeling through Short Run Inference [47.73892773331617]
The present work proposes a short run dynamics for inference. It is initialized from the prior distribution of the latent variable and then runs a small number of Langevin dynamics steps guided by its posterior distribution (a minimal sketch follows this entry).
We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse.
arXiv Detail & Related papers (2021-05-27T09:14:35Z)
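The short-run recipe above is concrete enough to sketch: initialize the latent from the prior and take a handful of Langevin steps guided by the posterior. A minimal sketch, assuming the model exposes a differentiable `log_joint(z, x)` returning log p(z) + log p(x | z); the step size and the default of 20 steps are illustrative.

```python
import torch

def short_run_langevin(log_joint, x, z_dim: int, steps: int = 20, step_size: float = 0.1):
    """Short-run inference: a few posterior-guided Langevin steps from the prior.

    `log_joint(z, x)` is an assumed differentiable callable returning
    log p(z) + log p(x | z); its gradient w.r.t. z points toward the posterior.
    """
    z = torch.randn(x.size(0), z_dim, requires_grad=True)   # initialize from the N(0, I) prior
    for _ in range(steps):
        grad = torch.autograd.grad(log_joint(z, x).sum(), z)[0]
        with torch.no_grad():
            # Langevin update: small gradient-ascent step on log p(z, x) plus Gaussian noise.
            z += 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()
```

This yields approximate posterior samples for training without a separate amortized inference network.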
- To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs [10.611727286504994]
We study the effect of the latent prior on the generation quality of deterministic AE models.
We show that our model, called FlexAE, is the new state-of-the-art for AE-based generative models.
arXiv Detail & Related papers (2020-06-10T14:00:14Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables when paired with a strong auto-regressive decoder.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)