On the Discrepancy between Density Estimation and Sequence Generation
- URL: http://arxiv.org/abs/2002.07233v1
- Date: Mon, 17 Feb 2020 20:13:35 GMT
- Title: On the Discrepancy between Density Estimation and Sequence Generation
- Authors: Jason Lee, Dustin Tran, Orhan Firat, Kyunghyun Cho
- Abstract summary: log-likelihood is highly correlated with BLEU when we consider models within the same family.
We observe no correlation between rankings of models across different families.
- Score: 92.70116082182076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many sequence-to-sequence generation tasks, including machine translation and
text-to-speech, can be posed as estimating the density of the output y given
the input x: p(y|x). Given this interpretation, it is natural to evaluate
sequence-to-sequence models using conditional log-likelihood on a test set.
However, the goal of sequence-to-sequence generation (or structured prediction)
is to find the best output y^ given an input x, and each task has its own
downstream metric R that scores a model output by comparing against a set of
references y*: R(y^, y* | x). While we hope that a model that excels in density
estimation also performs well on the downstream metric, the exact correlation
has not been studied for sequence generation tasks. In this paper, by comparing
several density estimators on five machine translation tasks, we find that the
correlation between rankings of models based on log-likelihood and BLEU varies
significantly depending on the range of the model families being compared.
First, log-likelihood is highly correlated with BLEU when we consider models
within the same family (e.g. autoregressive models, or latent variable models
with the same parameterization of the prior). However, we observe no
correlation between rankings of models across different families: (1) among
non-autoregressive latent variable models, a flexible prior distribution is
better at density estimation but gives worse generation quality than a simple
prior, and (2) autoregressive models offer the best translation performance
overall, while latent variable models with a normalizing flow prior give the
highest held-out log-likelihood across all datasets. Therefore, we recommend
using a simple prior for the latent variable non-autoregressive model when fast
generation speed is desired.
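To make the comparison concrete, here is a small sketch (using SciPy, with made-up model names and scores rather than numbers from the paper) of how one can quantify agreement between a ranking by held-out log-likelihood and a ranking by BLEU using rank-correlation statistics.

```python
# Illustrative only: compare model rankings by held-out log-likelihood vs. BLEU.
# Model names and scores below are hypothetical, not taken from the paper.
from scipy.stats import kendalltau, spearmanr

models = ["AR-base", "AR-big", "LV-simple-prior", "LV-flow-prior"]
log_likelihood = [-45.2, -43.8, -47.1, -42.9]   # higher (less negative) is better
bleu = [27.4, 28.9, 25.1, 24.3]                 # higher is better

tau, tau_p = kendalltau(log_likelihood, bleu)
rho, rho_p = spearmanr(log_likelihood, bleu)
print(f"Kendall tau = {tau:.2f} (p={tau_p:.2f}), Spearman rho = {rho:.2f} (p={rho_p:.2f})")
# A tau near +1 means the two metrics rank models the same way;
# a tau near 0 means the rankings do not agree.
```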
Related papers
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood estimation (MLE) objective does not match the downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and the distribution of sequences in a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
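As a loose illustration of comparing a model's sequence distribution against the data distribution without adversarial training, the sketch below measures a crude KL divergence between empirical bigram distributions of sampled and reference sequences; this is a toy stand-in, not the SequenceMatch objective.

```python
# Illustrative only: a crude divergence between empirical bigram distributions of
# model samples and data sequences. This is NOT the SequenceMatch objective; it only
# illustrates comparing sequence distributions without adversarial training.
from collections import Counter
import math

def bigram_dist(sequences):
    counts = Counter(b for seq in sequences for b in zip(seq, seq[1:]))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q), with simple smoothing for bigrams unseen under q.
    return sum(pv * math.log(pv / q.get(k, eps)) for k, pv in p.items())

data_seqs = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model_seqs = [["the", "cat", "sat"], ["the", "cat", "cat"]]
print(kl_divergence(bigram_dist(data_seqs), bigram_dist(model_seqs)))
```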
arXiv Detail & Related papers (2023-06-08T17:59:58Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential
Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
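A minimal sketch of the general recipe described above, assuming PyTorch: a per-timestep Gaussian head whose negative log-likelihood serves as an anomaly score, plus a simple smoothness penalty on consecutive means. The penalty only hints at the smoothness-inducing idea and is not the SISVAE prior.

```python
# Minimal sketch (PyTorch assumed): a per-timestep Gaussian head whose NLL can be used
# as an anomaly score, plus a simple smoothness penalty on consecutive means.
# This illustrates the general idea only, not the SISVAE model.
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    def __init__(self, hidden_dim, x_dim):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, x_dim)
        self.log_var = nn.Linear(hidden_dim, x_dim)

    def forward(self, h):                       # h: (batch, time, hidden_dim)
        return self.mean(h), self.log_var(h)

def per_step_nll(x, mean, log_var):
    # Negative log-likelihood of each timestamp under N(mean, exp(log_var)),
    # up to an additive constant.
    return 0.5 * (log_var + (x - mean) ** 2 / log_var.exp()).sum(-1)

def smoothness_penalty(mean):
    # Penalize abrupt changes between the means of consecutive timestamps.
    return ((mean[:, 1:] - mean[:, :-1]) ** 2).mean()

h = torch.randn(8, 50, 32)                      # encoder states (toy example)
x = torch.randn(8, 50, 4)                       # observed series (toy example)
mean, log_var = GaussianHead(32, 4)(h)
loss = per_step_nll(x, mean, log_var).mean() + 0.1 * smoothness_penalty(mean)
anomaly_score = per_step_nll(x, mean, log_var)  # high score -> likely anomaly
```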
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - On Maximum Likelihood Training of Score-Based Generative Models [17.05208572228308]
We show that the weighted score matching objective is equivalent to maximum likelihood for certain choices of mixture weighting.
We show that both maximum likelihood training and test-time log-likelihood evaluation can be achieved through parameterization of the score function alone.
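For context, the sketch below (PyTorch assumed, toy one-dimensional data) writes out the weighted denoising score matching loss commonly used to train score models; the weighting function lam is left generic, since relating particular weightings to log-likelihood is what the paper studies.

```python
# Sketch (PyTorch assumed): weighted denoising score matching for a score model
# s_theta(x, sigma). The weighting lam(sigma) is left generic; connecting specific
# weightings to (bounds on) log-likelihood is the subject of the paper above.
import torch
import torch.nn as nn

score_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def dsm_loss(x0, sigmas, lam=lambda s: s ** 2):
    sigma = sigmas[torch.randint(len(sigmas), (x0.shape[0], 1))]
    noise = torch.randn_like(x0)
    x_t = x0 + sigma * noise
    target = -(x_t - x0) / sigma ** 2            # score of q_sigma(x_t | x0)
    pred = score_net(torch.cat([x_t, sigma], dim=-1))
    return (lam(sigma) * (pred - target) ** 2).mean()

x0 = torch.randn(128, 1)                         # toy data
loss = dsm_loss(x0, sigmas=torch.tensor([0.1, 0.5, 1.0]))
loss.backward()
```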
arXiv Detail & Related papers (2021-01-22T18:22:29Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the resulting score-matching divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
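To illustrate what a univariate conditional score is, the toy sketch below (PyTorch assumed, with a made-up autoregressive Gaussian model) uses automatic differentiation to obtain d log p(x_d | x_<d) / d x_d for each dimension; the AR-CSM training divergence itself is not reproduced here.

```python
# Illustration (PyTorch assumed): computing univariate conditional scores
# d log p(x_d | x_<d) / d x_d for a toy autoregressive Gaussian model via autograd.
import torch
import torch.nn as nn

D = 4
cond_net = nn.Linear(D, 2)                       # maps masked prefix -> (mean, log_std)

def conditional_log_prob(x, d):
    # log p(x_d | x_<d), up to a constant, with the prefix encoded by zero-masking
    # positions >= d so x_d only enters through the explicit term below.
    prefix = x * (torch.arange(D) < d).float()
    mean, log_std = cond_net(prefix).unbind(-1)
    return -0.5 * ((x[..., d] - mean) / log_std.exp()) ** 2 - log_std

x = torch.randn(8, D, requires_grad=True)
scores = []
for d in range(D):
    (grad,) = torch.autograd.grad(conditional_log_prob(x, d).sum(), x, create_graph=True)
    scores.append(grad[..., d])                  # keep only the d-th coordinate
scores = torch.stack(scores, dim=-1)             # (batch, D) conditional scores
```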
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
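Below is a minimal REINFORCE-style sketch of directly optimizing an expected reward over discrete outputs, with a toy policy and a stand-in reward function rather than the molecule or Python-expression tasks from the paper (PyTorch assumed).

```python
# Minimal REINFORCE sketch (PyTorch assumed): maximize E[R(sequence)] for a toy
# categorical policy over tokens. The reward function here is a stand-in.
import torch
import torch.nn as nn

vocab_size, seq_len = 10, 5
logits = nn.Parameter(torch.zeros(seq_len, vocab_size))   # toy "policy"

def reward(seq):
    return (seq == 3).float().sum()              # stand-in reward: count of token 3

optimizer = torch.optim.Adam([logits], lr=0.1)
for _ in range(100):
    dist = torch.distributions.Categorical(logits=logits)
    seq = dist.sample()                          # one token per position
    log_prob = dist.log_prob(seq).sum()
    loss = -reward(seq) * log_prob               # REINFORCE: -R * log p(sequence)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```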
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Variational Mixture of Normalizing Flows [0.0]
Deep generative models, such as generative adversarial networks, variational autoencoders, and their variants, have seen wide adoption for the task of modelling complex data distributions.
Normalizing flows overcome the lack of a tractable, explicit density in such models by leveraging the change-of-variables formula for probability density functions.
The present work overcomes this by using normalizing flows as components in a mixture model and devising an end-to-end training procedure for such a model.
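For reference, the tiny sketch below shows the change-of-variables mechanic that individual flows rely on, using the simplest possible invertible map; it is not the mixture-of-flows model proposed in the paper.

```python
# Tiny change-of-variables example: an invertible affine map z = (x - b) / a gives
# log p_X(x) = log p_Z(z) + log |dz/dx| = log p_Z(z) - log|a|.
# This is the basic flow mechanic only, not the mixture-of-flows model above.
import math

def affine_flow_log_prob(x, a=2.0, b=1.0):
    z = (x - b) / a                                   # forward pass of the flow
    log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi))  # standard normal base density
    log_det = -math.log(abs(a))                       # log |dz/dx|
    return log_pz + log_det

print(affine_flow_log_prob(3.0))
```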
arXiv Detail & Related papers (2020-09-01T17:20:08Z) - Pattern Similarity-based Machine Learning Methods for Mid-term Load
Forecasting: A Comparative Study [0.0]
We use pattern similarity-based methods for forecasting monthly electricity demand exhibiting annual seasonality.
An integral part of the models is the time series representation using patterns of time series sequences.
We consider four such models: nearest neighbor model, fuzzy neighborhood model, kernel regression model and general regression neural network.
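Below is a rough sketch of the nearest neighbor variant under simplifying assumptions (NumPy, random toy data): each year of monthly demand is normalized into a pattern, the most similar historical years are found, and the patterns of their following years are averaged as the forecast shape.

```python
# Rough nearest-neighbour sketch of pattern-similarity forecasting: normalize each
# yearly pattern, find the k most similar historical years, and average the patterns
# that followed them. Toy data; the models compared in the paper are more elaborate.
import numpy as np

def normalize(pattern):
    return (pattern - pattern.mean()) / pattern.std()

def knn_forecast(history, query_year, k=2):
    # history: (n_years, 12) monthly demand; returns the forecast pattern (shape only,
    # to be rescaled back to a demand level) for the year after query_year.
    patterns = np.array([normalize(y) for y in history[:-1]])
    query = normalize(query_year)
    nearest = np.argsort(((patterns - query) ** 2).sum(axis=1))[:k]
    return np.mean([normalize(history[i + 1]) for i in nearest], axis=0)

history = np.random.rand(10, 12) * 100 + np.tile(np.sin(np.linspace(0, 2 * np.pi, 12)) * 20, (10, 1))
forecast_pattern = knn_forecast(history, history[-1])
```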
arXiv Detail & Related papers (2020-03-03T12:14:36Z) - AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
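As a loose illustration of scoring dullness from output probabilities alone, the sketch below (PyTorch assumed) averages per-step output distributions over a batch and takes the entropy of that average as a diversity score; the paper's AvgOut-based scores differ in detail.

```python
# Loose illustration: average the model's output distributions over a batch and use
# the entropy of that average as a diversity score (a dull model concentrates its
# averaged distribution on a few generic tokens). Details differ from the paper.
import torch

def diversity_score(probs):
    # probs: (batch, seq_len, vocab) per-step output distributions.
    avg = probs.mean(dim=(0, 1))                 # batch-averaged token distribution
    return -(avg * (avg + 1e-12).log()).sum()    # entropy of the averaged distribution

probs = torch.softmax(torch.randn(16, 20, 1000), dim=-1)
print(diversity_score(probs))
```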
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.