Improving Maximum Likelihood Training for Text Generation with Density
Ratio Estimation
- URL: http://arxiv.org/abs/2007.06018v1
- Date: Sun, 12 Jul 2020 15:31:24 GMT
- Title: Improving Maximum Likelihood Training for Text Generation with Density
Ratio Estimation
- Authors: Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li
- Abstract summary: We propose a new training scheme for auto-regressive sequence generative models, which is effective and stable when operating in the large sample spaces encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
- Score: 51.091890311312085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Auto-regressive sequence generative models trained by Maximum Likelihood
Estimation suffer the exposure bias problem in practical finite sample
scenarios. The crux is that the number of training samples for Maximum
Likelihood Estimation is usually limited and the input data distributions are
different at training and inference stages. Many methods have been proposed to
solve the above problem (Yu et al., 2017; Lu et al., 2018), but they rely on
sampling from the non-stationary model distribution and suffer from high
variance or biased estimates. In this paper, we propose ψ-MLE, a new
training scheme for auto-regressive sequence generative models, which is
effective and stable when operating in the large sample spaces encountered in text
generation. We derive our algorithm from a new perspective of self-augmentation
and introduce bias correction with density ratio estimation. Extensive
experimental results on synthetic data and real-world text generation tasks
demonstrate that our method stably outperforms Maximum Likelihood Estimation
and other state-of-the-art sequence generative models in terms of both quality
and diversity.
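In concrete terms, the scheme combines the two ingredients named in the abstract: self-augmentation (training on samples drawn from the model itself) and bias correction via a density ratio. A standard way to estimate that ratio is the classifier trick: a binary discriminator trained to separate real data from model samples has a logit that approximates log p_data(x) - log p_model(x), so exponentiating it yields an importance weight for self-generated samples. The PyTorch-style sketch below illustrates this general recipe only; it is not the authors' implementation, and the autoregressive `model` (with assumed `log_prob` and `sample` methods), the discriminator `d`, and the optimizers are hypothetical interfaces.

```python
import torch
import torch.nn.functional as F

def update_ratio_estimator(d, d_opt, real_x, fake_x):
    """Train a binary classifier d to separate real sequences from model
    samples; its logit then approximates log p_data(x) - log p_model(x)."""
    logits_real = d(real_x)   # shape: (batch,)
    logits_fake = d(fake_x)
    loss = (
        F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
        + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake))
    )
    d_opt.zero_grad()
    loss.backward()
    d_opt.step()
    return loss.item()

def weighted_mle_step(model, m_opt, d, real_x, n_aug):
    """One generator update: standard MLE on real data plus a self-augmentation
    term on model samples, reweighted by the estimated density ratio."""
    # Standard MLE term on real sequences.
    nll_real = -model.log_prob(real_x).mean()

    # Self-augmentation: sample from the current model, then correct the bias
    # with importance weights w(x) ~ p_data(x) / p_model(x) = exp(d's logit).
    with torch.no_grad():
        fake_x = model.sample(n_aug)                    # treated as extra data
        weights = torch.exp(d(fake_x)).clamp(max=10.0)  # clip for stability
        weights = weights / weights.mean()              # self-normalize
    nll_fake = -(weights * model.log_prob(fake_x)).mean()

    loss = nll_real + nll_fake
    m_opt.zero_grad()
    loss.backward()
    m_opt.step()
    return loss.item()
```

Clipping and self-normalizing the importance weights is a common stabilization heuristic when the ratio estimator is imperfect; it is included here as an illustrative choice, not something prescribed by the paper.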
Related papers
- Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis [7.234618871984921]
An emerging area of research aims to learn deep generative models with limited training data.
We propose RS-IMLE, a novel approach that changes the prior distribution used for training.
This leads to substantially higher quality image generation compared to existing GAN and IMLE-based methods.
arXiv Detail & Related papers (2024-09-26T00:19:42Z)
- Theoretical Guarantees of Data Augmented Last Layer Retraining Methods [5.352699766206809]
Linear last layer retraining strategies have been shown to achieve state-of-the-art performance for worst-group accuracy.
We present the optimal worst-group accuracy when modeling the distribution of the latent representations.
We evaluate and verify our results for both synthetic and large publicly available datasets.
arXiv Detail & Related papers (2024-05-09T17:16:54Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Balanced Training of Energy-Based Models with Adaptive Flow Sampling [13.951904929884618]
Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density.
We propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF).
Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times.
arXiv Detail & Related papers (2023-06-01T13:58:06Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply it to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train distributionally robust optimization (DRO) models using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)