Residual Energy-Based Models for Text
- URL: http://arxiv.org/abs/2004.10188v2
- Date: Mon, 21 Dec 2020 15:50:36 GMT
- Title: Residual Energy-Based Models for Text
- Authors: Anton Bakhtin and Yuntian Deng and Sam Gross and Myle Ott and
Marc'Aurelio Ranzato and Arthur Szlam
- Abstract summary: We show that the generations of auto-regressive language models can be reliably distinguished from real text by statistical discriminators.
This suggests that the auto-regressive models can be improved by incorporating the (globally normalized) discriminators into the generative process.
- Score: 46.22375671394882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current large-scale auto-regressive language models display impressive
fluency and can generate convincing text. In this work we start by asking the
question: Can the generations of these models be reliably distinguished from
real text by statistical discriminators? We find experimentally that the answer
is affirmative when we have access to the training data for the model, and
guardedly affirmative even if we do not.
This suggests that the auto-regressive models can be improved by
incorporating the (globally normalized) discriminators into the generative
process. We give a formalism for this using the Energy-Based Model framework,
and show that it indeed improves the results of the generative models, measured
both in terms of perplexity and in terms of human evaluation.
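The abstract's formulation admits a compact sketch. The following Python is a minimal illustration under stated assumptions, not the authors' released code: it assumes the residual factorization p(x) ∝ p_LM(x) · exp(-E(x)), where p_LM is the base auto-regressive model and E is the learned discriminator acting as an energy function, and it assumes generation by importance resampling from the LM proposal; all names are hypothetical.

    # Minimal sketch (assumptions as stated above, not the authors' code).
    import math
    import random

    def residual_log_score(lm_log_prob: float, energy: float) -> float:
        # Unnormalized log-score of a full sequence under the residual
        # EBM: log p_LM(x) - E(x). Lower energy = judged more human-like.
        return lm_log_prob - energy

    def resample_from_lm(candidates, k: int = 1):
        # candidates: list of (sequence, lm_log_prob, energy) tuples for
        # continuations sampled from the base LM (the proposal).
        # Resampling with importance weights proportional to exp(-E(x))
        # targets the residual distribution p(x) ~ p_LM(x) * exp(-E(x)).
        weights = [math.exp(-e) for (_, _, e) in candidates]
        sequences = [seq for (seq, _, _) in candidates]
        return random.choices(sequences, weights=weights, k=k)

Because the base LM already proposes fluent candidates, the energy network only needs to reweight them, which is what makes the residual, globally normalized parameterization tractable.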
Related papers
- Transcendence: Generative Models Can Outperform The Experts That Train Them [55.885802048647655]
We study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data.
We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset.
arXiv Detail & Related papers (2024-06-17T17:00:52Z)
- SelfEval: Leveraging the discriminative nature of generative models for evaluation [35.7242199928684]
We show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities.
Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts.
arXiv Detail & Related papers (2023-11-17T18:58:16Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
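As a rough illustration of the projection step (a hypothetical sketch; the paper's calibrated projection matrix is not reproduced here), removing the span of estimated bias directions from a text embedding can be done with a standard orthogonal projection:

    # Hypothetical sketch: project bias directions out of a text embedding.
    import numpy as np

    def debias(embedding: np.ndarray, bias_dirs: np.ndarray) -> np.ndarray:
        # bias_dirs: (d, k) matrix whose columns span the bias subspace.
        # P = I - B (B^T B)^{-1} B^T maps onto the orthogonal complement.
        B = bias_dirs
        proj = np.eye(B.shape[0]) - B @ np.linalg.inv(B.T @ B) @ B.T
        return proj @ embedding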
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Adapting a Language Model for Controlled Affective Text Generation [2.9267797650223653]
We adapt state-of-the-art language generation models to generate affective (emotional) text.
We propose to incorporate emotion as a prior for probabilistic state-of-the-art text generation models such as GPT-2.
The model gives a user the flexibility to control the category and intensity of emotion as well as the topic of the generated text.
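One plausible reading of "emotion as a prior" (an illustrative sketch only, not necessarily the paper's construction) is to reweight the LM's next-token distribution with an emotion model, using a scalar to set intensity:

    # Hypothetical sketch: bias next-token scores toward an emotion.
    import numpy as np

    def affective_logits(lm_logits: np.ndarray,
                         emotion_log_probs: np.ndarray,
                         beta: float) -> np.ndarray:
        # log p'(w) ∝ log p_LM(w) + beta * log p(emotion | w, context).
        # beta sets the intensity; the emotion model sets the category.
        # Both arrays range over the vocabulary.
        return lm_logits + beta * emotion_log_probs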
arXiv Detail & Related papers (2020-11-08T15:24:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.