Residual Energy-Based Models for Text Generation
- URL: http://arxiv.org/abs/2004.11714v1
- Date: Wed, 22 Apr 2020 23:19:55 GMT
- Title: Residual Energy-Based Models for Text Generation
- Authors: Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio
Ranzato
- Abstract summary: We investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.
In order to make training tractable, we first work in the residual of a pretrained locally normalized language model, and second we train using noise contrastive estimation.
Our experiments on two large language modeling datasets show that residual EBMs yield lower perplexity compared to locally normalized baselines.
- Score: 47.53354656462756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text generation is ubiquitous in many NLP tasks, from summarization, to
dialogue and machine translation. The dominant parametric approach is based on
locally normalized models which predict one word at a time. While these models work
remarkably well, they are plagued by exposure bias due to the greedy nature of
the generation process. In this work, we investigate un-normalized energy-based
models (EBMs) which operate not at the token but at the sequence level. In
order to make training tractable, we first work in the residual of a pretrained
locally normalized language model, and second we train using noise contrastive
estimation. Furthermore, since the EBM works at the sequence level, we can
leverage pretrained bi-directional contextual representations, such as BERT and
RoBERTa. Our experiments on two large language modeling datasets show that
residual EBMs yield lower perplexity compared to locally normalized baselines.
Moreover, generation via importance sampling is very efficient and of higher
quality than the baseline models according to human evaluation.
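The sketch below illustrates the approach described in the abstract on a toy scale: a sequence-level energy function is trained as a residual on top of a small locally normalized language model using noise contrastive estimation, and generation is done by importance sampling from the LM proposal. Every name here (ToyLM, EnergyNet, the toy vocabulary and data) is an illustrative stand-in, not the authors' code; the paper uses large pretrained LMs and BERT/RoBERTa-style bidirectional encoders as energy functions.

```python
# Hedged toy sketch of a residual EBM trained with NCE and sampled with
# importance sampling; ToyLM and EnergyNet are stand-ins, not the paper's models.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, SEQ_LEN = 100, 32, 16

class ToyLM(nn.Module):
    """Stand-in for the pretrained locally normalized language model p_LM."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, EMB, batch_first=True)
        self.out = nn.Linear(EMB, VOCAB)

    @torch.no_grad()
    def sample(self, n):
        x = torch.zeros(n, 1, dtype=torch.long)              # BOS token id 0
        for _ in range(SEQ_LEN - 1):
            h, _ = self.rnn(self.emb(x))
            probs = F.softmax(self.out(h[:, -1]), dim=-1)
            x = torch.cat([x, torch.multinomial(probs, 1)], dim=1)
        return x

class EnergyNet(nn.Module):
    """Sequence-level residual energy E(x); joint model: log P(x) ∝ log p_LM(x) - E(x)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.score = nn.Linear(EMB, 1)

    def forward(self, x):
        return self.score(self.emb(x).mean(dim=1)).squeeze(-1)

lm, energy = ToyLM(), EnergyNet()
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

# NCE with p_LM as the noise distribution: the log p_LM terms cancel in the
# binary-classification logit, which reduces to -E(x).
real = torch.randint(0, VOCAB, (8, SEQ_LEN))                 # placeholder "data"
fake = lm.sample(8)                                          # noise drawn from p_LM
loss = (F.binary_cross_entropy_with_logits(-energy(real), torch.ones(8))
        + F.binary_cross_entropy_with_logits(-energy(fake), torch.zeros(8)))
opt.zero_grad()
loss.backward()
opt.step()

# Generation via importance sampling: draw candidates from the LM proposal,
# reweight them by exp(-E), and resample one sequence.
with torch.no_grad():
    cands = lm.sample(64)
    weights = F.softmax(-energy(cands), dim=0)
    generated = cands[torch.multinomial(weights, 1).item()]
```

Treating the pretrained LM as both the residual base and the NCE noise distribution is what makes training tractable: the intractable partition function is never computed, and the classifier only has to learn the residual energy.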
Related papers
- Leveraging Pre-trained Models for Failure Analysis Triplets Generation [0.0]
We leverage the attention mechanism of pre-trained causal language models, such as the Transformer, for the downstream task of generating Failure Analysis Triplets (FATs).
We observe that Generative Pre-trained Transformer 2 (GPT2) outperformed other transformer models on the failure analysis triplet generation (FATG) task.
In particular, GPT2 (with 1.5B parameters) outperforms pre-trained BERT, BART and GPT3 by a large margin on ROUGE.
arXiv Detail & Related papers (2022-10-31T17:21:15Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) in average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- End-to-End Training for Back-Translation with Categorical Reparameterization Trick [0.0]
Back-translation is an effective semi-supervised learning framework in neural machine translation (NMT).
A pre-trained NMT model translates monolingual sentences into synthetic bilingual sentence pairs for training the other NMT model.
The discrete nature of the translated sentences prevents gradient information from flowing between the two NMT models (see the sketch below).
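As a rough illustration of the categorical reparameterization idea the title refers to, the sketch below assumes a straight-through Gumbel-softmax relaxation (the paper's exact formulation may differ) to let gradients flow through a discrete token choice:

```python
# Minimal sketch of categorical reparameterization via straight-through
# Gumbel-softmax; illustrative only, not the authors' implementation.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)    # 4 positions, vocab of 10
embedding = torch.nn.Embedding(10, 8)

# hard=True yields one-hot samples in the forward pass while the backward
# pass uses the soft relaxation (straight-through estimator).
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
token_embs = one_hot @ embedding.weight             # differentiable "lookup"

loss = token_embs.pow(2).sum()                      # stand-in downstream loss
loss.backward()
print(logits.grad.shape)                            # torch.Size([4, 10])
```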
arXiv Detail & Related papers (2022-02-17T06:31:03Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Unlike our previous work, the criteria considered here are self-normalized, so there is no need for a further correction step.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
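For background, self-normalized importance sampling estimates an expectation under a distribution known only up to a normalizing constant by normalizing the importance weights by their sum. The toy sketch below shows the estimator in isolation; it is not the paper's training criterion for neural language models.

```python
# Toy self-normalized importance sampling estimate of E_p[f(x)], where p is
# known only up to a constant (p_tilde); the weights are normalized by their
# sum, so the normalizer of p is never needed.
import torch

torch.manual_seed(0)
p_tilde = lambda x: torch.exp(-0.5 * (x - 2.0) ** 2)   # unnormalized target
q = torch.distributions.Normal(0.0, 3.0)               # proposal distribution
f = lambda x: x                                        # estimate the mean

x = q.sample((10000,))
w = p_tilde(x) / q.log_prob(x).exp()                   # importance weights
estimate = (w * f(x)).sum() / w.sum()                  # self-normalized estimator
print(estimate)                                        # approximately 2.0
```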
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
- Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models [61.768082640087]
We explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders for natural language understanding tasks.
Experiments show that EBM training can help the model reach better calibration, competitive with strong baselines.
arXiv Detail & Related papers (2021-01-18T01:41:31Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
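As a generic sketch of matching two token sequences with optimal transport (a plain entropic Sinkhorn iteration over embedding distances, not the paper's specific student-forcing objective), consider:

```python
# Generic entropic-OT (Sinkhorn) distance between two sequences of token
# embeddings; an illustration only, not the paper's exact training objective.
import torch

def sinkhorn_ot(x, y, eps=0.1, iters=50):
    """x: (n, d) and y: (m, d) embeddings; returns an entropic OT cost."""
    cost = torch.cdist(x, y)
    cost = cost / cost.max()                       # scale for numerical stability
    a = torch.full((x.size(0),), 1.0 / x.size(0))  # uniform source weights
    b = torch.full((y.size(0),), 1.0 / y.size(0))  # uniform target weights
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(iters):                         # Sinkhorn fixed-point updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)     # soft transport plan
    return (plan * cost).sum()

gen = torch.randn(12, 64, requires_grad=True)      # "generated" token embeddings
ref = torch.randn(15, 64)                          # "reference" token embeddings
loss = sinkhorn_ot(gen, ref)
loss.backward()                                    # gradients reach the generator side
```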
arXiv Detail & Related papers (2020-10-12T19:42:25Z)