A Simple Contrastive Learning Objective for Alleviating Neural Text
Degeneration
- URL: http://arxiv.org/abs/2205.02517v1
- Date: Thu, 5 May 2022 08:50:50 GMT
- Title: A Simple Contrastive Learning Objective for Alleviating Neural Text
Degeneration
- Authors: Shaojie Jiang, Ruqing Zhang, Svitlana Vakulenko, Maarten de Rijke
- Abstract summary: We propose a new contrastive token learning objective that inherits the advantages of cross-entropy and unlikelihood training.
Comprehensive experiments on language modeling and open-domain dialogue generation tasks show that the proposed contrastive token objective yields less repetitive texts.
- Score: 56.64703901898937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The cross-entropy objective has proved to be an all-purpose training
objective for autoregressive language models (LMs). However, because it does not
penalize problematic tokens, LMs trained with cross-entropy exhibit text
degeneration. To address this, unlikelihood training has been proposed to force
an LM to assign low probabilities to unlikely tokens. But unlikelihood training
does not consider the relationship between the label tokens and the unlikely
token candidates, and thus yields only marginal improvements on degeneration. We
propose a new contrastive token learning objective that inherits the advantages
of cross-entropy and unlikelihood training while avoiding their limitations. The
key idea is to force an LM to assign high probabilities to label tokens at each
step and low probabilities to negative candidates. Comprehensive experiments on
language modeling and open-domain dialogue generation tasks show that the
proposed contrastive token objective yields less repetitive text with higher
generation quality than unlikelihood training, achieving new state-of-the-art
performance.
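As a rough illustration of the objectives discussed above, the following PyTorch sketch contrasts cross-entropy, unlikelihood training, and a contrastive-token-style loss. It is a minimal sketch, not the paper's implementation: the choice of negative candidates (e.g., tokens repeated from the preceding context) and the exact form of the contrastive loss are assumptions made for illustration.

```python
# Minimal sketch (assumption: negatives are arbitrary candidate token ids,
# e.g. tokens repeated from the preceding context; the paper's exact loss
# may differ).
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits, labels):
    # logits: (steps, vocab), labels: (steps,) -- raise p(label) at each step.
    return F.cross_entropy(logits, labels)

def unlikelihood_loss(logits, negatives):
    # Penalize probability mass on negative candidates: -log(1 - p(neg)).
    probs = logits.softmax(dim=-1)                 # (steps, vocab)
    neg_probs = probs.gather(1, negatives)         # (steps, n_neg)
    return -torch.log(1.0 - neg_probs + 1e-8).mean()

def contrastive_token_loss(logits, labels, negatives):
    # Contrast the label logit against the negative-candidate logits via a
    # softmax over {label} U {negatives}; the label sits at index 0.
    pos = logits.gather(1, labels.unsqueeze(1))    # (steps, 1)
    neg = logits.gather(1, negatives)              # (steps, n_neg)
    scores = torch.cat([pos, neg], dim=1)          # (steps, 1 + n_neg)
    target = torch.zeros(scores.size(0), dtype=torch.long)
    return F.cross_entropy(scores, target)

# Toy usage: 5 decoding steps, vocabulary of 10, 3 negative candidates per step.
logits = torch.randn(5, 10, requires_grad=True)
labels = torch.randint(0, 10, (5,))
negatives = torch.randint(0, 10, (5, 3))
loss = cross_entropy_loss(logits, labels) + contrastive_token_loss(logits, labels, negatives)
loss.backward()
```

Intuitively, cross-entropy only raises the label's probability, unlikelihood only lowers the negatives', and the contrastive form ties the two together at each step.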
Related papers
- Language Model Pre-training on True Negatives [109.73819321246062]
Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.
Existing PLMs simply treat all corrupted texts as equally negative, without any examination.
We design enhanced pre-training methods to counteract false negative predictions and encourage pre-training language models on true negatives.
arXiv Detail & Related papers (2022-12-01T12:24:19Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning achieves strong performance improvements and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Generative or Contrastive? Phrase Reconstruction for Better Sentence Representation Learning [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning may yield sufficiently powerful sentence representations and achieve performance on Semantic Textual Similarity tasks on par with contrastive learning.
arXiv Detail & Related papers (2022-04-20T10:00:46Z)
- A Contrastive Framework for Neural Text Generation [46.845997620234265]
We show that an underlying reason for model degeneration is the anisotropic distribution of token representations.
We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text (a rough sketch of such a decoding step appears after this list).
arXiv Detail & Related papers (2022-02-13T21:46:14Z)
- Learning to Selectively Learn for Weakly-supervised Paraphrase Generation [81.65399115750054]
We propose a novel approach to generate high-quality paraphrases with weak supervision data.
Specifically, we tackle the weakly-supervised paraphrase generation problem by obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion.
We demonstrate that our approach achieves significant improvements over existing unsupervised approaches, and is even comparable in performance with supervised state-of-the-art methods.
arXiv Detail & Related papers (2021-09-25T23:31:13Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss [37.8626106992769]
We study the effects of minimizing an alternative training objective that encourages a model to generate an alternate response and scores it on semantic similarity.
We explore this idea on two differently sized datasets for the task of next-utterance generation in goal-oriented dialogues.
arXiv Detail & Related papers (2021-06-20T04:39:29Z)
- Diverse Keyphrase Generation with Neural Unlikelihood Training [6.645227801791013]
We study sequence-to-sequence (S2S) keyphrase generation models from the perspective of diversity.
We first analyze the extent of information redundancy present in the outputs generated by a baseline model trained using maximum likelihood estimation (MLE).
arXiv Detail & Related papers (2020-10-15T11:12:26Z)
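The SimCTG entry above also mentions contrastive search decoding; the sketch below shows one common way such a decoding step is written, trading model confidence against the maximum cosine similarity to previously generated token representations. The alpha weight, candidate count, and tensor layout are illustrative assumptions, not values taken from that paper.

```python
# Hedged sketch of a contrastive-search-style decoding step (alpha and the
# candidate set are illustrative assumptions; the SimCTG paper's exact
# procedure may differ).
import torch
import torch.nn.functional as F

def contrastive_search_step(top_probs, top_hidden, context_hidden, alpha=0.6):
    # top_probs:      (k,)      model confidence for the k candidate tokens
    # top_hidden:     (k, dim)  representation of each candidate continuation
    # context_hidden: (t, dim)  representations of already-generated tokens
    cand = F.normalize(top_hidden, dim=-1)
    ctx = F.normalize(context_hidden, dim=-1)
    penalty = (cand @ ctx.T).max(dim=-1).values     # max cosine similarity
    score = (1.0 - alpha) * top_probs - alpha * penalty
    return int(score.argmax())                      # index of the chosen candidate

# Toy usage: 5 candidate tokens, 8 context tokens, hidden size 16.
top_probs = torch.rand(5).softmax(dim=0)
top_hidden = torch.randn(5, 16)
context_hidden = torch.randn(8, 16)
best = contrastive_search_step(top_probs, top_hidden, context_hidden)
```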