A Simple Contrastive Learning Objective for Alleviating Neural Text
Degeneration
- URL: http://arxiv.org/abs/2205.02517v1
- Date: Thu, 5 May 2022 08:50:50 GMT
- Title: A Simple Contrastive Learning Objective for Alleviating Neural Text
Degeneration
- Authors: Shaojie Jiang, Ruqing Zhang, Svitlana Vakulenko, Maarten de Rijke
- Abstract summary: We propose a new contrastive token learning objective that inherits the advantages of cross-entropy and unlikelihood training.
Comprehensive experiments on language modeling and open-domain dialogue generation tasks show that the proposed contrastive token objective yields less repetitive texts.
- Score: 56.64703901898937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The cross-entropy objective has proved to be an all-purpose training
objective for autoregressive language models (LMs). However, because it does not
penalize problematic tokens, LMs trained with cross-entropy exhibit text
degeneration. To address this, unlikelihood training has been proposed to force
an LM to assign low probabilities to unlikely tokens. But unlikelihood training
does not consider the relationship between the label tokens and the unlikely
token candidates, and thus yields only marginal improvements on degeneration. We
propose a new contrastive token learning objective that inherits the advantages
of cross-entropy and unlikelihood training while avoiding their limitations. The
key idea is to force an LM to assign high probabilities to label tokens at each
step and low probabilities to negative candidates. Comprehensive experiments on
language modeling and open-domain dialogue generation tasks show that the
proposed contrastive token objective yields less repetitive text with higher
generation quality than unlikelihood training, achieving new state-of-the-art
performance.
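As a rough illustration of the objectives discussed above, the following PyTorch sketch contrasts cross-entropy, unlikelihood training, and a contrastive-token-style loss. It is a minimal sketch, not the paper's implementation: the choice of negative candidates (e.g., tokens repeated from the preceding context) and the exact form of the contrastive loss are assumptions made for illustration.

```python
# Minimal sketch (assumption: negatives are arbitrary candidate token ids,
# e.g. tokens repeated from the preceding context; the paper's exact loss
# may differ).
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits, labels):
    # logits: (steps, vocab), labels: (steps,) -- raise p(label) at each step.
    return F.cross_entropy(logits, labels)

def unlikelihood_loss(logits, negatives):
    # Penalize probability mass on negative candidates: -log(1 - p(neg)).
    probs = logits.softmax(dim=-1)                 # (steps, vocab)
    neg_probs = probs.gather(1, negatives)         # (steps, n_neg)
    return -torch.log(1.0 - neg_probs + 1e-8).mean()

def contrastive_token_loss(logits, labels, negatives):
    # Contrast the label logit against the negative-candidate logits via a
    # softmax over {label} U {negatives}; the label sits at index 0.
    pos = logits.gather(1, labels.unsqueeze(1))    # (steps, 1)
    neg = logits.gather(1, negatives)              # (steps, n_neg)
    scores = torch.cat([pos, neg], dim=1)          # (steps, 1 + n_neg)
    target = torch.zeros(scores.size(0), dtype=torch.long)
    return F.cross_entropy(scores, target)

# Toy usage: 5 decoding steps, vocabulary of 10, 3 negative candidates per step.
logits = torch.randn(5, 10, requires_grad=True)
labels = torch.randint(0, 10, (5,))
negatives = torch.randint(0, 10, (5, 3))
loss = cross_entropy_loss(logits, labels) + contrastive_token_loss(logits, labels, negatives)
loss.backward()
```

Intuitively, cross-entropy only raises the label's probability, unlikelihood only lowers the negatives', and the contrastive form ties the two together at each step.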
Related papers
- Language Model Pre-training on True Negatives [109.73819321246062]
Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.
Existing PLMs simply treat all corrupted texts as equally negative, without any examination.
We design enhanced pre-training methods to counteract false negative predictions and encourage pre-training language models on true negatives.
arXiv Detail & Related papers (2022-12-01T12:24:19Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning achieves strong performance improvements and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Generative or Contrastive? Phrase Reconstruction for Better Sentence Representation Learning [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning may yield sufficiently powerful sentence representations and achieve performance on Semantic Textual Similarity tasks on par with contrastive learning.
arXiv Detail & Related papers (2022-04-20T10:00:46Z)
- A Contrastive Framework for Neural Text Generation [46.845997620234265]
We show that an underlying reason for model degeneration is the anisotropic distribution of token representations.
We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text (a rough sketch of such a decoding step appears after this list).
arXiv Detail & Related papers (2022-02-13T21:46:14Z)
- Learning to Selectively Learn for Weakly-supervised Paraphrase Generation [81.65399115750054]
We propose a novel approach to generate high-quality paraphrases with weak supervision data.
Specifically, we tackle the weakly-supervised paraphrase generation problem by obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion.
We demonstrate that our approach achieves significant improvements over existing unsupervised approaches, and is even comparable in performance with supervised state-of-the-art methods.
arXiv Detail & Related papers (2021-09-25T23:31:13Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- A Brief Study on the Effects of Training Generative Dialogue Models with a Semantic loss [37.8626106992769]
We study the effects of minimizing an alternative training objective that encourages a model to generate an alternate response and scores it on semantic similarity.
We explore this idea on two differently sized datasets for the task of next-utterance generation in goal-oriented dialogues.
arXiv Detail & Related papers (2021-06-20T04:39:29Z)
- Diverse Keyphrase Generation with Neural Unlikelihood Training [6.645227801791013]
We study sequence-to-sequence (S2S) keyphrase generation models from the perspective of diversity.
We first analyze the extent of information redundancy present in the outputs generated by a baseline model trained using maximum likelihood estimation (MLE).
arXiv Detail & Related papers (2020-10-15T11:12:26Z)
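The SimCTG entry above also mentions contrastive search decoding; the sketch below shows one common way such a decoding step is written, trading model confidence against the maximum cosine similarity to previously generated token representations. The alpha weight, candidate count, and tensor layout are illustrative assumptions, not values taken from that paper.

```python
# Hedged sketch of a contrastive-search-style decoding step (alpha and the
# candidate set are illustrative assumptions; the SimCTG paper's exact
# procedure may differ).
import torch
import torch.nn.functional as F

def contrastive_search_step(top_probs, top_hidden, context_hidden, alpha=0.6):
    # top_probs:      (k,)      model confidence for the k candidate tokens
    # top_hidden:     (k, dim)  representation of each candidate continuation
    # context_hidden: (t, dim)  representations of already-generated tokens
    cand = F.normalize(top_hidden, dim=-1)
    ctx = F.normalize(context_hidden, dim=-1)
    penalty = (cand @ ctx.T).max(dim=-1).values     # max cosine similarity
    score = (1.0 - alpha) * top_probs - alpha * penalty
    return int(score.argmax())                      # index of the chosen candidate

# Toy usage: 5 candidate tokens, 8 context tokens, hidden size 16.
top_probs = torch.rand(5).softmax(dim=0)
top_hidden = torch.randn(5, 16)
context_hidden = torch.randn(8, 16)
best = contrastive_search_step(top_probs, top_hidden, context_hidden)
```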