Mitigating the Learning Bias towards Repetition by Self-Contrastive
Training for Open-Ended Generation
- URL: http://arxiv.org/abs/2307.01542v1
- Date: Tue, 4 Jul 2023 07:53:55 GMT
- Title: Mitigating the Learning Bias towards Repetition by Self-Contrastive
Training for Open-Ended Generation
- Authors: Jian Guan, Minlie Huang
- Abstract summary: We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
- Score: 92.42032403795879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the huge progress in myriad generation tasks, pretrained language
models (LMs) such as GPT2 still tend to generate repetitive texts with
maximization-based decoding algorithms for open-ended generation. We attribute
their overestimation of token-level repetition probabilities to the learning
bias: LMs capture simple repetitive patterns faster with the MLE loss. We
propose self-contrastive training to penalize the output of a premature
checkpoint of the same model when it incorrectly predicts repetition, which is
shown to mitigate repetition effectively while maintaining fluency on two
datasets. Furthermore, we find that LMs use longer-range dependencies to
predict repetitive tokens than non-repetitive ones, which may be the cause of
sentence-level repetition loops.
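The abstract describes the self-contrastive objective only at a high level. As a rough illustration of the idea (not the authors' exact formulation), a loss of this kind could combine the standard MLE term with a penalty on tokens that a frozen, premature checkpoint of the same model over-predicts when they are not the gold continuation, e.g. spurious repetitions of context tokens. The sketch below assumes HuggingFace-style causal LMs and an unlikelihood-style penalty; the function name, `alpha`, and the exact penalty form are illustrative choices.
```python
import torch
import torch.nn.functional as F

def self_contrastive_loss(model, premature_model, input_ids, labels, alpha=1.0):
    """Illustrative sketch of a self-contrastive objective (not the paper's exact
    loss): MLE on the gold tokens plus a penalty that lowers the current model's
    probability on tokens the frozen premature checkpoint wrongly favours."""
    # input_ids: (B, T); labels: gold next tokens aligned with the logits
    # (any shifting is assumed to be handled by the caller).
    logits = model(input_ids).logits              # (B, T, V), current checkpoint
    with torch.no_grad():                         # premature checkpoint stays frozen
        pre_probs = premature_model(input_ids).logits.softmax(dim=-1)

    log_p = F.log_softmax(logits, dim=-1)

    # Standard MLE (cross-entropy) term on the gold next tokens.
    mle = F.nll_loss(log_p.transpose(1, 2), labels)

    # Zero out the gold token so only *incorrect* premature predictions are
    # penalized (e.g. an over-predicted repetition that differs from the gold token).
    gold_mask = F.one_hot(labels, num_classes=logits.size(-1)).bool()
    wrong_conf = pre_probs.masked_fill(gold_mask, 0.0)

    # Unlikelihood-style penalty weighted by the premature model's confidence:
    # -log(1 - p_current(token)) grows when the current model also favours the token.
    penalty = -(wrong_conf * torch.log1p(-log_p.exp() + 1e-6)).sum(dim=-1).mean()

    return mle + alpha * penalty
```
In this sketch the premature checkpoint is simply an earlier saved copy of the same model kept frozen during training, and `alpha` trades repetition suppression against fluency.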
Related papers
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leverages the logits of only the first generated token.
We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains.
Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
arXiv Detail & Related papers (2024-11-08T12:08:17Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood estimation (MLE) objective does not match the downstream use case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z) - KNN-LM Does Not Improve Open-ended Text Generation [34.86733697757264]
We study the generation quality of retrieval-augmented language models (LMs).
We find that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM.
We discover that the entropy of the retrieval distribution increases faster than that of the base LM as the generated sequence becomes longer.
arXiv Detail & Related papers (2023-05-24T01:48:33Z)
- Joint Repetition Suppression and Content Moderation of Large Language Models [4.9990392459395725]
Natural language generation (NLG) is one of the most impactful fields in NLP.
In this paper, we apply non-exact repetition suppression using token- and sequence-level unlikelihood losses (a sketch of the token-level term appears after this list).
We also explore the unlikelihood training objective as a way to jointly endow the model with the ability to avoid generating offensive words.
arXiv Detail & Related papers (2023-04-20T19:17:49Z)
- Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation [41.3948101212288]
We study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context.
We propose a training method where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data.
arXiv Detail & Related papers (2022-06-06T05:51:12Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation [38.123025955523836]
Non-autoregressive neural machine translation (NAT) predicts the entire target sequence simultaneously and significantly accelerates the inference process.
We propose a novel semi-autoregressive model RecoverSAT, which generates a translation as a sequence of segments.
By dynamically determining segment length and deleting repetitive segments, RecoverSAT is capable of recovering from repetitive and missing token errors.
Experimental results on three widely-used benchmark datasets show that our proposed model achieves more than a 4× speedup while maintaining performance comparable to the corresponding autoregressive model.
arXiv Detail & Related papers (2020-06-09T10:12:16Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
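As referenced in the Joint Repetition Suppression entry above, several of the papers in this list build on unlikelihood training. The following is a minimal sketch of the standard token-level unlikelihood term (in the spirit of Welleck et al.'s formulation), not the cited papers' exact losses; the function name and the candidate-set construction are illustrative.
```python
import torch
import torch.nn.functional as F

def token_unlikelihood(logits, targets):
    """Minimal sketch of a token-level unlikelihood term (illustrative; not the
    cited papers' exact loss). At each step t, target tokens already seen in the
    prefix, other than the gold token, are negative candidates, and the term
    -log(1 - p(candidate)) discourages the model from repeating them."""
    # logits: (T, V) next-token logits for one sequence; targets: (T,) gold tokens.
    probs = F.softmax(logits, dim=-1)
    loss, steps = 0.0, 0
    for t in range(1, logits.size(0)):
        candidates = set(targets[:t].tolist()) - {targets[t].item()}
        if not candidates:
            continue
        idx = torch.tensor(sorted(candidates), device=logits.device)
        loss = loss - torch.log(1.0 - probs[t, idx] + 1e-6).sum()
        steps += 1
    return loss / max(steps, 1)
```
In practice this term is added to the MLE loss with a weighting coefficient, and a sequence-level variant penalizes repeated n-grams in sampled continuations.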