Learning to Break the Loop: Analyzing and Mitigating Repetitions for
Neural Text Generation
- URL: http://arxiv.org/abs/2206.02369v1
- Date: Mon, 6 Jun 2022 05:51:12 GMT
- Title: Learning to Break the Loop: Analyzing and Mitigating Repetitions for
Neural Text Generation
- Authors: Jin Xu, Xiaojiang Liu, Jianhao Yan, Deng Cai, Huayang Li, Jian Li
- Abstract summary: We study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context.
We propose a training method where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data.
- Score: 41.3948101212288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large-scale neural language models, such as GPT2 and BART, have
achieved impressive results on various text generation tasks, they tend to get
stuck in undesirable sentence-level loops with maximization-based decoding
algorithms (e.g., greedy search). This phenomenon is counter-intuitive
since there are few consecutive sentence-level repetitions in human corpora
(e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for
generating consecutive sentence-level repetitions, we study the relationship
between the probabilities of the repetitive tokens and their previous
repetitions in the context. Through our quantitative experiments, we find that
1) Language models have a preference to repeat the previous sentence; 2) The
sentence-level repetitions have a self-reinforcement effect: the more
times a sentence is repeated in the context, the higher the probability of
continuing to generate that sentence; 3) The sentences with higher initial
probabilities usually have a stronger self-reinforcement effect. Motivated by
our findings, we propose a simple and effective training method DITTO
(Pseudo-Repetition Penalization), where the model learns to penalize
probabilities of sentence-level repetitions from pseudo repetitive data.
Although our method is motivated by mitigating repetitions, experiments show
that DITTO not only mitigates the repetition issue without sacrificing
perplexity, but also achieves better generation quality. Extensive experiments
on open-ended text generation (Wikitext-103) and text summarization
(CNN/DailyMail) demonstrate the generality and effectiveness of our method.
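A minimal sketch of the training idea described above, assuming a PyTorch language model that returns next-token logits. The abstract only states that the model learns to penalize sentence-level repetition probabilities on pseudo repetitive data, so the helper names (build_pseudo_repetitive_sample, ditto_style_penalty, lambda_factor) and the concrete penalty form below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def build_pseudo_repetitive_sample(sentence_ids, n_repeats):
    """Pseudo repetitive data: one sentence repeated n_repeats times.
    `sentence_ids` is a 1-D LongTensor of token ids for a single sentence."""
    return sentence_ids.repeat(n_repeats)

def ditto_style_penalty(model, sentence_ids, n_repeats=4, lambda_factor=0.5):
    """Illustrative penalty against the self-reinforcement effect.

    For each token of the k-th copy of the sentence, its probability is
    compared with the same token's probability in the (k-1)-th copy; any
    growth beyond `lambda_factor` times the earlier probability is penalized.
    The exact objective in the paper may differ -- this only mirrors the
    high-level description in the abstract.
    """
    seq = build_pseudo_repetitive_sample(sentence_ids, n_repeats).unsqueeze(0)
    logits = model(seq)                      # assumed shape: (1, T, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)

    sent_len = sentence_ids.numel()
    # log P(gold next token | context) at every position
    token_logp = log_probs[0, :-1].gather(1, seq[0, 1:, None]).squeeze(-1)

    penalties = []
    for k in range(2, n_repeats):            # compare copy k with copy k-1
        cur = token_logp[k * sent_len - 1:(k + 1) * sent_len - 1].exp()
        prev = token_logp[(k - 1) * sent_len - 1:k * sent_len - 1].exp().detach()
        # penalize only the probability mass exceeding lambda_factor * previous copy
        penalties.append((cur - lambda_factor * prev).clamp(min=0.0).mean())
    return torch.stack(penalties).mean()
```

In training, a term like this would be added to the usual maximum-likelihood loss on real data; the repetition count, the mixing weight, and lambda_factor are all treated here as free hyperparameters.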
Related papers
- Repetition In Repetition Out: Towards Understanding Neural Text
Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - Mitigating the Learning Bias towards Repetition by Self-Contrastive
Training for Open-Ended Generation [92.42032403795879]
We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
arXiv Detail & Related papers (2023-07-04T07:53:55Z) - Generating Repetitions with Appropriate Repeated Words [30.10429353715689]
Repetitions are essential in communication to build trust with others.
To the best of our knowledge, this is the first neural approach to address repetition generation.
We propose Weighted Label Smoothing, a smoothing method for explicitly learning which words to repeat during fine-tuning, and a repetition scoring method that can output more appropriate repetitions during decoding.
arXiv Detail & Related papers (2022-07-03T01:21:49Z) - Taming Repetition in Dialogue Generation [1.851321027703742]
Inappropriate repetition of words can significantly degrade the quality of the generated texts.
We design a context-aware classifier to explicitly decide when to allow repetition and when to employ penalized sampling; a generic sketch of such decode-time penalties appears after this list.
Our method can generate higher quality and more authentic dialogues.
arXiv Detail & Related papers (2021-12-16T06:25:46Z) - Using BERT Encoding and Sentence-Level Language Model for Sentence
Ordering [0.9134244356393667]
We propose an algorithm for sentence ordering in a corpus of short stories.
Our proposed method uses a language model based on Universal Transformers (UT) that captures sentences' dependencies by employing an attention mechanism.
The proposed model includes three components: Sentence Encoder, Language Model, and Sentence Arrangement with Brute Force Search.
arXiv Detail & Related papers (2021-08-24T23:03:36Z) - A Theoretical Analysis of the Repetition Problem in Text Generation [55.8184629429347]
We show that the repetition problem is, unfortunately, caused by the traits of our language itself.
One major reason is that too many words predict the same word as their subsequent word with high probability.
We propose a novel rebalanced encoding approach to alleviate the high inflow problem.
arXiv Detail & Related papers (2020-12-29T08:51:47Z) - Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z) - Fact-aware Sentence Split and Rephrase with Permutation Invariant
Training [93.66323661321113]
Sentence Split and Rephrase aims to break down a complex sentence into several simple sentences with its meaning preserved.
Previous studies tend to address the issue by seq2seq learning from parallel sentence pairs.
We introduce Permutation Training to verify the effects of order variance in seq2seq learning for this task.
arXiv Detail & Related papers (2020-01-16T07:30:19Z)
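Several entries above lean on decode-time repetition penalties (e.g., the penalized sampling mentioned in the dialogue-generation paper). The snippet below is a generic, hedged illustration of that family of techniques, not the method of any specific paper listed here: token ids already present in the context have their logits made less attractive before sampling.

```python
import torch

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Generic decode-time repetition penalty (illustrative; the cited papers
    may use different formulations). Positive logits of already-generated
    tokens are divided by `penalty`, negative ones multiplied by it, so
    repeating a token becomes less likely."""
    logits = logits.clone()                      # shape: (vocab_size,)
    for token_id in set(generated_ids.tolist()):
        score = logits[token_id]
        logits[token_id] = score / penalty if score > 0 else score * penalty
    return logits

# Hypothetical place in a sampling loop (model interface assumed):
# next_logits = model(context)[0, -1]                       # next-token logits
# next_logits = apply_repetition_penalty(next_logits, context[0])
# next_token = torch.multinomial(torch.softmax(next_logits, dim=-1), 1)
```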
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.