Learning to Break the Loop: Analyzing and Mitigating Repetitions for
Neural Text Generation
- URL: http://arxiv.org/abs/2206.02369v1
- Date: Mon, 6 Jun 2022 05:51:12 GMT
- Title: Learning to Break the Loop: Analyzing and Mitigating Repetitions for
Neural Text Generation
- Authors: Jin Xu, Xiaojiang Liu, Jianhao Yan, Deng Cai, Huayang Li, Jian Li
- Abstract summary: We study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context.
We propose a training method where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data.
- Score: 41.3948101212288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large-scale neural language models, such as GPT2 and BART, have
achieved impressive results on various text generation tasks, they tend to get
stuck in undesirable sentence-level loops with maximization-based decoding
algorithms (e.g., greedy search). This phenomenon is counter-intuitive
since there are few consecutive sentence-level repetitions in human corpora
(e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for
generating consecutive sentence-level repetitions, we study the relationship
between the probabilities of the repetitive tokens and their previous
repetitions in the context. Through our quantitative experiments, we find that
1) Language models have a preference to repeat the previous sentence; 2) The
sentence-level repetitions have a self-reinforcement effect: the more
times a sentence is repeated in the context, the higher the probability of
continuing to generate that sentence; 3) The sentences with higher initial
probabilities usually have a stronger self-reinforcement effect. Motivated by
our findings, we propose a simple and effective training method DITTO
(Pseudo-Repetition Penalization), where the model learns to penalize
probabilities of sentence-level repetitions from pseudo repetitive data.
Although our method is motivated by mitigating repetitions, experiments show
that DITTO not only mitigates the repetition issue without sacrificing
perplexity, but also achieves better generation quality. Extensive experiments
on open-ended text generation (Wikitext-103) and text summarization
(CNN/DailyMail) demonstrate the generality and effectiveness of our method.
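A minimal sketch of the training idea described above, assuming a PyTorch language model that returns next-token logits. The abstract only states that the model learns to penalize sentence-level repetition probabilities on pseudo repetitive data, so the helper names (build_pseudo_repetitive_sample, ditto_style_penalty, lambda_factor) and the concrete penalty form below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def build_pseudo_repetitive_sample(sentence_ids, n_repeats):
    """Pseudo repetitive data: one sentence repeated n_repeats times.
    `sentence_ids` is a 1-D LongTensor of token ids for a single sentence."""
    return sentence_ids.repeat(n_repeats)

def ditto_style_penalty(model, sentence_ids, n_repeats=4, lambda_factor=0.5):
    """Illustrative penalty against the self-reinforcement effect.

    For each token of the k-th copy of the sentence, its probability is
    compared with the same token's probability in the (k-1)-th copy; any
    growth beyond `lambda_factor` times the earlier probability is penalized.
    The exact objective in the paper may differ -- this only mirrors the
    high-level description in the abstract.
    """
    seq = build_pseudo_repetitive_sample(sentence_ids, n_repeats).unsqueeze(0)
    logits = model(seq)                      # assumed shape: (1, T, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)

    sent_len = sentence_ids.numel()
    # log P(gold next token | context) at every position
    token_logp = log_probs[0, :-1].gather(1, seq[0, 1:, None]).squeeze(-1)

    penalties = []
    for k in range(2, n_repeats):            # compare copy k with copy k-1
        cur = token_logp[k * sent_len - 1:(k + 1) * sent_len - 1].exp()
        prev = token_logp[(k - 1) * sent_len - 1:k * sent_len - 1].exp().detach()
        # penalize only the probability mass exceeding lambda_factor * previous copy
        penalties.append((cur - lambda_factor * prev).clamp(min=0.0).mean())
    return torch.stack(penalties).mean()
```

In training, a term like this would be added to the usual maximum-likelihood loss on real data; the repetition count, the mixing weight, and lambda_factor are all treated here as free hyperparameters.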
Related papers
- Repetition In Repetition Out: Towards Understanding Neural Text
Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - Mitigating the Learning Bias towards Repetition by Self-Contrastive
Training for Open-Ended Generation [92.42032403795879]
We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
arXiv Detail & Related papers (2023-07-04T07:53:55Z) - Generating Repetitions with Appropriate Repeated Words [30.10429353715689]
Repetitions are essential in communication to build trust with others.
To the best of our knowledge, this is the first neural approach to address repetition generation.
We propose Weighted Label Smoothing, a smoothing method for explicitly learning which words to repeat during fine-tuning, and a repetition scoring method that can output more appropriate repetitions during decoding.
arXiv Detail & Related papers (2022-07-03T01:21:49Z) - Taming Repetition in Dialogue Generation [1.851321027703742]
Inappropriate repetition of words can significantly degrade the quality of the generated texts.
We design a context-aware classifier to explicitly decide when to allow repetition and when to employ penalized sampling; a generic sketch of such decode-time penalties appears after this list.
Our method can generate higher quality and more authentic dialogues.
arXiv Detail & Related papers (2021-12-16T06:25:46Z) - Using BERT Encoding and Sentence-Level Language Model for Sentence
Ordering [0.9134244356393667]
We propose an algorithm for sentence ordering in a corpus of short stories.
Our proposed method uses a language model based on Universal Transformers (UT) that captures sentences' dependencies by employing an attention mechanism.
The proposed model includes three components: Sentence Encoder, Language Model, and Sentence Arrangement with Brute Force Search.
arXiv Detail & Related papers (2021-08-24T23:03:36Z) - A Theoretical Analysis of the Repetition Problem in Text Generation [55.8184629429347]
We show that the repetition problem is, unfortunately, caused by the traits of our language itself.
One major reason is that too many words predict the same word as their subsequent word with high probability.
We propose a novel rebalanced encoding approach to alleviate the high inflow problem.
arXiv Detail & Related papers (2020-12-29T08:51:47Z) - Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z) - Fact-aware Sentence Split and Rephrase with Permutation Invariant
Training [93.66323661321113]
Sentence Split and Rephrase aims to break down a complex sentence into several simple sentences with its meaning preserved.
Previous studies tend to address the issue by seq2seq learning from parallel sentence pairs.
We introduce Permutation Training to verify the effects of order variance in seq2seq learning for this task.
arXiv Detail & Related papers (2020-01-16T07:30:19Z)
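Several entries above lean on decode-time repetition penalties (e.g., the penalized sampling mentioned in the dialogue-generation paper). The snippet below is a generic, hedged illustration of that family of techniques, not the method of any specific paper listed here: token ids already present in the context have their logits made less attractive before sampling.

```python
import torch

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Generic decode-time repetition penalty (illustrative; the cited papers
    may use different formulations). Positive logits of already-generated
    tokens are divided by `penalty`, negative ones multiplied by it, so
    repeating a token becomes less likely."""
    logits = logits.clone()                      # shape: (vocab_size,)
    for token_id in set(generated_ids.tolist()):
        score = logits[token_id]
        logits[token_id] = score / penalty if score > 0 else score * penalty
    return logits

# Hypothetical place in a sampling loop (model interface assumed):
# next_logits = model(context)[0, -1]                       # next-token logits
# next_logits = apply_repetition_penalty(next_logits, context[0])
# next_token = torch.multinomial(torch.softmax(next_logits, dim=-1), 1)
```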
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.