A Theoretical Analysis of the Repetition Problem in Text Generation
- URL: http://arxiv.org/abs/2012.14660v4
- Date: Mon, 22 Mar 2021 02:55:21 GMT
- Title: A Theoretical Analysis of the Repetition Problem in Text Generation
- Authors: Zihao Fu, Wai Lam, Anthony Man-Cho So, Bei Shi
- Abstract summary: We show that the repetition problem is, unfortunately, caused by the traits of our language itself.
One major reason is that too many words predict the same word as their subsequent word with high probability.
We propose a novel rebalanced encoding approach to alleviate the high inflow problem.
- Score: 55.8184629429347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text generation tasks, including translation, summarization, and language modeling, have seen rapid growth in recent years. Despite these remarkable achievements, the repetition problem has been observed in nearly all text generation models and substantially undermines generation performance. Many methods have been proposed to solve the repetition problem, but no existing theoretical analysis explains why the problem arises or how it can be resolved. In this paper, we propose a new framework for the theoretical analysis of the repetition problem. We first define the Average Repetition Probability (ARP) to characterize the repetition problem quantitatively. Then, we conduct an extensive analysis of the Markov generation model and derive several upper bounds on the average repetition probability, each with an intuitive interpretation. We show that most existing methods essentially minimize these upper bounds, either explicitly or implicitly. Grounded in our theory, we show that the repetition problem is, unfortunately, caused by traits of our language itself. One major reason is that too many words predict the same word as their subsequent word with high probability; consequently, it is easy to return to that word and form repetitions, which we dub the high inflow problem. Furthermore, we derive a concentration bound on the average repetition probability for general generation models. Finally, based on the theoretical upper bounds, we propose a novel rebalanced encoding approach to alleviate the high inflow problem. Experimental results show that our theoretical framework applies to general generation models and that the proposed rebalanced encoding approach significantly alleviates the repetition problem. The source code of this paper can be obtained from https://github.com/fuzihaofzh/repetition-problem-nlg.
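To make the high inflow intuition concrete, the following minimal Python sketch (not the paper's implementation; the toy vocabulary, transition matrix, window-based repetition statistic, thresholds, and all names are illustrative assumptions) builds a first-order Markov generation model in which one token is predicted with high probability by many other tokens, then empirically estimates how often generated sequences revisit recently generated tokens, loosely in the spirit of the ARP metric rather than its formal definition.

```python
# Hypothetical sketch: a toy first-order Markov generator with one
# "high-inflow" token (a token that many other tokens predict with high
# probability), plus a crude empirical repetition statistic.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 50

# Toy transition matrix: P[i, j] = P(next token = j | current token = i).
P = rng.random((vocab_size, vocab_size))
# Make token 0 a high-inflow token: every token predicts it with large weight.
P[:, 0] += 10.0
P /= P.sum(axis=1, keepdims=True)

def estimate_repetition_rate(P, num_samples=2000, seq_len=30, window=4):
    """Fraction of generated tokens that already occurred in the previous `window` tokens."""
    repeats, total = 0, 0
    for _ in range(num_samples):
        tok = int(rng.integers(vocab_size))
        history = [tok]
        for _ in range(seq_len):
            tok = int(rng.choice(vocab_size, p=P[tok]))
            repeats += tok in history[-window:]
            total += 1
            history.append(tok)
    return repeats / total

def count_high_inflow_tokens(P, threshold=0.2, min_predictors=5):
    """Tokens that receive probability >= threshold from at least `min_predictors` tokens."""
    inflow = (P >= threshold).sum(axis=0)
    return int((inflow >= min_predictors).sum())

print("estimated repetition rate:", estimate_repetition_rate(P))
print("high-inflow tokens:", count_high_inflow_tokens(P))
```

In this toy setup, boosting the inflow of even a single token is typically enough to raise the measured repetition rate noticeably, which mirrors the high inflow argument in the abstract above.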
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z)
- The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models [45.01562498702836]
Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps.
We propose SOCRATIC QUESTIONING, a divide-and-conquer style algorithm that mimics the recursive thinking process.
arXiv Detail & Related papers (2023-05-24T10:36:14Z)
- Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation [41.3948101212288]
We study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context.
We propose a training method where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data.
arXiv Detail & Related papers (2022-06-06T05:51:12Z)
- End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking [52.05847268235338]
We show how machine learning systems can perform logical extrapolation without overthinking problems.
We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten.
We also employ a progressive training routine that prevents the model from learning behaviors specific to the number of iterations and instead pushes it to learn behaviors that can be repeated indefinitely.
arXiv Detail & Related papers (2022-02-11T18:43:28Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
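The consistent-decoding idea in the last entry above can be illustrated with a small sketch: a top-k sampler that always keeps the end-of-sequence token in the candidate set, so termination always has non-zero probability and generation cannot loop forever. The vocabulary size, EOS index, dummy logits, and helper names below are hypothetical, and this is only a sketch of the general idea, not the authors' exact algorithm.

```python
# Hypothetical sketch of "consistent" top-k sampling: unlike plain top-k, the
# end-of-sequence (EOS) token is always added to the candidate set, so the
# sampler can always terminate. EOS id, vocabulary size, and dummy logits are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
EOS = 0  # assumed end-of-sequence token id

def consistent_top_k_sample(logits, k=10, eos_id=EOS):
    """Sample from the top-k tokens, forcing eos_id into the candidate set."""
    logits = np.asarray(logits, dtype=np.float64)
    candidates = set(np.argpartition(-logits, k)[:k].tolist())
    candidates.add(eos_id)               # the "consistent" modification
    idx = np.array(sorted(candidates))
    probs = np.exp(logits[idx] - logits[idx].max())
    probs /= probs.sum()
    return int(rng.choice(idx, p=probs))

# Usage with dummy logits over a 100-token vocabulary.
dummy_logits = rng.normal(size=100)
print(consistent_top_k_sample(dummy_logits, k=10))
```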
This list is automatically generated from the titles and abstracts of the papers on this site.