A Theoretical Analysis of the Repetition Problem in Text Generation
- URL: http://arxiv.org/abs/2012.14660v4
- Date: Mon, 22 Mar 2021 02:55:21 GMT
- Title: A Theoretical Analysis of the Repetition Problem in Text Generation
- Authors: Zihao Fu, Wai Lam, Anthony Man-Cho So, Bei Shi
- Abstract summary: We show that the repetition problem is, unfortunately, caused by the traits of our language itself.
One major reason is that too many words predict the same word as their subsequent word with high probability.
We propose a novel rebalanced encoding approach to alleviate the high inflow problem.
- Score: 55.8184629429347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text generation tasks, including translation, summarization, and language modeling, have seen rapid growth in recent years. Despite these remarkable achievements, the repetition problem has been observed in nearly all text generation models and substantially undermines generation performance. Many methods have been proposed to solve the repetition problem, but no existing theoretical analysis explains why the problem arises or how it can be resolved. In this paper, we propose a new framework for the theoretical analysis of the repetition problem. We first define the Average Repetition Probability (ARP) to characterize the repetition problem quantitatively. Then, we conduct an extensive analysis of the Markov generation model and derive several upper bounds on the average repetition probability, each with an intuitive interpretation. We show that most existing methods essentially minimize these upper bounds, either explicitly or implicitly. Grounded in our theory, we show that the repetition problem is, unfortunately, caused by traits of our language itself. One major reason is that too many words predict the same word as their subsequent word with high probability; consequently, it is easy to return to that word and form repetitions, which we dub the high inflow problem. Furthermore, we derive a concentration bound on the average repetition probability for general generation models. Finally, based on the theoretical upper bounds, we propose a novel rebalanced encoding approach to alleviate the high inflow problem. Experimental results show that our theoretical framework applies to general generation models and that the proposed rebalanced encoding approach significantly alleviates the repetition problem. The source code of this paper can be obtained from https://github.com/fuzihaofzh/repetition-problem-nlg.
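To make the high inflow intuition concrete, the following minimal Python sketch (not the paper's implementation; the toy vocabulary, transition matrix, window-based repetition statistic, thresholds, and all names are illustrative assumptions) builds a first-order Markov generation model in which one token is predicted with high probability by many other tokens, then empirically estimates how often generated sequences revisit recently generated tokens, loosely in the spirit of the ARP metric rather than its formal definition.

```python
# Hypothetical sketch: a toy first-order Markov generator with one
# "high-inflow" token (a token that many other tokens predict with high
# probability), plus a crude empirical repetition statistic.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 50

# Toy transition matrix: P[i, j] = P(next token = j | current token = i).
P = rng.random((vocab_size, vocab_size))
# Make token 0 a high-inflow token: every token predicts it with large weight.
P[:, 0] += 10.0
P /= P.sum(axis=1, keepdims=True)

def estimate_repetition_rate(P, num_samples=2000, seq_len=30, window=4):
    """Fraction of generated tokens that already occurred in the previous `window` tokens."""
    repeats, total = 0, 0
    for _ in range(num_samples):
        tok = int(rng.integers(vocab_size))
        history = [tok]
        for _ in range(seq_len):
            tok = int(rng.choice(vocab_size, p=P[tok]))
            repeats += tok in history[-window:]
            total += 1
            history.append(tok)
    return repeats / total

def count_high_inflow_tokens(P, threshold=0.2, min_predictors=5):
    """Tokens that receive probability >= threshold from at least `min_predictors` tokens."""
    inflow = (P >= threshold).sum(axis=0)
    return int((inflow >= min_predictors).sum())

print("estimated repetition rate:", estimate_repetition_rate(P))
print("high-inflow tokens:", count_high_inflow_tokens(P))
```

In this toy setup, boosting the inflow of even a single token is typically enough to raise the measured repetition rate noticeably, which mirrors the high inflow argument in the abstract above.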
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z)
- The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models [45.01562498702836]
Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps.
We propose SOCRATIC QUESTIONING, a divide-and-conquer style algorithm that mimics the recursive thinking process.
arXiv Detail & Related papers (2023-05-24T10:36:14Z)
- Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation [41.3948101212288]
We study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context.
We propose a training method where the model learns to penalize probabilities of sentence-level repetitions from pseudo repetitive data.
arXiv Detail & Related papers (2022-06-06T05:51:12Z)
- End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking [52.05847268235338]
We show how machine learning systems can perform logical extrapolation without overthinking problems.
We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten.
We also employ a progressive training routine that prevents the model from learning behaviors specific to the number of iterations and instead pushes it to learn behaviors that can be repeated indefinitely.
arXiv Detail & Related papers (2022-02-11T18:43:28Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
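The consistent-decoding idea in the last entry above can be illustrated with a small sketch: a top-k sampler that always keeps the end-of-sequence token in the candidate set, so termination always has non-zero probability and generation cannot loop forever. The vocabulary size, EOS index, dummy logits, and helper names below are hypothetical, and this is only a sketch of the general idea, not the authors' exact algorithm.

```python
# Hypothetical sketch of "consistent" top-k sampling: unlike plain top-k, the
# end-of-sequence (EOS) token is always added to the candidate set, so the
# sampler can always terminate. EOS id, vocabulary size, and dummy logits are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
EOS = 0  # assumed end-of-sequence token id

def consistent_top_k_sample(logits, k=10, eos_id=EOS):
    """Sample from the top-k tokens, forcing eos_id into the candidate set."""
    logits = np.asarray(logits, dtype=np.float64)
    candidates = set(np.argpartition(-logits, k)[:k].tolist())
    candidates.add(eos_id)               # the "consistent" modification
    idx = np.array(sorted(candidates))
    probs = np.exp(logits[idx] - logits[idx].max())
    probs /= probs.sum()
    return int(rng.choice(idx, p=probs))

# Usage with dummy logits over a 100-token vocabulary.
dummy_logits = rng.normal(size=100)
print(consistent_top_k_sample(dummy_logits, k=10))
```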
This list is automatically generated from the titles and abstracts of the papers on this site.