Repetition In Repetition Out: Towards Understanding Neural Text
Degeneration from the Data Perspective
- URL: http://arxiv.org/abs/2310.10226v1
- Date: Mon, 16 Oct 2023 09:35:42 GMT
- Title: Repetition In Repetition Out: Towards Understanding Neural Text
Degeneration from the Data Perspective
- Authors: Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier,
Taro Watanabe, Yixuan Su
- Abstract summary: This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
- Score: 91.14291142262262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are a number of diverging hypotheses about the neural text
degeneration problem, i.e., the generation of repetitive and dull loops,
which makes this problem both interesting and confusing. In this work, we
aim to advance our understanding by presenting a straightforward and
fundamental explanation from
the data perspective. Our preliminary investigation reveals a strong
correlation between the degeneration issue and the presence of repetitions in
training data. Subsequent experiments also demonstrate that by selectively
dropping out the attention to repetitive words in training data, degeneration
can be significantly minimized. Furthermore, our empirical analysis
illustrates that prior works addressing the degeneration issue from various
standpoints, such as high-inflow words, the likelihood objective, and the
self-reinforcement phenomenon, can be unified under one simple explanation:
penalizing repetitions in the training data is the common, fundamental
factor behind their effectiveness. Moreover, our experiments reveal that
penalizing repetitions in the training data remains critical even when
considering larger model sizes and instruction tuning.
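The abstract does not spell out how the attention dropout is implemented. As
a minimal sketch of one plausible reading (the function name, the key-side
masking, and the drop probability below are illustrative assumptions, not the
paper's specification), one can mark tokens whose id already occurred earlier
in the sequence and randomly drop attention to them during training:

import torch

def repetition_dropout_mask(input_ids: torch.Tensor,
                            drop_prob: float = 0.9) -> torch.Tensor:
    # input_ids: (batch, seq_len) integer token ids
    _, seq_len = input_ids.shape
    repeated = torch.zeros_like(input_ids, dtype=torch.bool)
    for t in range(1, seq_len):
        # position t is a repetition if its token id occurred anywhere earlier
        repeated[:, t] = (input_ids[:, :t] == input_ids[:, t:t + 1]).any(dim=1)
    # drop attention to a random subset of the repeated positions
    drop = repeated & (torch.rand(input_ids.shape, device=input_ids.device) < drop_prob)
    return ~drop  # True = position may still be attended to

# Usage (shapes only): turn the keep-mask into an additive attention bias.
# keep = repetition_dropout_mask(input_ids)                       # (B, T)
# bias = torch.where(keep, 0.0, float("-inf"))[:, None, None, :]  # (B, 1, 1, T)
# out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=bias)

Such a mask would be combined with the usual causal mask; the paper's actual
criterion for which repetitive words to drop may be more selective.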
Related papers
- CIER: A Novel Experience Replay Approach with Causal Inference in Deep Reinforcement Learning [11.13226491866178]
We propose a novel approach to segment time series into meaningful subsequences and represent the time series based on these subsequences.
The subsequences are employed for causal inference to identify fundamental causal factors that significantly impact training outcomes.
Experiments in common environments demonstrate the feasibility of our approach, confirming its ability to enhance the effectiveness of DRL training and to impart a degree of explainability to the training process.
arXiv Detail & Related papers (2024-05-14T07:23:10Z)
- Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation [3.2340528215722553]
A systematic task formulation of continual neural information retrieval is presented.
A comprehensive continual neural information retrieval framework is proposed.
Empirical evaluations illustrate that the proposed framework can successfully prevent catastrophic forgetting in neural information retrieval.
arXiv Detail & Related papers (2023-08-16T14:01:25Z)
- Accelerating exploration and representation learning with offline
pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
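The noise-contrastive estimation mentioned in the entry above is commonly
instantiated as an InfoNCE objective over in-batch negatives. A minimal
sketch, assuming each state embedding is paired with a positive view (e.g.,
a successor observation); this is not the paper's exact architecture or
setup:

import torch
import torch.nn.functional as F

def info_nce_loss(anchor: torch.Tensor, positive: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    # anchor, positive: (batch, dim) embeddings; row i of each is a pair,
    # and every other row in the batch serves as a negative.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)  # diagonal entries are positives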
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
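The entry above concerns a provable reconstruction from a single gradient
query. A common heuristic that makes the idea concrete is gradient matching:
optimize a dummy input until its gradient reproduces the observed one. The
sketch below is that generic recipe, not the paper's provable procedure, and
it assumes the label y is known for simplicity:

import torch

def reconstruct_from_gradient(model, loss_fn, target_grads, x_shape, y,
                              steps=500, lr=0.1):
    # target_grads: parameter gradients leaked by a single gradient query.
    # A dummy input x is optimized so its own gradient matches them.
    x = torch.randn(x_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        match.backward()  # differentiates the matching loss w.r.t. x
        opt.step()
    return x.detach()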
- Scaling Laws and Interpretability of Learning from Repeated Data [4.3242395495523525]
We train a family of models where most of the data is unique but a small fraction of it is repeated many times.
We find a strong double descent phenomenon, in which repeated data can cause the test loss to increase midway through training.
A predictable range of repetition frequency leads to surprisingly severe degradation in performance.
arXiv Detail & Related papers (2022-05-21T02:14:27Z)
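The training mixture described above (mostly unique documents plus a small,
heavily repeated fraction) is straightforward to construct; the parameters
below are illustrative, not the paper's. With repeated_fraction=0.01 and
repeat_count=100, and assuming equal document lengths, roughly half of all
training tokens come from the repeated subset:

import random

def build_repeated_mixture(unique_docs, repeated_fraction=0.01,
                           repeat_count=100, seed=0):
    # Most documents appear once; a sampled subset appears repeat_count times.
    rng = random.Random(seed)
    n_repeat = max(1, int(len(unique_docs) * repeated_fraction))
    repeated = rng.sample(unique_docs, n_repeat)
    corpus = list(unique_docs) + repeated * (repeat_count - 1)
    rng.shuffle(corpus)
    return corpus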
- Executive Function: A Contrastive Value Policy for Resampling and
Relabeling Perceptions via Hindsight Summarization? [0.0]
We develop the few-shot continual learning task from first principles and hypothesize an evolutionary motivation and mechanism of action for executive function.
We show how this model of executive function can be used to implement hypothesis testing as a stream of consciousness and may explain observations of human few-shot learning and neuroanatomy.
arXiv Detail & Related papers (2022-04-27T00:07:44Z)
- Extensive Studies of the Neutron Star Equation of State from the Deep
Learning Inference with the Observational Data Augmentation [0.0]
We discuss deep learning inference for the neutron star equation of state (EoS) using real observational data on masses and radii.
To incorporate observational uncertainties into our deep learning method, we augment the training data with noise fluctuations corresponding to those uncertainties.
We conclude that data augmentation can be a useful technique for avoiding overfitting without tuning the neural network architecture.
arXiv Detail & Related papers (2021-01-20T14:27:12Z)
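The augmentation in the entry above (noise matched to observational
uncertainties) can be sketched generically; the array layout and the
Gaussian noise model are assumptions, not the paper's exact scheme:

import numpy as np

def augment_with_noise(x, sigma, copies=10, seed=0):
    # x:     (n, d) observed quantities (e.g., masses and radii)
    # sigma: (n, d) per-measurement standard deviations
    # Returns (copies * n, d) noisy replicas of the training data.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 1.0, size=(copies, *x.shape)) * sigma[None, :, :]
    return (x[None, :, :] + noise).reshape(-1, x.shape[1])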
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the
Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
- Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.