Repetitions are not all alike: distinct mechanisms sustain repetition in language models
- URL: http://arxiv.org/abs/2504.01100v2
- Date: Tue, 04 Nov 2025 16:26:26 GMT
- Title: Repetitions are not all alike: distinct mechanisms sustain repetition in language models
- Authors: Matéo Mahaut, Francesca Franzon,
- Abstract summary: We investigate whether behaviorally similar repetition patterns arise from distinct underlying mechanisms and how these mechanisms develop during model training. Our analyses reveal that ICL-induced repetition relies on a dedicated network of attention heads that progressively specialize over training, whereas naturally occurring repetition emerges early and lacks a defined circuitry.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) can sometimes degrade into repetitive loops, persistently generating identical word sequences. Because repetition is rare in natural human language, its frequent occurrence across diverse tasks and contexts in LLMs remains puzzling. Here we investigate whether behaviorally similar repetition patterns arise from distinct underlying mechanisms and how these mechanisms develop during model training. We contrast two conditions: repetitions elicited by natural text prompts with those induced by in-context learning (ICL) setups that explicitly require copying behavior. Our analyses reveal that ICL-induced repetition relies on a dedicated network of attention heads that progressively specialize over training, whereas naturally occurring repetition emerges early and lacks a defined circuitry. Attention inspection further shows that natural repetition focuses disproportionately on low-information tokens, suggesting a fallback behavior when relevant context cannot be retrieved. These results indicate that superficially similar repetition behaviors originate from qualitatively different internal processes, reflecting distinct modes of failure and adaptation in language models.
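The repetitive loops described in the abstract can be detected mechanically in generated output. The following is a minimal sketch (a hypothetical helper, not taken from the paper) that flags when a token sequence has degenerated into a trailing cycle of identical word sequences:

```python
def find_repetition_loop(tokens, min_period=1, max_period=10, min_repeats=3):
    """Return (start_index, period) of a trailing repeated cycle, or None.

    Scans the end of the sequence for a block of `period` tokens
    repeated at least `min_repeats` times back-to-back.
    """
    n = len(tokens)
    for period in range(min_period, max_period + 1):
        needed = period * min_repeats
        if needed > n:
            break
        block = tokens[n - period:]
        # Check that the last `min_repeats` windows of size `period` all match.
        if all(tokens[n - (k + 1) * period : n - k * period] == block
               for k in range(min_repeats)):
            return n - needed, period
    return None

# A sequence that degenerates into a three-token loop:
seq = ["the", "cat", "sat", "on", "the", "mat",
       "on", "the", "mat", "on", "the", "mat"]
```

Here `find_repetition_loop(seq)` reports that a cycle of period 3 begins at index 3. The thresholds (`max_period`, `min_repeats`) are arbitrary choices for illustration; the paper's analyses operate on attention patterns rather than surface n-gram matching.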
Related papers
- Circular Reasoning: Understanding Self-Reinforcing Loops in Large Reasoning Models [66.11277323593475]
Circular Reasoning is a self-reinforcing trap where generated content acts as a logical premise for its own recurrence. Mechanistically, we characterize circular reasoning as a state collapse exhibiting distinct boundaries. We reveal that reasoning impasses trigger the loop onset, which subsequently persists as an inescapable cycle driven by a self-reinforcing V-shaped attention mechanism.
arXiv Detail & Related papers (2026-01-09T10:23:55Z) - Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs [36.89422086121058]
We show that errors result from a structured yet flawed mechanism that we term class-based (mis)generalization. Experiments on Llama-3, Mistral, and Pythia reveal that this behavior is reflected in the model's internal computations.
arXiv Detail & Related papers (2025-05-28T17:47:52Z) - In-Context Learning can distort the relationship between sequence likelihoods and biological fitness [0.0]
We show that in-context learning can distort the relationship between fitness and likelihood scores of sequences. This phenomenon manifests as anomalously high likelihood scores for sequences that contain repeated motifs.
arXiv Detail & Related papers (2025-04-23T19:30:01Z) - Understanding the Repeat Curse in Large Language Models from a Feature Perspective [10.413608338398785]
Large language models (LLMs) often suffer from repetitive text generation.
We propose a novel approach, "Duplicatus Charm", to induce and analyze the Repeat Curse.
arXiv Detail & Related papers (2025-04-19T07:53:37Z) - Deterministic or probabilistic? The psychology of LLMs as random number generators [0.0]
Large Language Models (LLMs) have transformed text generation through inherently probabilistic context-aware mechanisms. Our results reveal that, despite their transformer-based architecture, these models often exhibit deterministic responses when prompted for random numerical outputs.
arXiv Detail & Related papers (2025-02-27T10:45:27Z) - Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing [28.646627695015646]
Repetitive transformations can lead to stable configurations, known as attractors, including fixed points and limit cycles. Applying this perspective to large language models (LLMs), which iteratively map input text to output text, provides a principled approach to characterizing long-term behaviors. Successive paraphrasing serves as a compelling testbed for exploring such dynamics, as paraphrases re-express the same underlying meaning with linguistic variation.
arXiv Detail & Related papers (2025-02-21T04:46:57Z) - Repetition Neurons: How Do Language Models Produce Repetitions? [25.430820735194768]
This paper introduces repetition neurons, regarded as skill neurons responsible for the repetition problem in text generation tasks. We identify these repetition neurons by comparing activation values before and after the onset of repetition in texts generated by recent pre-trained language models.
arXiv Detail & Related papers (2024-10-17T12:43:47Z) - Nested replicator dynamics, nested logit choice, and similarity-based learning [56.98352103321524]
We consider a model of learning and evolution in games with action sets endowed with a partition-based similarity structure.
In this model, revising agents have a higher probability of comparing their current strategy with other strategies that they deem similar.
Because of this implicit bias toward similar strategies, the resulting dynamics do not satisfy any of the standard monotonicity postulates for imitative game dynamics.
arXiv Detail & Related papers (2024-07-25T07:09:53Z) - From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty [67.81977289444677]
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We categorize fallback behaviors - sequence repetitions, degenerate text, and hallucinations - and extensively analyze them. Our experiments reveal a clear and consistent ordering of fallback behaviors, across all these axes.
arXiv Detail & Related papers (2024-07-08T16:13:42Z) - Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation [92.42032403795879]
We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
arXiv Detail & Related papers (2023-07-04T07:53:55Z) - Replicable Reinforcement Learning [15.857503103543308]
We provide a provably replicable algorithm for parallel value iteration, and a provably replicable version of R-max in the episodic setting.
These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
arXiv Detail & Related papers (2023-05-24T16:05:15Z) - Identifiability Results for Multimodal Contrastive Learning [72.15237484019174]
We show that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously.
Our work provides a theoretical basis for multimodal representation learning and explains in which settings multimodal contrastive learning can be effective in practice.
arXiv Detail & Related papers (2023-03-16T09:14:26Z) - Composed Variational Natural Language Generation for Few-shot Intents [118.37774762596123]
We generate training examples for few-shot intents in the realistic imbalanced scenario.
To evaluate the quality of the generated utterances, experiments are conducted on the generalized few-shot intent detection task.
Our proposed model achieves state-of-the-art performances on two real-world intent detection datasets.
arXiv Detail & Related papers (2020-09-21T17:48:43Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z) - Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
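The last entry's remedy for infinite-length sequences can be illustrated with a toy decoder. The sketch below imposes a floor on the end-of-sequence probability that rises with length, so generation terminates even when the underlying model wants to loop forever; the function names and the specific growth schedule are illustrative assumptions, not the paper's exact formulation:

```python
import random

def self_terminating_decode(next_token_probs, eos="<eos>",
                            growth=0.05, max_len=50, seed=0):
    """Sample a sequence, forcing P(eos) >= 1 - (1 - growth)**t at step t.

    `next_token_probs(prefix)` is any function returning a dict of
    token -> probability (assumed to sum to 1). The rising floor on
    the EOS probability guarantees eventual termination, mirroring
    the idea of a self-terminating language model.
    """
    rng = random.Random(seed)
    out = []
    for t in range(1, max_len + 1):
        probs = dict(next_token_probs(out))
        floor = 1.0 - (1.0 - growth) ** t
        if probs.get(eos, 0.0) < floor:
            # Rescale non-EOS mass so the distribution still sums to 1.
            scale = (1.0 - floor) / max(1.0 - probs.get(eos, 0.0), 1e-12)
            probs = {tok: p * scale for tok, p in probs.items() if tok != eos}
            probs[eos] = floor
        r, acc = rng.random(), 0.0
        for tok, p in probs.items():
            acc += p
            if r <= acc:
                break
        if tok == eos:
            return out
        out.append(tok)
    return out

# A degenerate model that always wants to repeat the same token:
looping_model = lambda prefix: {"again": 0.99, "<eos>": 0.01}
```

With the floor in place, `self_terminating_decode(looping_model)` returns a finite sequence of "again" tokens rather than looping indefinitely; without it, a sampler over this model would emit "again" for an unbounded number of steps with high probability.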
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.