Induction Head Toxicity Mechanistically Explains Repetition Curse in Large Language Models
- URL: http://arxiv.org/abs/2505.13514v1
- Date: Sat, 17 May 2025 03:09:33 GMT
- Title: Induction Head Toxicity Mechanistically Explains Repetition Curse in Large Language Models
- Authors: Shuxun Wang, Qingyu Yin, Chak Tou Leong, Qiang Zhang, Linyi Yang,
- Abstract summary: We identify induction heads as a key driver of the repetition curse. We also propose an attention-head regularization technique that can reduce the dominance of induction heads during generation.
- Score: 24.666925550391024
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The repetition curse is a phenomenon in which Large Language Models (LLMs) generate repetitive or cyclic sequences of tokens. While the repetition curse has been widely observed, its underlying mechanisms remain poorly understood. In this work, we investigate the role of induction heads, a class of attention heads known for their ability to perform in-context learning, in driving this repetitive behavior. Specifically, we focus on the "toxicity" of induction heads, which we define as their tendency to dominate the model's output logits during repetition, effectively excluding other attention heads from contributing to the generation process. Our findings have important implications for the design and training of LLMs. By identifying induction heads as a key driver of the repetition curse, we provide a mechanistic explanation for this phenomenon and suggest potential avenues for mitigation. We also propose an attention-head regularization technique that can be employed to reduce the dominance of induction heads during generation, thereby promoting more diverse and coherent outputs.
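As a concrete illustration of the proposed mitigation, below is a minimal, hypothetical sketch of an attention-head regularizer in this spirit: it penalizes any single head (such as an induction head) for dominating the per-position update to the residual stream, and hence the output logits. The function name, tensor layout, and entropy-based penalty are assumptions for illustration, not the authors' implementation.
```python
# Hypothetical sketch (not the authors' code) of an attention-head
# regularizer: penalize concentration of per-head contributions so that no
# single head, e.g. an induction head, dominates the output logits.
import torch

def head_dominance_penalty(head_outputs: torch.Tensor) -> torch.Tensor:
    """head_outputs: (num_heads, d_model) per-head contributions to the
    residual stream at one position. The penalty is 0 when all heads
    contribute equally and grows as one head's contribution dominates."""
    norms = head_outputs.norm(dim=-1)                 # (num_heads,)
    shares = norms / norms.sum().clamp_min(1e-8)      # contribution "shares"
    entropy = -(shares * shares.clamp_min(1e-8).log()).sum()
    max_entropy = torch.log(torch.tensor(float(head_outputs.shape[0])))
    return max_entropy - entropy

# Toy usage: a layer with one dominant head is penalized more heavily.
balanced = torch.randn(8, 64)
dominant = balanced.clone()
dominant[0] *= 20.0
print(head_dominance_penalty(balanced), head_dominance_penalty(dominant))
```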
Related papers
- Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models [66.36240676392502]
Chain-of-thought (CoT) reasoning has become the standard paradigm for enabling Large Language Models (LLMs) to solve complex problems. Recent studies reveal a sharp performance drop in reasoning hop generalization scenarios. We propose test-time correction of reasoning, a lightweight intervention method that dynamically identifies and deactivates ep heads in the reasoning process.
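For context on head-level test-time intervention in general (the cited paper's criterion for selecting heads is not reproduced here), deactivating a head amounts to zeroing its output slice before the attention output projection; the tensor layout below is an assumption.
```python
# Generic sketch of deactivating ("ablating") chosen attention heads at
# inference time by zeroing their contributions; head indices are placeholders.
import torch

def ablate_heads(head_outputs: torch.Tensor, heads_to_zero: list[int]) -> torch.Tensor:
    """head_outputs: (batch, num_heads, seq_len, d_head). Returns a copy with
    the selected heads' contributions removed."""
    out = head_outputs.clone()
    out[:, heads_to_zero] = 0.0
    return out

# Toy usage: drop heads 3 and 7 of a 12-head layer; their summed magnitude is 0.
x = torch.randn(1, 12, 16, 64)
print(ablate_heads(x, [3, 7]).abs().sum(dim=(0, 2, 3)))
```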
arXiv Detail & Related papers (2026-01-29T03:24:32Z) - In-Context Learning Without Copying [31.718993147344353]
We study whether transformers can still acquire in-context learning capabilities when inductive copying is suppressed. We propose Hapax, a setting where we omit the loss contribution of any token that can be correctly predicted by induction heads. Mechanistic analysis shows that models trained with Hapax develop fewer and weaker induction heads but still preserve ICL capabilities.
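A minimal sketch of the Hapax idea, under the assumption that a trivial [A][B] ... [A] -> [B] induction rule stands in for the induction heads' prediction; the helper names and exact masking rule are illustrative, not the paper's code.
```python
# Illustrative sketch: drop the loss for target tokens that a simple
# induction rule ([A][B] ... [A] -> [B]) would already predict correctly.
import torch
import torch.nn.functional as F

def induction_mask(tokens: torch.Tensor) -> torch.Tensor:
    """tokens: (seq_len,) token ids. mask[t] is True when tokens[t] equals the
    token that followed the most recent earlier occurrence of tokens[t-1]."""
    last_next = {}                              # token id -> token that followed it
    mask = torch.zeros_like(tokens, dtype=torch.bool)
    for t in range(1, len(tokens)):
        prev = int(tokens[t - 1])
        if prev in last_next and last_next[prev] == int(tokens[t]):
            mask[t] = True
        last_next[prev] = int(tokens[t])
    return mask

def hapax_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab) next-token predictions for each position."""
    per_token = F.cross_entropy(logits[:-1], tokens[1:], reduction="none")
    keep = ~induction_mask(tokens)[1:]          # drop induction-predictable targets
    kept = per_token[keep]
    # Guard against the degenerate case where every target is predictable.
    return kept.mean() if kept.numel() > 0 else per_token.sum() * 0.0

# Toy usage: targets inside the repeated span are masked out of the loss.
tokens = torch.tensor([5, 9, 5, 9, 5, 9, 2, 7])
logits = torch.randn(len(tokens), 16)
print(hapax_loss(logits, tokens))
```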
arXiv Detail & Related papers (2025-11-07T22:11:11Z) - When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment [23.096167213579957]
Reasoning-Induced Misalignment (RIM) emerges when reasoning capabilities strengthen, and arises when specific types of reasoning patterns are introduced during inference or training. During training, we find significantly higher activation entanglement between reasoning and safety in safety-critical neurons.
arXiv Detail & Related papers (2025-08-30T16:04:54Z) - Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models [1.0058542892457312]
We show that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression. We further show that attention head behavior is domain-dependent, with larger models exhibiting more specialized and category-sensitive patterns.
arXiv Detail & Related papers (2025-07-16T00:08:48Z) - Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers [76.42159902257677]
We argue that both behaviors stem from a single mechanism known as out-of-context reasoning (OCR). OCR drives both generalization and hallucination, depending on whether the associated concepts are causally related. Our work provides a theoretical foundation for understanding the OCR phenomenon, offering a new lens for analyzing and mitigating undesirable behaviors from knowledge injection.
arXiv Detail & Related papers (2025-06-12T16:50:45Z) - Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning [62.23671919314693]
Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still lags behind. We introduce a two-stage framework called Learning to Focus (LeaF) to mitigate confounding factors.
arXiv Detail & Related papers (2025-06-09T15:16:39Z) - Interpreting the Repeated Token Phenomenon in Large Language Models [31.1226642501095]
Large Language Models (LLMs) often fail to accurately repeat a single word when prompted to, and instead output unrelated text. We aim to explain the causes of this phenomenon and link it to the concept of "attention sinks". Our investigation identifies the neural circuit responsible for attention sinks and shows how long repetitions disrupt this circuit.
arXiv Detail & Related papers (2025-03-11T21:40:58Z) - Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs [77.66717051042032]
Practitioners have consistently observed three puzzling phenomena in transformer-based large language models.
These phenomena are characterized by certain so-called "sink tokens" receiving disproportionately high attention weights.
We elucidate the mechanisms behind extreme-token phenomena.
arXiv Detail & Related papers (2024-10-17T17:54:06Z) - Iteration Head: A Mechanistic Study of Chain-of-Thought [6.072247578478243]
Chain-of-Thought (CoT) reasoning is known to improve the performance of Large Language Models.
This paper shows how CoT reasoning emerges in transformers in a controlled and interpretable setting.
arXiv Detail & Related papers (2024-06-04T09:11:46Z) - Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - In-context Learning and Induction Heads [5.123049926855312]
"Induction heads" are attention heads that implement a simple algorithm to complete token sequences.
We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability.
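A common diagnostic for such heads (a prefix-matching score), sketched here under assumed tensor shapes, is to run the model on a sequence that repeats with a known period, so that the pattern [A][B] ... [A] -> [B] holds throughout, and measure how much attention each head places on the position just after the previous occurrence of the current token; heads scoring near 1 behave like induction heads.
```python
# Minimal sketch of an induction-head score; the attention-tensor layout and
# use of a periodically repeated input sequence are assumptions.
import torch

def induction_score(attn: torch.Tensor, period: int) -> torch.Tensor:
    """attn: (num_heads, seq_len, seq_len) attention weights over a sequence
    that repeats with the given period (e.g. a random prefix concatenated
    with itself). Returns one score per head."""
    seq_len = attn.shape[-1]
    positions = torch.arange(period, seq_len)
    # The "induction target" of position t is t - period + 1: one step past
    # the previous occurrence of the token currently at position t.
    targets = positions - period + 1
    return attn[:, positions, targets].mean(dim=-1)   # (num_heads,)

# Toy usage with random "attention" just to exercise the function.
fake_attn = torch.softmax(torch.randn(8, 64, 64), dim=-1)
print(induction_score(fake_attn, period=32))
```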
arXiv Detail & Related papers (2022-09-24T00:43:19Z) - ACRE: Abstract Causal REasoning Beyond Covariation [90.99059920286484]
We introduce the Abstract Causal REasoning dataset for systematic evaluation of current vision systems in causal induction.
Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with four types of questions in either an independent or an interventional scenario.
We notice that pure neural models tend towards an associative strategy under their chance-level performance, whereas neuro-symbolic combinations struggle in backward-blocking reasoning.
arXiv Detail & Related papers (2021-03-26T02:42:38Z) - On-the-Fly Attention Modularization for Neural Generation [54.912042110885366]
We show that generated text is repetitive, generic, self-inconsistent, and lacking commonsense.
Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention during inference.
arXiv Detail & Related papers (2021-01-02T05:16:46Z) - Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference [68.12511526813991]
We provide a novel understanding of multi-head attention from a Bayesian perspective.
We propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention.
Experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity.
arXiv Detail & Related papers (2020-09-20T06:32:23Z)