Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models
- URL: http://arxiv.org/abs/2507.11809v1
- Date: Wed, 16 Jul 2025 00:08:48 GMT
- Title: Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models
- Authors: Dante Campregher, Yanxu Chen, Sander Hoffman, Maria Heuss
- Abstract summary: We show that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression. We show that attention head behavior is domain-dependent, with larger models exhibiting more specialized and category-sensitive patterns.
- Score: 1.0058542892457312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a reproducibility study examining how Large Language Models (LLMs) manage competing factual and counterfactual information, focusing on the role of attention heads in this process. We attempt to reproduce and reconcile findings from three recent studies, by Ortu et al.; Yu, Merullo, and Pavlick; and McDougall et al., that investigate the competition between model-learned facts and contradictory context information through Mechanistic Interpretability tools. Our study specifically examines the relationship between attention head strength and factual output ratios, evaluates competing hypotheses about attention heads' suppression mechanisms, and investigates the domain specificity of these attention patterns. Our findings suggest that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression, as strengthening them can also inhibit correct facts. Additionally, we show that attention head behavior is domain-dependent, with larger models exhibiting more specialized and category-sensitive patterns.
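The head-strengthening experiment described in the abstract can be sketched in a toy form: scale one attention head's contribution to the residual stream by a factor alpha and track the factual-minus-counterfactual logit difference. Everything below (dimensions, random tensors, the token ids `FACT_TOK` and `COUNTER_TOK`) is a hypothetical stand-in, not the paper's actual models or data; real experiments of this kind operate on pretrained transformer activations.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, vocab = 16, 8
FACT_TOK, COUNTER_TOK = 3, 5  # hypothetical ids for a factual vs. counterfactual token

# Toy stand-ins: a residual-stream vector, one attention head's output
# contribution, and an unembedding matrix mapping the residual stream to logits.
resid = rng.normal(size=d_model)
head_out = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, vocab))

def logit_diff(alpha: float) -> float:
    """Factual-minus-counterfactual logit after scaling the head by alpha."""
    logits = (resid + alpha * head_out) @ W_U
    return float(logits[FACT_TOK] - logits[COUNTER_TOK])

# Sweep the scaling factor, analogous to strengthening the head: if the head
# implements general copy suppression, large alpha can flip the sign and
# suppress the correct fact as well.
for alpha in (0.0, 1.0, 2.0, 5.0):
    print(f"alpha={alpha}: logit_diff={logit_diff(alpha):+.3f}")
```

Because the unembedding is linear, the logit difference here changes linearly in alpha; the point of the sketch is only the bookkeeping of the intervention, not the nonlinear behavior of a real model.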
Related papers
- Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models [66.36240676392502]
Chain-of-thought (CoT) reasoning has become the standard paradigm for enabling Large Language Models (LLMs) to solve complex problems. Recent studies reveal a sharp performance drop in reasoning hop generalization scenarios. We propose test-time correction of reasoning, a lightweight intervention method that dynamically identifies and deactivates ep heads in the reasoning process.
arXiv Detail & Related papers (2026-01-29T03:24:32Z) - Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures [72.27391760972445]
Large Reasoning Models (LRMs) have pushed reasoning capabilities to new heights. This paper organizes recent findings into three core dimensions: 1) training dynamics, 2) reasoning mechanisms, and 3) unintended behaviors.
arXiv Detail & Related papers (2026-01-11T08:48:46Z) - Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation [43.974424280422085]
We investigate mechanisms within the thinking process behind social bias aggregation. We uncover two failure patterns that drive social bias aggregation. Our approach effectively reduces bias while maintaining or improving accuracy.
arXiv Detail & Related papers (2025-10-20T00:33:44Z) - On the Generalizability of "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals" [0.8621608193534839]
We reproduce the study "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals", which investigates the competition in language models between factual recall and counterfactual in-context repetition. We find that the attention head ablation proposed in Ortu et al. (2024) is ineffective for domains that are underrepresented in their dataset.
arXiv Detail & Related papers (2025-06-28T18:29:19Z) - Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning [47.764552063499046]
Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still lags behind. We introduce a two-stage framework called Learning to Focus (LeaF) to mitigate confounding factors.
arXiv Detail & Related papers (2025-06-09T15:16:39Z) - A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [53.18562650350898]
Chain-of-thought (CoT) reasoning enhances the performance of large language models. We present the first comprehensive study of CoT faithfulness in large vision-language models.
arXiv Detail & Related papers (2025-05-29T18:55:05Z) - Systematic Outliers in Large Language Models [41.2150163753952]
Outliers have been widely observed in Large Language Models (LLMs). We provide a detailed analysis of the formation process, underlying causes, and functions of outliers in LLMs.
arXiv Detail & Related papers (2025-02-10T12:54:17Z) - The Clever Hans Mirage: A Comprehensive Survey on Spurious Correlations in Machine Learning [78.13481522957552]
Machine learning models are sensitive to spurious correlations between non-essential features of the inputs and the corresponding labels. This paper provides a comprehensive survey of this emerging issue, along with a fine-grained taxonomy of existing state-of-the-art methods for addressing spurious correlations in machine learning models.
arXiv Detail & Related papers (2024-02-20T04:49:34Z) - Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals [82.68757839524677]
Interpretability research aims to bridge the gap between empirical success and our scientific understanding of large language models (LLMs).
We propose a formulation of competition of mechanisms, which focuses on the interplay of multiple mechanisms instead of individual mechanisms.
Our findings show traces of the mechanisms and their competition across various model components and reveal attention positions that effectively control the strength of certain mechanisms.
arXiv Detail & Related papers (2024-02-18T17:26:51Z) - Inducing Causal Structure for Abstractive Text Summarization [76.1000380429553]
We introduce a Structural Causal Model (SCM) to induce the underlying causal structure of the summarization data.
We propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to learn the causal representations that can mimic the causal factors.
Experimental results on two widely used text summarization datasets demonstrate the advantages of our approach.
arXiv Detail & Related papers (2023-08-24T16:06:36Z) - Context De-confounded Emotion Recognition [12.037240778629346]
Context-Aware Emotion Recognition (CAER) aims to perceive the emotional states of the target person with contextual information.
A long-overlooked issue is that a context bias in existing datasets leads to a significantly unbalanced distribution of emotional states.
This paper provides a causality-based perspective to disentangle the models from the impact of such bias, and formulate the causalities among variables in the CAER task.
arXiv Detail & Related papers (2023-03-21T15:12:20Z) - Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z) - On Causally Disentangled Representations [18.122893077772993]
We present an analysis of disentangled representations through the notion of a disentangled causal process.
We show that our metrics capture the desiderata of a disentangled causal process.
We perform an empirical study of state-of-the-art disentangled representation learners, using our metrics and dataset to evaluate them from a causal perspective.
arXiv Detail & Related papers (2021-12-10T18:56:27Z) - Towards Causal Representation Learning [96.110881654479]
The two fields of machine learning and graphical causality arose and developed separately.
There is now cross-pollination and increasing interest in both fields to benefit from the advances of the other.
arXiv Detail & Related papers (2021-02-22T15:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.