Context Copying Modulation: The Role of Entropy Neurons in Managing Parametric and Contextual Knowledge Conflicts
- URL: http://arxiv.org/abs/2509.10663v2
- Date: Wed, 17 Sep 2025 11:21:33 GMT
- Title: Context Copying Modulation: The Role of Entropy Neurons in Managing Parametric and Contextual Knowledge Conflicts
- Authors: Zineddine Tighidet, Andrea Mogini, Hedi Ben-younes, Jiali Mei, Patrick Gallinari, Benjamin Piwowarski
- Abstract summary: We show that entropy neurons are responsible for suppressing context copying across a range of Large Language Models. These results enhance our understanding of the internal dynamics of LLMs when handling conflicting information.
- Score: 16.645800301676996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The behavior of Large Language Models (LLMs) when facing contextual information that conflicts with their internal parametric knowledge is inconsistent, with no generally accepted explanation for the expected outcome distribution. Recent work has identified in autoregressive transformer models a class of neurons -- called entropy neurons -- that produce a significant effect on the model output entropy while having an overall moderate impact on the ranking of the predicted tokens. In this paper, we investigate the preliminary claim that these neurons are involved in inhibiting context copying behavior in transformers by looking at their role in resolving conflicts between contextual and parametric information. We show that entropy neurons are responsible for suppressing context copying across a range of LLMs, and that ablating them leads to a significant change in the generation process. These results enhance our understanding of the internal dynamics of LLMs when handling conflicting information.
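The abstract does not spell out how the ablations were performed; as a hedged illustration only, neuron ablation in autoregressive transformers is commonly implemented with a forward hook that overwrites the targeted activations during generation. The sketch below assumes a Hugging Face `transformers` GPT-2-style model; the layer index, neuron indices, and ablation value are hypothetical placeholders, not the paper's actual setup.

```python
# Minimal sketch: ablating candidate "entropy neurons" in a GPT-2-style
# model via a forward hook. LAYER, NEURONS, and MEAN_ACT are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

LAYER = 11              # final block; entropy neurons are typically sought late
NEURONS = [2378, 1611]  # hypothetical neuron indices in the MLP hidden layer
MEAN_ACT = 0.0          # replacement value (0.0 = zero-ablation; a dataset
                        # mean here would give mean-ablation instead)

def ablate_hook(module, inputs, output):
    # output: (batch, seq, d_mlp) activations after the MLP nonlinearity
    output[:, :, NEURONS] = MEAN_ACT
    return output

# Hook the MLP activation of the chosen block (GPT-2 module path).
handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(ablate_hook)

# A prompt whose context conflicts with parametric knowledge.
prompt = "The capital of France is London. The capital of France is"
ids = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=3, do_sample=False)
print(tokenizer.decode(out[0]))

handle.remove()  # restore the unablated model
```

Contrasting the completion (and the entropy of the output distribution) with and without the hook is the kind of before/after comparison such ablation analyses rest on.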
Related papers
- When Does Context Help? Error Dynamics of Contextual Information in Large Language Models [64.88201012057822]
We present a unified theoretical framework for analyzing the effect of arbitrary contextual information in large language models.
Our analysis characterizes contextual influence through output error dynamics.
Experiments across ICL, retrieval-augmented generation, and memory evolution validate our theory and motivate a principled context selection strategy.
arXiv Detail & Related papers (2026-02-09T05:58:41Z)
- Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis [3.1526281887627587]
Distinguishing recall from reasoning is crucial for predicting model generalization.
We use controlled datasets of synthetic linguistic puzzles to probe transformer models at the layer, head, and neuron level.
Our results provide the first causal evidence that recall and reasoning rely on separable but interacting circuits in transformer models.
arXiv Detail & Related papers (2025-10-03T04:13:06Z)
- Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models [55.46269953415811]
We identify ToM-sensitive parameters and show that perturbing as little as 0.001% of these parameters significantly degrades ToM performance.
Our results have implications for enhancing model alignment, mitigating biases, and improving AI systems designed for human interaction.
arXiv Detail & Related papers (2025-04-05T17:45:42Z)
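The summary above states the size of the perturbed subset (0.001%) but not the selection or noise procedure; the following sketch is one generic way to realize such a perturbation, using gradient magnitude as a stand-in sensitivity score. Both the score and the noise scale are assumptions, not the paper's method.

```python
# Hedged sketch: perturb the top 0.001% of parameters ranked by a
# sensitivity score. Gradient magnitude stands in for the paper's
# actual (unspecified here) ToM-sensitivity measure.
import torch

def perturb_sensitive_params(model, loss, fraction=1e-5, noise_std=0.1):
    loss.backward()  # populate .grad as a crude sensitivity proxy
    # Flatten all gradient magnitudes to find a global threshold.
    scores = torch.cat([p.grad.abs().flatten()
                        for p in model.parameters() if p.grad is not None])
    k = max(1, int(fraction * scores.numel()))   # 0.001% of all weights
    threshold = torch.topk(scores, k).values.min()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            mask = p.grad.abs() >= threshold     # the "sensitive" entries
            p.add_(noise_std * torch.randn_like(p) * mask)
```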
- Decomposing Interventional Causality into Synergistic, Redundant, and Unique Components [0.0]
We introduce a novel framework for decomposing interventional causal effects into synergistic, redundant, and unique components.
We develop a mathematical approach that systematically quantifies how causal power is distributed among variables in a system.
arXiv Detail & Related papers (2025-01-20T12:34:51Z)
- Rethinking Associative Memory Mechanism in Induction Head [37.93644115914534]
This paper investigates how a two-layer transformer captures in-context information and balances it with pretrained bigram knowledge in next-token prediction.
We theoretically analyze the representation of weight matrices in attention layers and the resulting logits when a transformer is given prompts generated by a bigram model.
arXiv Detail & Related papers (2024-12-16T05:33:05Z)
- Generative Intervention Models for Causal Perturbation Modeling [80.72074987374141]
In many applications, it is a priori unknown which mechanisms of a system are modified by an external perturbation.
We propose a generative intervention model (GIM) that learns to map perturbation features to distributions over atomic interventions.
arXiv Detail & Related papers (2024-11-21T10:37:57Z)
- Modularity in Transformers: Investigating Neuron Separability & Specialization [0.0]
Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited.
This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models.
Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets.
arXiv Detail & Related papers (2024-08-30T14:35:01Z)
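MoEfication, mentioned in the summary above, partitions an FFN layer's neurons into co-activated "expert" groups; a minimal stand-in, shown below, clusters the rows of the FFN input weight matrix with k-means, which is one of the groupings the MoEfication line of work considers. The dimensions and the use of scikit-learn are illustrative assumptions.

```python
# Minimal sketch of MoEfication-style neuron clustering: partition the
# d_ff neurons of one FFN layer into "experts" by k-means on the rows of
# its input weight matrix. Shapes below are illustrative placeholders.
import numpy as np
from sklearn.cluster import KMeans

d_model, d_ff, n_experts = 768, 3072, 16
W_in = np.random.randn(d_ff, d_model)   # rows = per-neuron input weights

# Normalize rows so k-means groups neurons by the direction of the
# inputs they respond to, not by weight magnitude.
W_norm = W_in / np.linalg.norm(W_in, axis=1, keepdims=True)
labels = KMeans(n_clusters=n_experts, n_init=10).fit_predict(W_norm)

experts = [np.where(labels == e)[0] for e in range(n_experts)]
print([len(ix) for ix in experts])      # neurons assigned to each expert
```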
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
- Interpreting Neural Policies with Disentangled Tree Representations [58.769048492254555]
We study interpretability of compact neural policies through the lens of disentangled representation.
We leverage decision trees to obtain factors of variation for disentanglement in robot learning.
We introduce interpretability metrics that measure disentanglement of learned neural dynamics.
arXiv Detail & Related papers (2022-10-13T01:10:41Z)
- Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs [0.0]
In RNNs, encoding earlier information in a suboptimal way can degrade the quality of representations built from later elements in the sequence.
I propose an augmentation to standard RNNs in the form of a gradient-based correction mechanism.
I conduct different experiments in the context of language modeling, where the impact of using such a mechanism is examined in detail.
arXiv Detail & Related papers (2021-01-03T17:54:17Z)
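The summary above names a gradient-based correction mechanism without detail; one natural reading, sketched here as an assumption rather than the thesis's exact procedure, is to take a single gradient step on the incoming hidden state so the next-token loss drops, then continue decoding from the corrected state.

```python
# Hedged sketch: "recode" an RNN hidden state with one gradient step that
# lowers the next-token loss, then rerun the step from the corrected
# state. Toy sizes, step size, and loss are illustrative choices.
import torch
import torch.nn.functional as F

rnn = torch.nn.LSTMCell(input_size=32, hidden_size=64)
readout = torch.nn.Linear(64, 100)      # toy vocabulary of 100 tokens

def recode_step(x_t, h, c, target, lr=0.5):
    h = h.detach().requires_grad_(True)
    h_new, c_new = rnn(x_t, (h, c))
    loss = F.cross_entropy(readout(h_new), target)
    # Gradient of the surprisal w.r.t. the *incoming* hidden state.
    (grad,) = torch.autograd.grad(loss, h)
    h_corrected = (h - lr * grad).detach()
    # Re-run the recurrent step from the corrected state.
    return rnn(x_t, (h_corrected, c))

x = torch.randn(1, 32); h = torch.zeros(1, 64); c = torch.zeros(1, 64)
h, c = recode_step(x, h, c, target=torch.tensor([7]))
```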
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.