Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models
- URL: http://arxiv.org/abs/2601.09445v1
- Date: Wed, 14 Jan 2026 12:45:52 GMT
- Title: Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models
- Authors: Minh Vu Pham, Hsuvas Borkakoty, Yufang Hou
- Abstract summary: In language models (LMs), intra-memory knowledge conflict largely arises when inconsistent information about the same event is encoded within the model's parametric knowledge. We use mechanistic interpretability methods to identify where and how conflicting knowledge from pre-training data is encoded within LMs. Our findings contribute to a growing body of evidence that specific internal components of a language model are responsible for encoding conflicting knowledge from pre-training.
- Score: 8.965740058804197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In language models (LMs), intra-memory knowledge conflict largely arises when inconsistent information about the same event is encoded within the model's parametric knowledge. While prior work has primarily focused on resolving conflicts between a model's internal knowledge and external resources through approaches such as fine-tuning or knowledge editing, the problem of localizing conflicts that originate during pre-training within the model's internal representations remains unexplored. In this work, we design a framework based on mechanistic interpretability methods to identify where and how conflicting knowledge from the pre-training data is encoded within LMs. Our findings contribute to a growing body of evidence that specific internal components of a language model are responsible for encoding conflicting knowledge from pre-training, and we demonstrate how mechanistic interpretability methods can be leveraged to causally intervene in and control conflicting knowledge at inference time.
Related papers
- CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering [53.7094431951084]
Knowledge-based visual question answering (KB-VQA) demonstrates significant potential for handling knowledge-intensive tasks. Conflicts arise between static parametric knowledge in vision language models and dynamically retrieved information. We propose CC-VQA as a training-free, conflict- and correlation-aware method for KB-VQA.
arXiv Detail & Related papers (2026-02-27T11:56:26Z) - Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models [80.21037538996553]
We propose a novel meta-cognitive framework for reliable knowledge augmentation via differentiated intervention and alignment. Our approach leverages internal cognitive signals to partition the knowledge space into mastered, confused, and missing regions, guiding targeted knowledge expansion. Our framework consistently outperforms strong baselines, validating its rationality in not only enhancing knowledge capabilities but also fostering cognitive behaviors that better distinguish knowns from unknowns.
arXiv Detail & Related papers (2026-02-13T15:07:35Z) - Auditing Language Model Unlearning via Information Decomposition [68.48660428111593]
We introduce an interpretable, information-theoretic framework for auditing unlearning using Partial Information Decomposition (PID). By comparing model representations before and after unlearning, we decompose the mutual information with the forgotten data into distinct components, formalizing the notions of unlearned and residual knowledge. Our work introduces a principled, representation-level audit for unlearning, offering theoretical insight and actionable tools for safer deployment of language models.
arXiv Detail & Related papers (2026-01-21T15:51:19Z) - That's Deprecated! Understanding, Detecting, and Steering Knowledge Conflicts in Language Models for Code Generation [55.78914774437411]
This paper investigates how large language models (LLMs) behave when faced with discrepancies between their parametric knowledge and conflicting information contained in a prompt. We propose a domain-agnostic framework for constructing and interpreting such conflicts. We show that activation-level steering can achieve up to a 12.6% improvement in steering success over a random baseline.
arXiv Detail & Related papers (2025-10-21T22:27:56Z) - When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models [13.390492503308792]
We analyze the mechanisms that vision-language models (VLMs) use to resolve cross-modal conflicts. We localize with logit inspection a small set of heads that control the conflict. We show that attention from such heads pinpoints localized image regions driving visual overrides, outperforming gradient-based attribution in precision.
arXiv Detail & Related papers (2025-07-18T12:42:30Z) - Conflicts in Texts: Data, Implications and Challenges [58.03478157713084]
Conflicts could reflect the complexity of situations, changes that need to be explained and dealt with, difficulties in data annotation, and mistakes in generated outputs. This survey categorizes these conflicts into three key areas: (1) natural texts on the web, where factual inconsistencies, subjective biases, and multiple perspectives introduce contradictions; (2) human-annotated data, where annotator disagreements, mistakes, and societal biases impact model training; and (3) model interactions, where hallucinations and knowledge conflicts emerge during deployment. We highlight key challenges and future directions for developing conflict-aware NLP systems that can reason over and reconcile conflicting information more effectively.
arXiv Detail & Related papers (2025-04-28T04:24:01Z) - Mitigating Knowledge Conflicts in Language Model-Driven Question Answering [15.29366851382021]
Two fundamental knowledge sources play crucial roles in document-based question answering and document summarization systems. Recent studies revealed a significant challenge: when there exists a misalignment between the model's inherent knowledge and the ground truth answers in training data, the system may exhibit problematic behaviors during inference. Our investigation proposes a strategy to minimize hallucination by building explicit connection between source inputs and generated outputs.
arXiv Detail & Related papers (2024-11-18T07:33:10Z) - Analysing the Residual Stream of Language Models Under Knowledge Conflicts [23.96385393039587]
Large language models (LLMs) can store a significant amount of factual knowledge in their parameters. However, their parametric knowledge may conflict with the information provided in the context. This can lead to undesirable model behaviour, such as reliance on outdated or incorrect information.
arXiv Detail & Related papers (2024-10-21T15:12:51Z) - DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models [42.776896363518844]
We study the effect of intra-memory conflict on an LM's ability to accept relevant context.
We utilize two knowledge conflict measures and a novel dataset containing inherently conflicting data, DynamicQA.
We verify that LMs exhibit a greater degree of intra-memory conflict with dynamic facts compared to facts that have a single truth value.
arXiv Detail & Related papers (2024-07-24T06:06:07Z) - LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities. If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information. To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z) - Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models [18.2500350157507]
Internal memory and external context inevitably clash, leading to knowledge conflicts within language models (LMs).
We propose a novel method called PatH PatcHing (PH3), which can efficiently mitigate knowledge conflicts by pruning conflicting attention heads without updating model parameters.
arXiv Detail & Related papers (2024-02-28T08:34:41Z) - Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint [20.543282448771336]
We propose an adaptive decoding method to discern whether the knowledge conflicts occur and resolve them.
Experiments show that COIECD exhibits strong performance and robustness over knowledge conflicts in realistic datasets.
arXiv Detail & Related papers (2024-02-19T07:10:30Z) - Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning [87.92209048521153]
Event temporal reasoning aims at identifying the temporal relations between two or more events from narratives.
Knowledge conflicts arise when there is a mismatch between the actual temporal relations of events in the context and the prior knowledge or biases learned by the model.
arXiv Detail & Related papers (2023-05-24T10:04:06Z)