Taming Knowledge Conflicts in Language Models
- URL: http://arxiv.org/abs/2503.10996v1
- Date: Fri, 14 Mar 2025 01:45:00 GMT
- Title: Taming Knowledge Conflicts in Language Models
- Authors: Gaotang Li, Yuzhong Chen, Hanghang Tong
- Abstract summary: Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge. We uncover a phenomenon we term the "superposition of contextual information and parametric memory", where highly influential attention heads can simultaneously contribute to both memory and context. Building on this insight, we propose Just Run Twice (JUICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning.
- Score: 44.3653067423636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge. Previous works attribute this conflict to the interplay between "memory heads" and "context heads", attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the "superposition of contextual information and parametric memory", where highly influential attention heads can simultaneously contribute to both memory and context. Building upon this insight, we propose Just Run Twice (JUICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning. JUICE identifies a set of reliable attention heads and leverages a dual-run approach to mitigate the superposition effects. Extensive experiments across 11 datasets and 6 model architectures demonstrate that JUICE achieves new state-of-the-art performance and robust generalization, delivering significant and consistent improvement across different domains under various conflict types. Finally, we theoretically analyze knowledge conflict and the superposition of contextual information and parametric memory in attention heads, which further elucidates the effectiveness of JUICE in these settings.
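The abstract describes JUICE only at a high level, but the dual-run idea can be pictured concretely. Below is a minimal PyTorch sketch, assuming a toy attention module with explicit per-head gates; the module, the chosen head indices, and the interpolation rule are illustrative placeholders, not the authors' implementation.
```python
# A minimal sketch of a dual-run attention intervention, loosely inspired by the
# JUICE description above. ToyAttention, the head indices, and the steering rule
# are illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn


class ToyAttention(nn.Module):
    """Multi-head self-attention whose per-head contributions can be rescaled."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.head_gain = torch.ones(n_heads)  # 1.0 = no intervention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, T, d_model) -> (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = att @ v  # per-head outputs, (B, n_heads, T, d_head)
        # Intervention point: scale each head's contribution before mixing.
        heads = heads * self.head_gain.view(1, -1, 1, 1)
        return self.proj(heads.transpose(1, 2).reshape(B, T, -1))


@torch.no_grad()
def dual_run(model: ToyAttention, x: torch.Tensor, steer_heads: list[int],
             alpha: float = 2.0, beta: float = 1.0) -> torch.Tensor:
    """Run 1: clean forward pass. Run 2: amplify the chosen heads by alpha.
    The difference between the runs is a steering signal scaled by beta."""
    model.head_gain = torch.ones(model.n_heads)
    clean = model(x)                              # run 1: no intervention
    model.head_gain[steer_heads] = alpha          # run 2: amplified heads
    steered = model(x)
    model.head_gain = torch.ones(model.n_heads)   # restore the default gates
    return clean + beta * (steered - clean)


# Example: steer toward the behavior of (hypothetical) context heads 1 and 3.
out = dual_run(ToyAttention(), torch.randn(2, 5, 64), steer_heads=[1, 3])
```
In this toy version, the first run records the unmodified output, the second run amplifies the chosen heads, and the difference between the two runs serves as a steering signal whose strength is controlled by beta.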
Related papers
- Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs [55.74117540987519]
This paper explores the problem of commonsense-level vision-knowledge conflict in Multimodal Large Language Models (MLLMs).
We introduce an automated pipeline, augmented with human-in-the-loop quality control, to establish a benchmark aimed at simulating and assessing the conflicts in MLLMs.
We evaluate the conflict-resolution capabilities of nine representative MLLMs across various model families and find a noticeable over-reliance on textual queries.
arXiv Detail & Related papers (2024-10-10T17:31:17Z)
- Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models [33.76903352835436]
Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities for capturing and reasoning over multimodal inputs.
These models are prone to parametric knowledge conflicts, which arise from inconsistencies between the knowledge represented in their vision and language components.
We present a systematic approach to detecting, interpreting, and mitigating these conflicts.
arXiv Detail & Related papers (2024-10-04T17:59:28Z)
- DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models [42.776896363518844]
We study the effect of intra-memory conflict on an LM's ability to accept relevant context.
We utilize two knowledge conflict measures and a novel dataset containing inherently conflicting data, DynamicQA.
We verify that LMs exhibit a greater degree of intra-memory conflict with dynamic facts compared to facts that have a single truth value.
arXiv Detail & Related papers (2024-07-24T06:06:07Z)
- Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models [18.2500350157507]
Internal memory and external context inevitably clash, leading to knowledge conflicts within language models (LMs).
We propose a novel method called PatH PatcHing (PH3), which efficiently mitigates knowledge conflicts by pruning conflicting attention heads without updating model parameters; a toy sketch of this head-pruning idea appears after this list.
arXiv Detail & Related papers (2024-02-28T08:34:41Z)
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- A Framework for Inference Inspired by Human Memory Mechanisms [9.408704431898279]
We propose a PMI framework that consists of perception, memory and inference components.
The memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain extensive and complex relational knowledge and experience.
We apply PMI to improve prevailing Transformer and CNN models on question-answering datasets such as bAbI-20k and Sort-of-CLEVR.
arXiv Detail & Related papers (2023-10-01T08:12:55Z)
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
- Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning [87.92209048521153]
Event temporal reasoning aims at identifying the temporal relations between two or more events from narratives.
Knowledge conflicts arise when there is a mismatch between the actual temporal relations of events in the context and the prior knowledge or biases learned by the model.
arXiv Detail & Related papers (2023-05-24T10:04:06Z)
- Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts.
Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks.
We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
arXiv Detail & Related papers (2023-03-20T17:54:58Z)
- Entity-Based Knowledge Conflicts in Question Answering [29.973926661540524]
We formalize the problem of knowledge conflicts, where the contextual information contradicts the learned information.
We propose a method to mitigate over-reliance on parametric knowledge, which reduces hallucination and improves out-of-distribution generalization by 4%-7%.
Our findings demonstrate the importance of evaluating a model's tendency to hallucinate rather than read, and show that our mitigation strategy encourages generalization to evolving information.
arXiv Detail & Related papers (2021-09-10T18:29:44Z)
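As a companion to the JUICE sketch above, the head-pruning idea behind PH3 can be pictured with the same hypothetical ToyAttention module: zeroing a head's gate removes that head's contribution while leaving every model parameter untouched. The pruned head indices below are arbitrary placeholders, not the result of the paper's head-selection procedure.
```python
# Illustrative head pruning in the spirit of PH3, reusing the hypothetical
# ToyAttention module from the JUICE sketch above; the pruned head indices
# are placeholders, not the paper's head-selection procedure.
import torch

model = ToyAttention()
model.head_gain[[0, 2]] = 0.0        # prune "conflicting" heads 0 and 2
out = model(torch.randn(1, 7, 64))   # model weights themselves are unchanged
```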