Analysing the Residual Stream of Language Models Under Knowledge Conflicts
- URL: http://arxiv.org/abs/2410.16090v1
- Date: Mon, 21 Oct 2024 15:12:51 GMT
- Title: Analysing the Residual Stream of Language Models Under Knowledge Conflicts
- Authors: Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
- Abstract summary: Large language models (LLMs) can store a significant amount of factual knowledge in their parameters.
However, their parametric knowledge may conflict with the information provided in the context.
This can lead to undesirable model behaviour, such as reliance on outdated or incorrect information.
- Score: 23.96385393039587
- Abstract: Large language models (LLMs) can store a significant amount of factual knowledge in their parameters. However, their parametric knowledge may conflict with the information provided in the context. Such conflicts can lead to undesirable model behaviour, such as reliance on outdated or incorrect information. In this work, we investigate whether LLMs can identify knowledge conflicts, and whether it is possible to tell which source of knowledge the model will rely on, by analysing the residual stream of the LLM. Through probing tasks, we find that LLMs internally register the signal of knowledge conflict in the residual stream, and that this signal can be accurately detected by probing intermediate model activations. This allows us to detect conflicts before an answer is generated, without modifying the input or the model parameters. Moreover, we find that the residual stream shows markedly different patterns when the model relies on contextual knowledge versus parametric knowledge to resolve a conflict. This pattern can be used to predict the model's behaviour when a conflict occurs and to prevent unexpected answers before they are produced. Our analysis offers insights into how LLMs internally manage knowledge conflicts and provides a foundation for developing methods to control their knowledge selection processes.
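The probing setup the abstract describes can be illustrated with a minimal sketch (not the authors' released code): read out the residual-stream activation at a mid-layer for the final prompt token, then fit a linear probe that predicts whether the context conflicts with the model's parametric knowledge. The model choice, layer index, and toy labelled examples below are illustrative assumptions.

```python
# Minimal sketch of a residual-stream conflict probe; assumptions noted inline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # stand-in; the paper studies larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

LAYER = 6  # a mid-layer; the paper reports conflict signals at mid-layers

def residual_features(prompt: str) -> torch.Tensor:
    """Residual-stream activation at the last token after block LAYER."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states[L] is the residual stream after block L; take the last token.
    return out.hidden_states[LAYER][0, -1]

# Hypothetical probe data: prompts whose context conflicts (1) or agrees (0)
# with the model's parametric knowledge.
prompts = [
    ("The Eiffel Tower is in Rome. Where is the Eiffel Tower?", 1),
    ("The Eiffel Tower is in Paris. Where is the Eiffel Tower?", 0),
    # ... many more labelled examples in practice
]
X = torch.stack([residual_features(p) for p, _ in prompts]).numpy()
y = [label for _, label in prompts]

probe = LogisticRegression(max_iter=1000).fit(X, y)
# probe.predict(...) can now flag conflicts before any answer is generated,
# which matches the paper's claim that no input or parameter changes are needed.
```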
Related papers
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering [23.96385393039587]
Large language models (LLMs) can store a significant amount of factual knowledge in their parameters.
LLMs can internally register the signals of knowledge conflict at mid-layers.
We propose SpARE, a representation engineering method that uses pre-trained sparse auto-encoders (a simplified steering sketch appears after this list).
arXiv Detail & Related papers (2024-10-21T13:30:47Z)
- Probing Language Models on Their Knowledge Source [19.779433870719945]
Large Language Models (LLMs) often encounter conflicts between their learned, internal knowledge (parametric knowledge, PK) and external knowledge provided during inference (contextual knowledge, CK).
arXiv Detail & Related papers (2024-10-08T08:47:11Z)
- ECon: On the Detection and Resolution of Evidence Conflicts [56.89209046429291]
The rise of large language models (LLMs) has significantly influenced the quality of information in decision-making systems.
This study introduces a method for generating diverse, validated evidence conflicts to simulate real-world misinformation scenarios.
arXiv Detail & Related papers (2024-10-05T07:41:17Z)
- Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical domains such as healthcare, is not well established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models [51.72963030032491]
Knowledge documents for large language models (LLMs) may conflict with the memory of LLMs due to outdated or incorrect knowledge.
We construct a new dataset, dubbed KNOT, for examining knowledge conflict resolution in the form of question answering.
arXiv Detail & Related papers (2024-04-04T16:40:11Z)
- Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
- Resolving Knowledge Conflicts in Large Language Models [46.903549751371415]
Large language models (LLMs) often encounter knowledge conflicts.
We ask what the desiderata for LLMs are when a knowledge conflict arises, and whether existing LLMs fulfill them.
We introduce an evaluation framework for simulating contextual knowledge conflicts.
arXiv Detail & Related papers (2023-10-02T06:57:45Z)
- Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
- Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence [37.18100697469402]
We simulate knowledge conflicts where parametric knowledge suggests one answer and different passages suggest different answers.
We find that retrieval performance heavily impacts which sources models rely on, and that current models mostly rely on non-parametric knowledge.
We present a new calibration study, where models are discouraged from committing to any single answer when presented with multiple conflicting answer candidates.
arXiv Detail & Related papers (2022-10-25T01:46:00Z)
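The SpARE entry above describes steering knowledge selection by editing residual-stream activations through pre-trained sparse auto-encoders. The sketch below is a simplified stand-in, not SpARE itself: it adds a plain steering vector to the residual stream via a forward hook, which shows the intervention point but omits the SAE machinery. The model, layer, steering vector, and strength are all illustrative assumptions.

```python
# Simplified residual-stream steering in the spirit of SpARE; the real method
# computes the edit in an SAE's sparse feature basis, while this sketch uses a
# single dense vector (random here; in practice estimated offline, e.g. as a
# difference of means between context-reliant and memory-reliant activations).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6          # mid-layer intervention point, as in the conflict analysis
ALPHA = 4.0        # intervention strength (illustrative)
steer = torch.randn(model.config.hidden_size)
steer = steer / steer.norm()

def add_steering(module, inputs, output):
    # Transformer blocks return a tuple; element 0 is the residual stream.
    hidden = output[0] + ALPHA * steer.to(output[0].dtype)
    return (hidden,) + output[1:]

# GPT-2 exposes its blocks at model.transformer.h; other architectures differ.
handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    ids = tokenizer("The Eiffel Tower is in Rome. The Eiffel Tower is in",
                    return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=5, do_sample=False)
    print(tokenizer.decode(out[0]))
finally:
    handle.remove()  # always detach the hook after use
```

The hook intervenes on the same residual stream that the main paper probes, which is what makes detection (read the stream) and steering (write to the stream) two sides of the same analysis.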
This list is automatically generated from the titles and abstracts of the papers on this site.