Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis
- URL: http://arxiv.org/abs/2505.14406v4
- Date: Tue, 09 Sep 2025 05:59:48 GMT
- Title: Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis
- Authors: Haoming Huang, Yibo Yan, Jiahao Huo, Xin Zou, Xinfeng Li, Kun Wang, Xuming Hu
- Abstract summary: Large Language Models (LLMs) are hampered by hallucinations. Knowledge overshadowing occurs when one piece of activated knowledge inadvertently masks another relevant piece. We introduce PhantomCircuit, a framework designed to comprehensively analyze and detect knowledge overshadowing.
- Score: 41.744536797387376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs), despite their remarkable capabilities, are hampered by hallucinations. A particularly challenging variant, knowledge overshadowing, occurs when one piece of activated knowledge inadvertently masks another relevant piece, leading to erroneous outputs even with high-quality training data. Current understanding of overshadowing is largely confined to inference-time observations, lacking deep insights into its origins and internal mechanisms during model training. We therefore introduce PhantomCircuit, a novel framework designed to comprehensively analyze and detect knowledge overshadowing. By innovatively employing knowledge circuit analysis, PhantomCircuit dissects the function of key components in the circuit, showing how attention pattern dynamics contribute to the overshadowing phenomenon and how it evolves throughout training. Extensive experiments demonstrate PhantomCircuit's effectiveness in identifying such instances, offering novel insights into this elusive hallucination and providing the research community with a new methodological lens for its potential mitigation.
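The abstract does not spell out how overshadowing might be measured. As a purely illustrative sketch (not the authors' actual method), the snippet below quantifies one ingredient the paper highlights, attention pattern dynamics, by comparing the entropy of a head's attention distribution on a popular fact versus a rare one; the function names and example values are hypothetical.

```python
import math

def attention_entropy(attn_row):
    """Shannon entropy of one attention distribution (a row of an
    attention matrix); low entropy means the head focuses on few tokens."""
    return -sum(p * math.log(p) for p in attn_row if p > 0.0)

def overshadowing_signal(attn_popular, attn_rare):
    """Toy diagnostic: if the same head attends sharply for a popular
    fact but diffusely for a rare one, the gap may hint that the
    popular knowledge is overshadowing the rare one."""
    return attention_entropy(attn_rare) - attention_entropy(attn_popular)

# A head that locks onto one token for the popular fact but spreads
# attention uniformly for the rare fact yields a large positive signal.
popular = [0.94, 0.02, 0.02, 0.02]
rare = [0.25, 0.25, 0.25, 0.25]
print(f"overshadowing signal: {overshadowing_signal(popular, rare):.3f}")
```

Tracking such a statistic across training checkpoints is one way the evolution of overshadowing during training could be visualized.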
Related papers
- Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis, Solution, and Interpretation [41.83870063693278]
Previous studies show that introducing new knowledge during fine-tuning of large language models (LLMs) can lead to the generation of erroneous output when the model is tested on known information. We conduct a fine-grained analysis across multiple knowledge types and two task types, including knowledge question answering (QA) and knowledge reasoning tasks. We find that when fine-tuned on a dataset in which a specific knowledge type consists entirely of new knowledge, LLMs exhibit significantly increased hallucination tendencies. We propose KnownPatch, which patches a small number of known knowledge samples into the later stages of training, effectively alleviating new-knowledge-induced hallucinations.
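The summary only states that known samples are patched into late training; the sketch below is one plausible reading of that schedule as batch mixing. Every name and parameter here (knownpatch_schedule, patch_start_frac, patch_ratio) is invented for illustration, not taken from the paper.

```python
import random

def knownpatch_schedule(new_data, known_data, total_steps,
                        batch_size=8, patch_start_frac=0.8, patch_ratio=0.125):
    """Hypothetical reading of KnownPatch: train on new knowledge, but
    from `patch_start_frac` of training onward replace a small fraction
    of each batch with already-known samples to anchor prior knowledge."""
    for step in range(total_steps):
        batch = random.choices(new_data, k=batch_size)
        if step >= patch_start_frac * total_steps:
            n_patch = max(1, int(patch_ratio * batch_size))
            batch[:n_patch] = random.sample(known_data, n_patch)
        yield step, batch

new_facts = [f"new_fact_{i}" for i in range(100)]
known_facts = [f"known_fact_{i}" for i in range(100)]
for step, batch in knownpatch_schedule(new_facts, known_facts, total_steps=10):
    print(step, batch[:2])
```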
arXiv Detail & Related papers (2025-11-04T14:55:24Z)
- The emergence of sparse attention: impact of data distribution and benefits of repetition [14.652502263025882]
We study the emergence over training of sparse attention, a critical and frequently observed attention pattern in Transformers. By combining theoretical analysis of a toy model with empirical observations on small Transformers trained on a linear regression variant, we uncover the mechanics of sparse attention emergence. Our findings provide a simple, theoretically grounded framework for understanding how data distributions and model design influence the learning dynamics behind one form of emergence.
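As a toy companion to this analysis (not the paper's model), the snippet below shows the simplest route by which sparse attention can emerge: as the logit gap between the preferred key and the rest grows during training, the softmax attention row collapses onto a single token.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# As the gap between the preferred key's logit and the others grows
# over training, attention sharpens from uniform to near one-hot.
for gap in [0.0, 1.0, 3.0, 6.0]:
    row = softmax([gap, 0.0, 0.0, 0.0])
    print(f"logit gap {gap:.0f} -> max attention weight {max(row):.3f}")
```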
arXiv Detail & Related papers (2025-05-23T13:14:02Z)
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training [92.88889953768455]
There is a critical gap in understanding how Large Language Models (LLMs) internalize new knowledge. We identify computational subgraphs that facilitate knowledge storage and processing.
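The circuits here are computational subgraphs. One common way such subgraphs are located (a crude sketch, not this paper's exact procedure) is by ablating candidate edges and keeping those whose removal hurts the target prediction; the toy model and threshold below are stand-ins.

```python
def discover_circuit(edges, model_logit, threshold=0.05):
    """Keep an edge in the circuit if ablating it drops the target
    logit by more than `threshold` -- a crude form of the subgraph
    attribution used in knowledge-circuit analyses."""
    base = model_logit(ablated=frozenset())
    circuit = []
    for edge in edges:
        drop = base - model_logit(ablated=frozenset([edge]))
        if drop > threshold:
            circuit.append(edge)
    return circuit

# Toy stand-in model: only edges "A" and "B" matter for the output.
def toy_logit(ablated):
    contrib = {"A": 0.6, "B": 0.3, "C": 0.01}
    return sum(v for k, v in contrib.items() if k not in ablated)

print(discover_circuit(["A", "B", "C"], toy_logit))  # -> ['A', 'B']
```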
arXiv Detail & Related papers (2025-02-16T16:55:43Z)
- Explaining Deep Neural Networks by Leveraging Intrinsic Methods [0.9790236766474201]
This thesis contributes to the field of eXplainable AI, focusing on enhancing the interpretability of deep neural networks.
The core contributions lie in introducing novel techniques aimed at making these networks more interpretable by leveraging an analysis of their inner workings.
This research also presents novel investigations of neurons within trained deep neural networks, shedding light on overlooked phenomena related to their activation values.
arXiv Detail & Related papers (2024-07-17T01:20:17Z)
- Knowledge Circuits in Pretrained Transformers [47.342682123081204]
The inner workings of how modern large language models store knowledge have long been a subject of intense interest and investigation among researchers. In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge. We evaluate the impact of current knowledge editing techniques on these knowledge circuits, providing deeper insights into the functioning and constraints of these editing methodologies.
arXiv Detail & Related papers (2024-05-28T08:56:33Z)
- Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer [42.081311699224585]
The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge.
Our objective is to explore the existence of hallucinations arising from parametric memory when contextual knowledge undergoes changes.
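A minimal version of the probe this objective suggests: swap in an edited context and check whether the model's answer follows the context or falls back on parametric memory. The generate callable and the labels below are hypothetical stand-ins, not the paper's protocol.

```python
def classify_answer(answer, context_answer, memory_answer):
    """Label whether the model followed the (edited) context or fell
    back on what it memorized during pretraining."""
    if context_answer.lower() in answer.lower():
        return "faithful-to-context"
    if memory_answer.lower() in answer.lower():
        return "parametric-hallucination"
    return "other"

def probe(generate, question, edited_context, context_answer, memory_answer):
    prompt = f"Context: {edited_context}\nQuestion: {question}\nAnswer:"
    return classify_answer(generate(prompt), context_answer, memory_answer)

# Toy generator that ignores the context and answers from "memory".
fake_llm = lambda prompt: "The capital is Paris."
print(probe(fake_llm, "What is the capital of France?",
            "In this fictional atlas, the capital of France is Lyon.",
            "Lyon", "Paris"))  # -> parametric-hallucination
```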
arXiv Detail & Related papers (2024-02-22T12:26:07Z)
- Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
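A rough sketch of the KCA idea as summarized above: quiz the foundation model on the external knowledge in each alignment sample, and separate samples whose knowledge the model already holds from those it does not. The make_quiz and ask callables stand in for LLM calls, and the real method's handling of detected inconsistency may differ from simple filtering.

```python
def kca_filter(samples, make_quiz, ask, pass_rate=1.0):
    """Split alignment samples by whether the foundation model answers
    a knowledge quiz about each sample's external knowledge correctly."""
    consistent, inconsistent = [], []
    for sample in samples:
        quiz = make_quiz(sample["external_knowledge"])  # [(question, gold)]
        correct = sum(ask(q).strip() == gold for q, gold in quiz)
        bucket = consistent if correct >= pass_rate * len(quiz) else inconsistent
        bucket.append(sample)
    return consistent, inconsistent

# Toy stand-ins for the two LLM calls.
samples = [{"external_knowledge": "Water boils at 100 C at sea level."}]
make_quiz = lambda k: [("At sea level, water boils at what temperature?", "100 C")]
ask = lambda q: "100 C"
keep, flag = kca_filter(samples, make_quiz, ask)
print(len(keep), len(flag))  # -> 1 0
```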
arXiv Detail & Related papers (2024-01-19T15:39:49Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
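Benchmarks in this area typically score an edit along three axes: reliability, generality, and locality. The sketch below shows that standard trio with toy data; the exact metrics in KnowEdit may differ in detail.

```python
def evaluate_edit(model, edit, paraphrases, unrelated):
    """Score one knowledge edit: reliability (the edited prompt itself),
    generality (paraphrases of it), and locality (unrelated facts must
    remain unchanged)."""
    reliability = float(model(edit["prompt"]) == edit["target"])
    generality = sum(model(p) == edit["target"] for p in paraphrases) / len(paraphrases)
    locality = sum(model(q) == a for q, a in unrelated) / len(unrelated)
    return {"reliability": reliability, "generality": generality, "locality": locality}

# Toy edited "model": answers the edited fact, misses one paraphrase,
# and leaves an unrelated fact intact.
answers = {"Capital of X?": "Y", "X's capital is": "Y", "Capital of Z?": "W"}
model = lambda prompt: answers.get(prompt, "")
edit = {"prompt": "Capital of X?", "target": "Y"}
print(evaluate_edit(model, edit,
                    paraphrases=["X's capital is", "Name X's capital"],
                    unrelated=[("Capital of Z?", "W")]))
```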
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks, including question answering (QA).
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
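The self-reflection loop studied here can be sketched generically as answer, critique, refine. In the snippet below, llm is a placeholder for any completion call, and the prompts are illustrative rather than the paper's own.

```python
def self_reflect(llm, question, max_rounds=3):
    """Answer, then repeatedly ask the model to critique its own answer
    and refine it until the critique passes (or rounds run out)."""
    answer = llm(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        critique = llm(f"Question: {question}\nAnswer: {answer}\n"
                       "Does the answer contain unsupported claims? "
                       "Reply OK or describe the problem:")
        if critique.strip().startswith("OK"):
            break
        answer = llm(f"Question: {question}\nPrevious answer: {answer}\n"
                     f"Critique: {critique}\nWrite a corrected answer:")
    return answer

# Toy stand-in LLM that always approves on the critique step.
toy_llm = lambda prompt: "OK" if "unsupported claims" in prompt else "A draft answer."
print(self_reflect(toy_llm, "What causes fever?"))  # -> "A draft answer."
```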
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- Critical Learning Periods for Multisensory Integration in Deep Networks [112.40005682521638]
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations.
arXiv Detail & Related papers (2022-10-06T23:50:38Z)
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework for knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-training models and mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval, but also drops instances that are potentially error-prone for question answering.
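A skeleton of the retriever-reader loop described here, with the abstention behavior for error-prone instances made explicit. The real framework is multimodal and trained end-to-end; retrieve and read below are hypothetical stand-ins.

```python
def answer_with_retrieval(question, corpus, retrieve, read, min_conf=0.5):
    """Retrieve supporting knowledge, read out an answer, and abstain
    on instances the reader judges error-prone (low confidence)."""
    passages = retrieve(question, corpus, k=3)
    answer, confidence = read(question, passages)
    return answer if confidence >= min_conf else None  # drop error-prone case

# Toy stand-ins: word-overlap retrieval and a constant-confidence reader.
corpus = ["The Eiffel Tower is in Paris.", "Cats are mammals."]
retrieve = lambda q, c, k: [p for p in c if any(w in p for w in q.split())][:k]
read = lambda q, ps: (("Paris", 0.9) if ps else ("", 0.0))
print(answer_with_retrieval("Where is the Eiffel Tower?", corpus, retrieve, read))
```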
arXiv Detail & Related papers (2022-06-30T02:35:04Z)