Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
- URL: http://arxiv.org/abs/2403.18167v2
- Date: Mon, 17 Jun 2024 21:35:41 GMT
- Title: Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
- Authors: Lei Yu, Meng Cao, Jackie Chi Kit Cheung, Yue Dong
- Abstract summary: State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge.
We create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations.
- Score: 42.46721214112836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of hallucinations shared across LMs (Llama-2, Pythia, GPT-J): 1) knowledge enrichment hallucinations: insufficient subject attribute knowledge in lower layer MLPs, and 2) answer extraction hallucinations: failure to select the correct object attribute in upper layer attention heads. We also found these two internal mechanistic causes of hallucinations are reflected in external manifestations. Based on insights from our mechanistic analysis, we propose a novel hallucination mitigation method through targeted restoration of the LM's internal fact recall pipeline, demonstrating superior performance compared to baselines.
Related papers
- Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs [45.13670875211498]
Large Language Models (LLMs) often generate outputs that lack grounding in real-world facts, a phenomenon known as hallucinations.
We show that models can hallucinate with high certainty even when they have the correct knowledge.
arXiv Detail & Related papers (2025-02-18T15:46:31Z)
- Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis [14.033320167387194]
A major challenge in their real-world application is hallucination, where LVLMs generate non-existent visual elements, eroding user trust.
We hypothesize that hidden factors, such as objects, contexts, and semantic foreground-background structures, induce hallucination.
By analyzing the causality between images, text prompts, and network saliency, we systematically explore interventions to block these factors.
arXiv Detail & Related papers (2024-12-04T01:23:57Z)
- Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate [34.17353224636788]
We argue that hallucination in MLLMs is partially due to a lack of slow-thinking and divergent-thinking in these models.
Our approach can not only mitigate hallucinations but also interpret why they occur and detail the specifics of hallucination.
arXiv Detail & Related papers (2024-07-30T02:41:32Z)
- VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models [59.05674402770661]
This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs).
VideoHallucer categorizes hallucinations into two main types: intrinsic and extrinsic, offering further subcategories for detailed analysis.
arXiv Detail & Related papers (2024-06-24T06:21:59Z)
- On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations on their known facts and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
- Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z)
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models [134.6697160940223]
Hallucination poses a great challenge to the trustworthy and reliable deployment of large language models.
Three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation).
This work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z)
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method achieves a 44.6% relative reduction in hallucinations while maintaining competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [40.79317187623401]
The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP).
LLMs are prone to hallucination, generating plausible yet nonfactual content.
This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval systems.
arXiv Detail & Related papers (2023-11-09T09:25:37Z)
- Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection [28.445196622710164]
We first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations.
We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector.
arXiv Detail & Related papers (2023-01-18T20:43:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.