Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
- URL: http://arxiv.org/abs/2403.18167v2
- Date: Mon, 17 Jun 2024 21:35:41 GMT
- Title: Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
- Authors: Lei Yu, Meng Cao, Jackie Chi Kit Cheung, Yue Dong
- Abstract summary: State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge.
We create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations.
- Score: 42.46721214112836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of hallucinations shared across LMs (Llama-2, Pythia, GPT-J): 1) knowledge enrichment hallucinations: insufficient subject attribute knowledge in lower layer MLPs, and 2) answer extraction hallucinations: failure to select the correct object attribute in upper layer attention heads. We also find that these two internal mechanistic causes of hallucinations are reflected in external manifestations. Based on insights from our mechanistic analysis, we propose a novel hallucination mitigation method through targeted restoration of the LM's internal fact recall pipeline, demonstrating superior performance compared to baselines.
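To make the layer-wise tracing idea concrete, here is a minimal logit-lens-style sketch; this is not the authors' pipeline, and the model (gpt2), prompt, and gold answer are illustrative assumptions. Projecting each layer's residual stream through the unembedding shows where the gold object token becomes likely, which is one rough way to separate enrichment failures (the answer never emerges) from extraction failures (it emerges mid-stack but is not selected at the output).

```python
# Illustrative logit-lens trace (a stand-in for the paper's interpretability
# methods, not their exact pipeline). Assumptions: GPT-2 small, a toy
# subject-relation query, and " Paris" as the gold object token.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The capital of France is"        # subject-relation query (assumed)
gold_id = tok(" Paris")["input_ids"][0]    # gold object attribute token

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)

ln_f, unembed = model.transformer.ln_f, model.lm_head
for layer, h in enumerate(out.hidden_states):
    logits = unembed(ln_f(h[0, -1]))               # project residual stream
    rank = int((logits > logits[gold_id]).sum())   # 0 means gold is top-1
    print(f"layer {layer:2d}: gold-token rank {rank}")
```

Informally, if the gold token never climbs the ranking, subject attribute knowledge may be missing from the lower-layer MLPs (an enrichment failure); if it ranks highly in middle layers but loses out at the final layer, selection by upper-layer attention heads may be at fault (an extraction failure).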
Related papers
- Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models [22.42712853647949]
We present an in-depth investigation into the object hallucination problem specifically within the CLIP model.
We unveil that even in isolation, the CLIP model is prone to object hallucinations, suggesting that the hallucination problem is not solely due to the interaction between vision and language modalities.
We show that the enhanced model can be employed as a visual encoder, effectively alleviating the object hallucination issue in LVLMs.
arXiv Detail & Related papers (2024-10-04T06:24:49Z)
- Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate [34.17353224636788]
We argue that hallucination in MLLMs is partially due to a lack of slow-thinking and divergent-thinking in these models.
Our approach can not only mitigate hallucinations but also interpret why they occur and detail the specifics of the hallucination.
arXiv Detail & Related papers (2024-07-30T02:41:32Z)
- VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models [59.05674402770661]
This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs).
VideoHallucer categorizes hallucinations into two main types: intrinsic and extrinsic, offering further subcategories for detailed analysis.
arXiv Detail & Related papers (2024-06-24T06:21:59Z)
- On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations about known facts and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
- Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generating factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z)
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models [134.6697160940223]
Hallucination poses a great challenge to the trustworthy and reliable deployment of large language models.
Three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them.
This work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z)
- On Early Detection of Hallucinations in Factual Question Answering [4.76359068115052]
Hallucinations remain a major impediment to gaining user trust.
In this work, we explore whether the artifacts associated with model generations can provide hints that a generation will contain hallucinations.
Our results show that the distributions of these artifacts tend to differ between hallucinated and non-hallucinated generations.
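As a rough illustration of this artifact idea (a sketch under assumed model and prompt, not the paper's feature set), the snippet below records one simple generation artifact: the model's confidence in each token it emits. Distributional differences in such features between hallucinated and faithful answers are what a detector would exploit.

```python
# Hedged sketch: collect per-token confidence during greedy decoding as one
# candidate hallucination "artifact". Model and prompt are assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_confidences(prompt: str, max_new: int = 8):
    ids = tok(prompt, return_tensors="pt")["input_ids"]
    confs = []
    with torch.no_grad():
        for _ in range(max_new):
            probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
            nxt = probs.argmax()
            confs.append(probs[nxt].item())  # confidence in chosen token
            ids = torch.cat([ids, nxt.view(1, 1)], dim=1)
    return confs

print(token_confidences("The capital of France is"))  # low values = suspect
```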
arXiv Detail & Related papers (2023-12-19T14:35:04Z)
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates 44.6% of hallucinations in relative terms and maintains performance competitive with LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)
- Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection [28.445196622710164]
We first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations.
We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector.
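To show what a lightweight detector over such symptoms might look like, here is a purely illustrative linear probe; the two features and all data below are synthetic placeholders, not the paper's measurements.

```python
# Toy linear probe over introspection features. The feature values are
# random synthetic data for illustration only -- NOT results from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical features: [mean source-token contribution, output entropy]
X_halluc = rng.normal(loc=[0.2, 3.0], scale=0.1, size=(50, 2))
X_faithful = rng.normal(loc=[0.6, 1.5], scale=0.1, size=(50, 2))
X = np.vstack([X_halluc, X_faithful])
y = np.array([1] * 50 + [0] * 50)          # 1 = hallucinated

probe = LogisticRegression().fit(X, y)
print(probe.predict_proba([[0.25, 2.8]]))  # -> high hallucination probability
```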
arXiv Detail & Related papers (2023-01-18T20:43:13Z)
- The Curious Case of Hallucinations in Neural Machine Translation [5.3180458405676205]
Hallucinations in Neural Machine Translation lie at an extreme end of the spectrum of NMT pathologies.
We consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations could be generated and explained through specific corpus-level noise patterns.
We elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation.
arXiv Detail & Related papers (2021-04-14T08:09:57Z)