ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
- URL: http://arxiv.org/abs/2410.11414v1
- Date: Tue, 15 Oct 2024 09:02:09 GMT
- Title: ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
- Authors: Zhongxiang Sun, Xiaoxue Zang, Kai Zheng, Yang Song, Jun Xu, Xiao Zhang, Weijie Yu, Yang Song, Han Li
- Abstract summary: RAG models are designed to incorporate external knowledge, reducing hallucinations caused by insufficient parametric (internal) knowledge.
Detecting such hallucinations requires disentangling how Large Language Models (LLMs) utilize external and parametric knowledge.
We propose ReDeEP, a novel method that detects hallucinations by decoupling the LLM's utilization of external context and parametric knowledge.
- Score: 27.325766792146936
- License:
- Abstract: Retrieval-Augmented Generation (RAG) models are designed to incorporate external knowledge, reducing hallucinations caused by insufficient parametric (internal) knowledge. However, even with accurate and relevant retrieved content, RAG models can still produce hallucinations by generating outputs that conflict with the retrieved information. Detecting such hallucinations requires disentangling how Large Language Models (LLMs) utilize external and parametric knowledge. Current detection methods often focus on only one of these mechanisms or fail to decouple their intertwined effects, making accurate detection difficult. In this paper, we investigate the internal mechanisms behind hallucinations in RAG scenarios. We discover that hallucinations occur when the Knowledge FFNs in LLMs overemphasize parametric knowledge in the residual stream, while Copying Heads fail to effectively retain or integrate external knowledge from retrieved content. Based on these findings, we propose ReDeEP, a novel method that detects hallucinations by decoupling the LLM's utilization of external context and parametric knowledge. Our experiments show that ReDeEP significantly improves RAG hallucination detection accuracy. Additionally, we introduce AARF, which mitigates hallucinations by modulating the contributions of Knowledge FFNs and Copying Heads.
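The abstract describes the decoupling only at a high level. As a rough illustration (not the authors' implementation), the sketch below scores an answer with a decoder-only Hugging Face model by combining two proxies: the attention mass that answer tokens place on the retrieved context (a stand-in for Copying Heads) and the magnitude of FFN writes to the residual stream at answer positions (a stand-in for Knowledge FFNs). The model name, layer ranges, the weighting factor `alpha`, and both proxy measures are assumptions chosen for illustration; ReDeEP's actual scoring and detection threshold are not reproduced here.

```python
# Hypothetical ReDeEP-style decoupling sketch (illustrative only, not the paper's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumption: any decoder-only LM with .model.layers[i].mlp
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, output_attentions=True, attn_implementation="eager"
)
model.eval()


def hallucination_score(context, question, answer,
                        copy_layers=range(24, 32),   # placeholder "Copying Head" layers
                        ffn_layers=range(24, 32),    # placeholder "Knowledge FFN" layers
                        alpha=1.0):                  # placeholder weighting
    """Higher score = more reliance on parametric knowledge relative to the retrieved context."""
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    full_ids = tok(prompt + " " + answer, return_tensors="pt").input_ids
    ctx_len = tok(f"Context: {context}", return_tensors="pt").input_ids.shape[1]
    ans_start = tok(prompt, return_tensors="pt").input_ids.shape[1]  # approximate boundary

    # Capture what each FFN (MLP) block writes into the residual stream.
    ffn_outs = {}

    def save(layer):
        def hook(_module, _inputs, output):
            ffn_outs[layer] = output.detach()
        return hook

    hooks = [model.model.layers[l].mlp.register_forward_hook(save(l)) for l in ffn_layers]
    with torch.no_grad():
        out = model(full_ids)
    for h in hooks:
        h.remove()

    # Copying-Head proxy: average attention mass from answer tokens onto context tokens.
    copy_score = torch.stack([
        out.attentions[l][0, :, ans_start:, :ctx_len].sum(dim=-1).mean()
        for l in copy_layers
    ]).mean()

    # Knowledge-FFN proxy: average norm of FFN writes at answer positions.
    ffn_score = torch.stack([
        ffn_outs[l][0, ans_start:, :].norm(dim=-1).mean()
        for l in ffn_layers
    ]).mean()

    # Decoupled comparison: parametric push minus external-context use.
    return (alpha * ffn_score - copy_score).item()
```

An AARF-style intervention would act on the same two components at generation time, down-weighting the Knowledge-FFN contribution and up-weighting the Copying Heads when the score indicates over-reliance on parametric knowledge; that reweighting is omitted from this sketch.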
Related papers
- REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models [15.380441563675243]
Hallucinations in large language model (LLM) outputs severely limit their reliability in knowledge-intensive tasks such as question answering.
We introduce REFIND, a novel framework that detects hallucinated spans within LLM outputs by directly leveraging retrieved documents.
REFIND demonstrated robustness across nine languages, including low-resource settings, and significantly outperformed baseline models.
arXiv Detail & Related papers (2025-02-19T10:59:05Z)
- The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM's Internal States [0.5573267589690007]
We focus on hallucinations involving information not used in training, which we identify by recency, i.e., the information emerged only after the model's training cut-off date.
This study investigates these hallucinations by detecting them at sentence level using different internal states of various language models.
Our results show that IAVs detect hallucinations as effectively as CEVs, and reveal that answerable and unanswerable prompts are encoded differently and are best handled by separate classifiers.
arXiv Detail & Related papers (2024-12-22T15:08:24Z)
- LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation [3.3762582927663063]
In this paper, we propose LRP4RAG, a method based on Layer-wise Relevance Propagation (LRP) for detecting hallucinations in retrieval-augmented generation with large language models (LLMs).
To the best of our knowledge, this is the first time that LRP has been used for detecting RAG hallucinations.
arXiv Detail & Related papers (2024-08-28T04:44:43Z)
- Mitigating Entity-Level Hallucination in Large Language Models [11.872916697604278]
This paper proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD) as a novel method to detect and mitigate hallucinations in Large Language Models (LLMs).
Experimental results show that DRAD demonstrates superior performance in both detecting and mitigating hallucinations in LLMs.
arXiv Detail & Related papers (2024-07-12T16:47:34Z)
- Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
However, they can generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
- Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models [68.91592125175787]
Hallucinations pose a significant challenge for the practical implementation of large language models (LLMs).
We present Rowen, a novel approach that enhances LLMs with a selective retrieval augmentation process tailored to address hallucinations.
arXiv Detail & Related papers (2024-02-16T11:55:40Z)
- Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z)
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates 44.6% of hallucinations (a relative reduction) and maintains competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)
- FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We introduce Truth-Triangulator, which synthesizes reflective considerations from a tool-enhanced ChatGPT and a LoRA-tuned Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z)
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
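The AutoHall entry above names a zero-resource, black-box detection method based on self-contradiction but gives no procedure. Purely as a generic sketch of that idea (not AutoHall's actual pipeline), one can re-sample answers to the same question and count how often an off-the-shelf NLI model judges them to contradict the answer under test; the sampler model, the NLI model, and the flagging threshold below are all placeholder assumptions.

```python
# Generic self-contradiction check (a sketch of the idea only, not AutoHall's method).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Assumptions: gpt2 stands in for the LLM under test; roberta-large-mnli is the NLI judge.
generator = pipeline("text-generation", model="gpt2")
nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")


def self_contradiction_score(question, answer, n_samples=5):
    """Fraction of re-sampled answers that the NLI model judges to contradict `answer`."""
    samples = [
        generator(question, do_sample=True, max_new_tokens=40,
                  return_full_text=False)[0]["generated_text"].strip()
        for _ in range(n_samples)
    ]
    contradictions = 0
    for sample in samples:
        inputs = nli_tok(answer, sample, return_tensors="pt", truncation=True)
        with torch.no_grad():
            label_id = nli_model(**inputs).logits.argmax(dim=-1).item()
        if nli_model.config.id2label[label_id] == "CONTRADICTION":
            contradictions += 1
    return contradictions / n_samples


# Example usage: flag as a likely hallucination above some threshold (0.5 is a placeholder).
score = self_contradiction_score("Q: Who wrote Hamlet?\nA:", "Christopher Marlowe wrote Hamlet.")
print("likely hallucination" if score > 0.5 else "likely consistent", score)
```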