INSIDE: LLMs' Internal States Retain the Power of Hallucination
Detection
- URL: http://arxiv.org/abs/2402.03744v1
- Date: Tue, 6 Feb 2024 06:23:12 GMT
- Title: INSIDE: LLMs' Internal States Retain the Power of Hallucination
Detection
- Authors: Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu,
Jieping Ye
- Abstract summary: We propose to explore the dense semantic information retained within textbfINternal textbfStates for halluctextbfInation textbfDEtection.
A simple yet effective textbfEigenScore metric is proposed to better evaluate responses' self-consistency.
A test time feature clipping approach is explored to truncate extreme activations in the internal states.
- Score: 41.23176896032034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge hallucination have raised widespread concerns for the security and
reliability of deployed LLMs. Previous efforts in detecting hallucinations have
been employed at logit-level uncertainty estimation or language-level
self-consistency evaluation, where the semantic information is inevitably lost
during the token-decoding procedure. Thus, we propose to explore the dense
semantic information retained within LLMs' \textbf{IN}ternal \textbf{S}tates
for halluc\textbf{I}nation \textbf{DE}tection (\textbf{INSIDE}). In particular,
a simple yet effective \textbf{EigenScore} metric is proposed to better
evaluate responses' self-consistency, which exploits the eigenvalues of
responses' covariance matrix to measure the semantic consistency/diversity in
the dense embedding space. Furthermore, from the perspective of self-consistent
hallucination detection, a test time feature clipping approach is explored to
truncate extreme activations in the internal states, which reduces
overconfident generations and potentially benefits the detection of
overconfident hallucinations. Extensive experiments and ablation studies are
performed on several popular LLMs and question-answering (QA) benchmarks,
showing the effectiveness of our proposal.
Related papers
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z) - Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models [12.27217471495276]
Hallucinations in large language models (LLMs) produce responses that are coherent but factually inaccurate.
We present MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection.
We also present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs.
arXiv Detail & Related papers (2024-03-11T05:51:03Z) - HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination
Tendency of LLMs [0.0]
Hallucinations pose a significant challenge to the reliability and alignment of Large Language Models (LLMs)
This paper introduces an automated scalable framework that combines benchmarking LLMs' hallucination tendencies with efficient hallucination detection.
The framework is domain-agnostic, allowing the use of any language model for benchmark creation or evaluation in any domain.
arXiv Detail & Related papers (2024-02-25T22:23:37Z) - Retrieve Only When It Needs: Adaptive Retrieval Augmentation for
Hallucination Mitigation in Large Language Models [73.93616728895401]
Hallucinations pose a significant challenge for the practical implementation of large language models (LLMs)
We present Rowen, a novel approach that enhances LLMs with a selective retrieval augmentation process tailored to address hallucinations.
arXiv Detail & Related papers (2024-02-16T11:55:40Z) - Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - A New Benchmark and Reverse Validation Method for Passage-level
Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.