Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
- URL: http://arxiv.org/abs/2311.13230v1
- Date: Wed, 22 Nov 2023 08:39:17 GMT
- Title: Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
- Authors: Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng
Zhang, Chenghu Zhou, Xinbing Wang and Luoyi Fu
- Abstract summary: Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
- Score: 99.33091772494751
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have gained significant popularity for their
impressive performance across diverse fields. However, LLMs are prone to
hallucinate untruthful or nonsensical outputs that fail to meet user
expectations in many real-world applications. Existing works for detecting
hallucinations in LLMs either rely on external knowledge for reference
retrieval or require sampling multiple responses from the LLM for consistency
verification, making these methods costly and inefficient. In this paper, we
propose a novel reference-free, uncertainty-based method for detecting
hallucinations in LLMs. Our approach imitates human focus in factuality
checking from three aspects: 1) focus on the most informative and important
keywords in the given text; 2) focus on the unreliable tokens in historical
context which may lead to a cascade of hallucinations; and 3) focus on the
token properties such as token type and token frequency. Experimental results
on relevant datasets demonstrate the effectiveness of our proposed method,
which achieves state-of-the-art performance across all the evaluation metrics
and eliminates the need for additional information.
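As a rough illustration of the abstract's three "focus" ideas, the sketch below scores a generated passage from per-token log-probabilities alone: keyword tokens are up-weighted, unreliability from earlier tokens is partially propagated forward, and rare tokens count more. The data structure, weights, and propagation rule here are assumptions for illustration only, not the authors' actual formulation.

```python
# Illustrative sketch of a reference-free, token-level uncertainty score in the
# spirit of the abstract. All weights and the propagation rule are assumptions.
import math
from dataclasses import dataclass

@dataclass
class ScoredToken:
    text: str
    logprob: float      # log p(token | context) from the generating LLM
    is_keyword: bool    # e.g. flagged by a keyword/entity extractor (assumed given)
    corpus_freq: float  # relative frequency of the token in a reference corpus

def hallucination_score(tokens, keyword_weight=2.0, propagation=0.5):
    """Higher score suggests the passage is more likely to be hallucinated."""
    carried = 0.0                      # unreliability propagated from earlier tokens
    weighted_sum = weight_total = 0.0
    for tok in tokens:
        surprisal = -tok.logprob                        # token-level uncertainty
        rarity = -math.log(max(tok.corpus_freq, 1e-9))  # rare tokens weigh more
        weight = (keyword_weight if tok.is_keyword else 1.0) * rarity
        weighted_sum += weight * (surprisal + propagation * carried)
        weight_total += weight
        carried = max(carried, surprisal)               # cascade from unreliable context
    return weighted_sum / max(weight_total, 1e-9)

# toy usage with made-up numbers
tokens = [
    ScoredToken("Einstein", -0.2, True, 1e-5),
    ScoredToken("was", -0.05, False, 1e-2),
    ScoredToken("born", -0.1, False, 1e-3),
    ScoredToken("in", -0.05, False, 1e-2),
    ScoredToken("1879", -2.3, True, 1e-6),
]
print(round(hallucination_score(tokens), 3))
```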
Related papers
- Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models [0.0]
Large Language Models (LLMs) are powerful computational models trained on extensive corpora of human-readable text, enabling them to perform general-purpose language understanding and generation.
Despite these successes, LLMs often produce inaccuracies, commonly referred to as hallucinations.
This paper provides an empirical evaluation of different prompting strategies and frameworks aimed at reducing hallucinations in LLMs.
arXiv Detail & Related papers (2024-10-25T08:34:53Z) - Reference-free Hallucination Detection for Large Vision-Language Models [19.36348897433261]
Large vision-language models (LVLMs) have made significant progress in recent years.
LVLMs exhibit excellent ability in language understanding, question answering, and conversations about visual inputs.
They are prone to producing hallucinations.
Several methods have been proposed to evaluate hallucinations in LVLMs, but most are reference-based and depend on external tools.
arXiv Detail & Related papers (2024-08-11T13:17:14Z) - Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach [0.0]
Large Language Models (LLMs) produce inaccurate outputs, also known as hallucinations.
This paper introduces a supervised learning approach employing only four numerical features derived from tokens and vocabulary probabilities obtained from other evaluators.
The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.
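A hedged sketch of this idea follows; the four summary features and the logistic-regression classifier are stand-in assumptions for illustration, not necessarily the paper's exact features or model.

```python
# Illustrative only: features and classifier are assumptions, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

def probability_features(token_logprobs):
    """Four summary statistics of a response's token log-probabilities."""
    lp = np.asarray(token_logprobs, dtype=float)
    p = np.exp(lp)
    return np.array([lp.min(), lp.mean(), p.min(), p.mean()])

def train_detector(responses_logprobs, labels):
    """labels: 1 = hallucinated, 0 = faithful (human-annotated examples)."""
    X = np.stack([probability_features(lp) for lp in responses_logprobs])
    return LogisticRegression().fit(X, labels)

# toy usage with made-up log-probabilities
detector = train_detector(
    [[-0.1, -0.2, -0.1], [-3.5, -2.8, -4.1], [-0.3, -0.1, -0.2], [-2.9, -3.7, -3.0]],
    [0, 1, 0, 1],
)
print(detector.predict([probability_features([-3.2, -0.4, -2.7])]))  # likely [1]
```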
arXiv Detail & Related papers (2024-05-30T03:00:47Z) - KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs).
It uses step-wise reasoning, multi-formulation queries, multi-form knowledge for factual checking, and a fusion-based detection mechanism.
Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
arXiv Detail & Related papers (2024-04-03T02:52:07Z) - Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the language prior of the underlying Large Language Models (LLMs) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z) - INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection [39.52923659121416]
We propose to explore the dense semantic information retained within LLMs' INternal States for hallucInation DEtection (INSIDE).
A simple yet effective EigenScore metric is proposed to better evaluate responses' self-consistency.
A test time feature clipping approach is explored to truncate extreme activations in the internal states.
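Below is a minimal sketch of an EigenScore-style self-consistency measure, assuming K sentence embeddings of sampled responses are already available (e.g. pooled hidden states); the centering and regularization details are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def eigen_score(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """embeddings: (K, d) array, one embedding per sampled response.
    Higher score = responses spread out in embedding space = less self-consistent."""
    k, _ = embeddings.shape
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # K x K Gram matrix: same nonzero spectrum as the covariance (up to scale),
    # and cheap to form when d >> K
    gram = centered @ centered.T + alpha * np.eye(k)
    return float(np.mean(np.log(np.linalg.eigvalsh(gram))))

# toy usage: five 768-dim embeddings of sampled answers to the same question
rng = np.random.default_rng(0)
print(eigen_score(rng.normal(size=(5, 768))))
```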
arXiv Detail & Related papers (2024-02-06T06:23:12Z) - Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z) - Chainpoll: A high efficacy method for LLM hallucination detection [0.0]
We introduce ChainPoll, an innovative hallucination detection method that outperforms its existing counterparts.
We also unveil RealHall, a refined collection of benchmark datasets to assess hallucination detection metrics from recent studies.
arXiv Detail & Related papers (2023-10-22T14:45:14Z) - FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We introduce Truth-Triangulator, which synthesizes reflective judgments from a tool-enhanced ChatGPT and a LoRA-tuned Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z) - A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
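A hedged sketch of what such a zero-resource, reverse-validation self-check could look like, assuming only a generic ask_llm(prompt) -> str callable; the prompts and the yes/no consistency test are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Callable

def reverse_validate(passage: str, ask_llm: Callable[[str], str]) -> bool:
    """Return True if the passage looks self-consistent, False if it is suspect."""
    # 1) Recover the main subject of the passage.
    subject = ask_llm(f"In one short phrase, what is this passage about?\n{passage}")
    # 2) Regenerate the model's own knowledge about that subject, without the passage.
    regenerated = ask_llm(f"Write a short factual passage about: {subject}")
    # 3) Check whether the original passage is consistent with the regenerated facts.
    verdict = ask_llm(
        "Answer yes or no: is the PASSAGE consistent with the FACTS?\n"
        f"PASSAGE: {passage}\nFACTS: {regenerated}"
    )
    return verdict.strip().lower().startswith("yes")
```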
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.