Black-Box Hallucination Detection via Consistency Under the Uncertain Expression
- URL: http://arxiv.org/abs/2509.21999v1
- Date: Fri, 26 Sep 2025 07:26:16 GMT
- Title: Black-Box Hallucination Detection via Consistency Under the Uncertain Expression
- Authors: Seongho Joo, Kyungmin Min, Jahyun Koo, Kyomin Jung,
- Abstract summary: Large Language Models (LLMs) are notorious for generating non-factual responses, so-called "hallucination" problems.<n>We propose a simple black-box hallucination detection metric.
- Score: 18.2609070987801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the great advancement of Language modeling in recent days, Large Language Models (LLMs) such as GPT3 are notorious for generating non-factual responses, so-called "hallucination" problems. Existing methods for detecting and alleviating this hallucination problem require external resources or the internal state of LLMs, such as the output probability of each token. Given the LLM's restricted external API availability and the limited scope of external resources, there is an urgent demand to establish the Black-Box approach as the cornerstone for effective hallucination detection. In this work, we propose a simple black-box hallucination detection metric after the investigation of the behavior of LLMs under expression of uncertainty. Our comprehensive analysis reveals that LLMs generate consistent responses when they present factual responses while non-consistent responses vice versa. Based on the analysis, we propose an efficient black-box hallucination detection metric with the expression of uncertainty. The experiment demonstrates that our metric is more predictive of the factuality in model responses than baselines that use internal knowledge of LLMs.
Related papers
- HalluMat: Detecting Hallucinations in LLM-Generated Materials Science Content Through Multi-Stage Verification [0.9490124006642771]
HalluMatData is a benchmark dataset for evaluating hallucination detection methods.<n>HalluMatDetector is a multi-stage hallucination detection framework.<n>HalluMatDetector reduces hallucination verification rates by 30%.
arXiv Detail & Related papers (2025-12-26T22:16:12Z) - Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models [0.0]
We propose Counterfactual Probing, a novel approach for detecting and mitigating hallucinations in large language models.<n>Our method dynamically generates counterfactual statements that appear plausible but contain subtle factual errors, then evaluates the model's sensitivity to these perturbations.
arXiv Detail & Related papers (2025-08-03T17:29:48Z) - MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM [58.2298313720146]
Multimodal hallucinations are multi-sourced and arise from diverse causes.<n>Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations.
arXiv Detail & Related papers (2025-05-30T05:54:36Z) - SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection [10.54378596443678]
Large language models (LLMs) are highly capable but face latency challenges in real-time applications.
This study optimize the real-time interpretable hallucination detection by introducing effective prompting techniques.
arXiv Detail & Related papers (2024-08-22T22:13:13Z) - Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z) - LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
By a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32% at run time.
arXiv Detail & Related papers (2024-07-03T17:08:52Z) - Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models [68.91592125175787]
Hallucinations pose a significant challenge for the practical implementation of large language models (LLMs)
We present Rowen, a novel approach that enhances LLMs with a selective retrieval augmentation process tailored to address hallucinations.
arXiv Detail & Related papers (2024-02-16T11:55:40Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets called AutoHall.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.