Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation
- URL: http://arxiv.org/abs/2506.17088v2
- Date: Thu, 17 Jul 2025 00:41:37 GMT
- Title: Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation
- Authors: Jiahao Cheng, Tiancheng Su, Jia Yuan, Guoxiu He, Jiawei Liu, Xinqi Tao, Jingwen Xie, Huaxia Li,
- Abstract summary: Chain-of-Thought (CoT) prompting can mitigate hallucinations by encouraging step-by-step reasoning. Our study highlights an overlooked trade-off in the use of reasoning.
- Score: 4.265468175793552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) often exhibit hallucinations, generating factually incorrect or semantically irrelevant content in response to prompts. Chain-of-Thought (CoT) prompting can mitigate hallucinations by encouraging step-by-step reasoning, but its impact on hallucination detection remains underexplored. To bridge this gap, we conduct a systematic empirical evaluation. We begin with a pilot experiment, revealing that CoT reasoning significantly affects the LLM's internal states and token probability distributions. Building on this, we evaluate the impact of various CoT prompting methods on mainstream hallucination detection methods across both instruction-tuned and reasoning-oriented LLMs. Specifically, we examine three key dimensions: changes in hallucination score distributions, variations in detection accuracy, and shifts in detection confidence. Our findings show that while CoT prompting helps reduce hallucination frequency, it also tends to obscure critical signals used for detection, impairing the effectiveness of various detection methods. Our study highlights an overlooked trade-off in the use of reasoning. Code is publicly available at: https://anonymous.4open.science/r/cot-hallu-detect.
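To make the kind of probability shift the pilot experiment measures concrete, here is a minimal sketch (an illustration, not the paper's released pipeline; see the repository link above for that). It scores the same fixed answer under a direct prompt and a CoT-style prompt using its mean token log-probability, a common probability-based hallucination signal. The model name and the `answer_logprob` helper are illustrative assumptions.

```python
# Minimal sketch: compare a probability-based hallucination score for the same
# answer under a direct prompt and a CoT-style prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumption: any HF causal LM works here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def answer_logprob(prompt: str, answer: str) -> float:
    """Mean log-probability of the answer tokens conditioned on the prompt.
    (Assumes the prompt/answer token boundary is stable; good enough for a sketch.)"""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)   # predict token t from t-1
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].mean().item()        # answer span only

question = "Who wrote the novel 'Frankenstein'?"
answer = " Mary Shelley."
direct_prompt = f"Q: {question}\nA:"
cot_prompt = f"Q: {question}\nLet's think step by step before answering.\nA:"

print("direct score:", answer_logprob(direct_prompt, answer))
print("CoT score:   ", answer_logprob(cot_prompt, answer))
```

A real evaluation along the paper's lines would compare the resulting score distributions over many hallucinated and grounded answers, with and without CoT, and report detection metrics such as AUROC rather than single-example scores.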
Related papers
- Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models [0.0]
We propose Counterfactual Probing, a novel approach for detecting and mitigating hallucinations in large language models. Our method dynamically generates counterfactual statements that appear plausible but contain subtle factual errors, then evaluates the model's sensitivity to these perturbations.
arXiv Detail & Related papers (2025-08-03T17:29:48Z) - Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs [47.18623962083962]
We present a novel approach for detecting hallucinations in large language models. We find that hallucinated responses exhibit smaller deviations from their prompts compared to grounded responses. We propose a model-intrinsic detection method that uses distributional distances as principled hallucination scores.
arXiv Detail & Related papers (2025-06-11T15:59:15Z) - MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM [58.2298313720146]
Multimodal hallucinations are multi-sourced and arise from diverse causes. Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations.
arXiv Detail & Related papers (2025-05-30T05:54:36Z) - Auditing Meta-Cognitive Hallucinations in Reasoning Large Language Models [8.97308732968526]
We study the causality of hallucinations under constrained knowledge domains by auditing the Chain-of-Thought trajectory. Our analysis reveals that in long-CoT settings, RLLMs can iteratively reinforce biases and errors through flawed reflective reasoning. Surprisingly, even direct interventions at the origin of hallucinations often fail to reverse their effects.
arXiv Detail & Related papers (2025-05-19T14:11:09Z) - Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations. The key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
arXiv Detail & Related papers (2025-04-17T06:34:45Z) - Can Your Uncertainty Scores Detect Hallucinated Entity? [14.432545893757677]
We propose a new dataset, HalluEntity, which annotates hallucination at the entity level. Based on the dataset, we evaluate uncertainty-based hallucination detection approaches across 17 modern LLMs. Our experimental results show that uncertainty estimation approaches focusing on individual token probabilities tend to over-predict hallucinations.
arXiv Detail & Related papers (2025-02-17T16:01:41Z) - On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations on facts they know and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z) - In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation [36.31646727970656]
Large language models (LLMs) frequently hallucinate and produce factual errors.
Correct generations tend to have sharper context activations in the hidden states of the in-context tokens than incorrect ones.
We propose an entropy-based metric to quantify this "sharpness" among the in-context hidden states and incorporate it into the decoding process (a rough sketch of such an entropy signal appears after this list).
arXiv Detail & Related papers (2024-03-03T15:53:41Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
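The In-Context Sharpness as Alerts entry above describes an entropy-based signal over in-context hidden states. The sketch below is a loose, assumption-laden approximation, not the authors' formulation: it projects each context token's final-layer hidden state through the output embedding (intermediate layers could be substituted) and negates the mean entropy of the resulting vocabulary distributions, so higher values read as "sharper". The model name and the `context_sharpness` helper are illustrative.

```python
# Loose sketch of an entropy-style "sharpness" signal over in-context hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def context_sharpness(text: str) -> float:
    """Negative mean entropy of per-token vocabulary distributions over the context."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
        hidden = out.hidden_states[-1]                   # final-layer states, [1, T, d]
        logits = model.get_output_embeddings()(hidden)   # project to vocabulary, [1, T, V]
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)  # per-token entropy, [1, T]
    return -entropy.mean().item()                        # higher = sharper

print(context_sharpness("The capital of France is Paris."))
print(context_sharpness("The capital of France is Berlin."))
```

A detection or decoding-time use of such a score would compare it across candidate generations rather than single sentences, as the entry's decoding integration suggests.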