Streaming Hallucination Detection in Long Chain-of-Thought Reasoning
- URL: http://arxiv.org/abs/2601.02170v1
- Date: Mon, 05 Jan 2026 14:47:41 GMT
- Title: Streaming Hallucination Detection in Long Chain-of-Thought Reasoning
- Authors: Haolang Lu, Minghui Pan, Ripeng Li, Guoshun Nan, Jialin Zhuang, Zijie Zhao, Zhongxiang Sun, Kun Wang, Yang Liu
- Abstract summary: Hallucination in long CoT reasoning is better understood as an evolving latent state rather than a one-off erroneous event. Our approach enables streaming hallucination detection in long CoT reasoning, providing real-time, interpretable evidence.
- Score: 12.874780445960424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long chain-of-thought (CoT) reasoning improves the performance of large language models, yet hallucinations in such settings often emerge subtly and propagate across reasoning steps. We suggest that hallucination in long CoT reasoning is better understood as an evolving latent state rather than a one-off erroneous event. Accordingly, we treat step-level hallucination judgments as local observations and introduce a cumulative prefix-level hallucination signal that tracks the global evolution of the reasoning state over the entire trajectory. Overall, our approach enables streaming hallucination detection in long CoT reasoning, providing real-time, interpretable evidence.
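The abstract describes folding step-level hallucination judgments into a cumulative prefix-level signal that evolves over the trajectory. As a rough illustration only, the sketch below treats each step's judgment as a score in [0, 1] and accumulates it with exponential smoothing; the scorer, the smoothing rule, and the flagging threshold are all assumptions, not the paper's actual formulation.

```python
# Illustrative sketch of streaming prefix-level hallucination tracking.
# The step scores, exponential smoothing, and threshold are assumptions;
# the paper's exact cumulative signal is not specified here.

def prefix_signal(step_scores, alpha=0.3, threshold=0.5):
    """Fold step-level hallucination scores (each in [0, 1]) into a
    cumulative prefix-level signal, flagging the first step at which
    the signal crosses the threshold."""
    signal = 0.0
    history = []
    flagged_at = None
    for i, score in enumerate(step_scores):
        # Treat the signal as an evolving latent state, not a one-off event.
        signal = alpha * score + (1 - alpha) * signal
        history.append(signal)
        if flagged_at is None and signal >= threshold:
            flagged_at = i
    return history, flagged_at

# A trajectory whose later steps drift toward hallucination:
hist, first_flag = prefix_signal([0.1, 0.2, 0.7, 0.9, 0.95])
```

Because the signal is cumulative, a single noisy step judgment does not immediately trigger a flag; the alert fires only once the smoothed prefix-level state has drifted past the threshold, which matches the streaming, trajectory-level framing in the abstract.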
Related papers
- Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context [34.903722603279014]
Large Vision-Language Models (LVLMs) have made significant progress in recent years but are prone to hallucination issues. In this paper, we ask: does increased hallucination result solely from length-induced errors, or is there a deeper underlying mechanism? We propose a novel "induce-detect-suppress" framework that actively induces hallucinations through deliberately designed contexts.
arXiv Detail & Related papers (2025-10-23T05:22:07Z) - Review of Hallucination Understanding in Large Language and Vision Models [65.29139004945712]
We present a framework for characterizing both image and text hallucinations across diverse applications. Our investigations reveal that hallucinations often stem from predictable patterns in data distributions and inherited biases. This survey provides a foundation for developing more robust and effective solutions to hallucinations in real-world generative AI systems.
arXiv Detail & Related papers (2025-09-26T09:23:08Z) - Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet [93.00109641811788]
Test-time scaling increases inference-time computation by allowing models to generate long reasoning chains. We show that this approach is not yet effective for knowledge-intensive tasks, where high factual accuracy and low hallucination rates are essential. Our results reveal that increasing test-time computation does not consistently improve accuracy and, in many cases, even leads to more hallucinations.
arXiv Detail & Related papers (2025-09-08T16:28:25Z) - Two Causes, Not One: Rethinking Omission and Fabrication Hallucinations in MLLMs [31.601057368065877]
Existing methods, based on the flawed assumption that omission and fabrication hallucinations share a common cause, often reduce omissions only to trigger more fabrications. In this work, we overturn this view by demonstrating that omission hallucinations arise from insufficient confidence when mapping perceived visual features to linguistic expressions. We propose the Visual-Semantic Attention Potential Field, a conceptual framework that reveals how visual evidence is used to infer the presence or absence of objects.
arXiv Detail & Related papers (2025-08-30T05:47:41Z) - Are Reasoning Models More Prone to Hallucination? [70.04436965009072]
Recently evolved large reasoning models (LRMs) show powerful performance in solving complex tasks with long chain-of-thought (CoT) reasoning capability. Are reasoning models more prone to hallucination? This paper addresses the question from three perspectives.
arXiv Detail & Related papers (2025-05-29T16:53:41Z) - Auditing Meta-Cognitive Hallucinations in Reasoning Large Language Models [12.747507415841168]
We study the causality of hallucinations under constrained knowledge domains by auditing the Chain-of-Thought trajectory. Our analysis reveals that in long-CoT settings, RLLMs can iteratively reinforce biases and errors through flawed reflective reasoning. Surprisingly, even direct interventions at the origin of hallucinations often fail to reverse their effects, as reasoning chains exhibit 'chain disloyalty'.
arXiv Detail & Related papers (2025-05-19T14:11:09Z) - Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective [11.013059864022667]
Reasoning hallucinations are logically coherent but factually incorrect reasoning traces. These errors are embedded within structured reasoning, making them more difficult to detect and potentially more harmful. We propose the Reasoning Score, which quantifies the depth of reasoning by measuring the divergence between logits. We also introduce GRPO-R, an enhanced reinforcement learning algorithm that incorporates step-level deep reasoning rewards via potential-based shaping.
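The abstract above mentions a Reasoning Score based on "the divergence between logits" without specifying which logits are compared. Purely as a hypothetical sketch, the snippet below computes a KL divergence between the softmax distributions of two logit vectors (e.g. from an early versus a late layer); both the choice of KL and the choice of comparison points are assumptions, not the paper's definition.

```python
import math

# Hypothetical sketch of a logit-divergence score. Which logits are
# compared, and the use of KL divergence, are illustrative assumptions.

def kl_divergence(logits_p, logits_q):
    """KL(P || Q) between the softmax distributions of two logit vectors."""
    def softmax(logits):
        m = max(logits)  # subtract max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    p, q = softmax(logits_p), softmax(logits_q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits yield zero divergence; disagreeing logits a positive score.
same = kl_divergence([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
diff = kl_divergence([2.0, 1.0, 0.0], [0.0, 1.0, 2.0])
```

Under this reading, a larger divergence between the two distributions would indicate that more computation separates them, which is one plausible proxy for "depth of reasoning."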
arXiv Detail & Related papers (2025-05-19T09:16:40Z) - Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations. The key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
arXiv Detail & Related papers (2025-04-17T06:34:45Z) - Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer [51.7407540261676]
We investigate a distinct type of hallucination, where a model can consistently answer a question correctly, but a seemingly trivial perturbation causes it to produce a hallucinated response with high certainty. This phenomenon is particularly concerning in high-stakes domains such as medicine or law, where model certainty is often used as a proxy for reliability. We show that CHOKE examples are consistent across prompts, occur in different models and datasets, and are fundamentally distinct from other hallucinations.
arXiv Detail & Related papers (2025-02-18T15:46:31Z) - On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on understanding the reasons for LLMs' hallucinations on their known facts and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z) - In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation [36.31646727970656]
Large language models (LLMs) frequently hallucinate and produce factual errors.
We find that correct generations tend to have sharper context activations in the hidden states of the in-context tokens than incorrect ones. We propose an entropy-based metric to quantify the "sharpness" among the in-context hidden states and incorporate it into the decoding process.
arXiv Detail & Related papers (2024-03-03T15:53:41Z)
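The entropy-based sharpness idea from the abstract above can be illustrated with a minimal sketch: normalize per-token activation magnitudes into a distribution and score it by negative entropy, so that distributions concentrated on a few in-context tokens score as "sharper." The softmax normalization and sign convention here are assumptions; the paper's exact metric may differ.

```python
import math

# Illustrative sketch of an entropy-based sharpness score over
# in-context activation magnitudes. Normalization and sign convention
# are assumptions, not the paper's exact metric.

def sharpness(activations):
    """Softmax-normalize activation magnitudes and return the negative
    entropy: higher values mean a sharper (more concentrated) distribution
    over the in-context tokens."""
    m = max(activations)  # subtract max for numerical stability
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return -entropy

# A peaked activation pattern should score as sharper than a flat one:
peaked = sharpness([5.0, 0.1, 0.1, 0.1])
flat = sharpness([1.0, 1.0, 1.0, 1.0])
```

A flat distribution attains the maximum entropy (log of the number of tokens) and hence the minimum sharpness, which matches the abstract's claim that incorrect generations show less sharp context activations.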
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.