TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models
- URL: http://arxiv.org/abs/2510.01274v1
- Date: Tue, 30 Sep 2025 02:01:10 GMT
- Title: TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models
- Authors: Shenxu Chang, Junchi Yu, Weixing Wang, Yongqiang Chen, Jialin Yu, Philip Torr, Jindong Gu
- Abstract summary: The hallucination problem in D-LLMs remains underexplored, limiting their reliability in real-world applications. Existing hallucination detection methods are designed for AR-LLMs and rely on signals from single-step generation. We propose TraceDet, a novel framework that explicitly leverages the intermediate denoising steps of D-LLMs for hallucination detection.
- Score: 49.83690850047884
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion large language models (D-LLMs) have recently emerged as a promising alternative to auto-regressive LLMs (AR-LLMs). However, the hallucination problem in D-LLMs remains underexplored, limiting their reliability in real-world applications. Existing hallucination detection methods are designed for AR-LLMs and rely on signals from single-step generation, making them ill-suited for D-LLMs where hallucination signals often emerge throughout the multi-step denoising process. To bridge this gap, we propose TraceDet, a novel framework that explicitly leverages the intermediate denoising steps of D-LLMs for hallucination detection. TraceDet models the denoising process as an action trace, with each action defined as the model's prediction over the cleaned response, conditioned on the previous intermediate output. By identifying the sub-trace that is maximally informative to the hallucinated responses, TraceDet leverages the key hallucination signals in the multi-step denoising process of D-LLMs for hallucination detection. Extensive experiments on various open source D-LLMs demonstrate that TraceDet consistently improves hallucination detection, achieving an average gain in AUROC of 15.2% compared to baselines.
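The paper's implementation is not reproduced here, but the core idea, treating the multi-step denoising trace as the detection signal rather than a single forward pass, can be sketched. Below is a minimal Python illustration, assuming per-step prediction distributions over the response are available as an array; the function names and the informativeness proxy (mean predictive entropy over a sliding window of steps) are illustrative assumptions, not the authors' method.

```python
# Hedged sketch of the trace-as-signal idea behind TraceDet: summarize a
# D-LLM's per-step predictions, pick the sub-trace that looks most
# informative, and turn it into features for a hallucination detector.
# The entropy-based window selection below is an illustrative stand-in
# for the paper's maximally-informative sub-trace identification.
import numpy as np

def step_entropy(probs: np.ndarray) -> np.ndarray:
    """Mean token-level predictive entropy at each denoising step.

    probs: (T, L, V) array of T denoising steps, L response positions,
    and a V-way distribution per position.
    """
    eps = 1e-12
    ent = -(probs * np.log(probs + eps)).sum(axis=-1)  # (T, L)
    return ent.mean(axis=-1)                           # (T,)

def best_subtrace(signal: np.ndarray, width: int) -> int:
    """Start index of the contiguous window of steps with the largest
    mean signal, a crude proxy for 'maximally informative sub-trace'."""
    scores = [signal[s:s + width].mean()
              for s in range(len(signal) - width + 1)]
    return int(np.argmax(scores))

def trace_features(probs: np.ndarray, width: int = 4) -> np.ndarray:
    """Summarize the selected sub-trace as a small feature vector that a
    downstream classifier (e.g. logistic regression) could consume."""
    ent = step_entropy(probs)
    s = best_subtrace(ent, width)
    window = ent[s:s + width]
    return np.array([window.mean(), window.std(), window.max(), float(s)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(16, 8, 32))   # 16 steps, 8 tokens, vocab 32
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    print(trace_features(probs))
```

In a real detector these window statistics would be computed from actual denoising traces and fed to a classifier trained on labeled hallucinated versus faithful responses.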
Related papers
- TDGNet: Hallucination Detection in Diffusion Language Models via Temporal Dynamic Graphs [30.313604786976715]
Diffusion language models (D-LLMs) offer parallel denoising and bidirectional context. Hallucination detection for D-LLMs remains underexplored. We introduce TDGNet, a temporal dynamic graph framework that formulates hallucination detection as learning over evolving token-level attention graphs.
arXiv Detail & Related papers (2026-02-08T16:35:30Z) - Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models [4.7341002297388295]
Implicit feedback, employed in training recommender systems, unavoidably contains noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their diverged data patterns, such as higher loss values. We observed that noisy samples and hard samples display similar patterns, leading to a hard-noisy confusion issue.
arXiv Detail & Related papers (2025-11-10T16:51:03Z) - LLM Hallucination Detection: A Fast Fourier Transform Method Based on Hidden Layer Temporal Signals [10.85580316542761]
Hallucination remains a critical barrier for deploying large language models (LLMs) in reliability-sensitive applications. We propose HSAD (Hidden Signal Analysis-based Detection), a novel hallucination detection framework that models the temporal dynamics of hidden representations. Across multiple benchmarks, including TruthfulQA, HSAD achieves over 10 percentage points improvement compared to prior state-of-the-art methods.
arXiv Detail & Related papers (2025-09-16T15:08:19Z) - Cross-Layer Attention Probing for Fine-Grained Hallucination Detection [6.83291363146574]
- Cross-Layer Attention Probing for Fine-Grained Hallucination Detection [6.83291363146574]
We propose Cross-Layer Attention Probing (CLAP), a novel activation probing technique for hallucination detection. Our empirical evaluations show that CLAP improves hallucination detection compared to baselines on both decoded responses and responses sampled at higher temperatures. CLAP maintains high reliability even when applied out-of-distribution.
arXiv Detail & Related papers (2025-09-04T14:37:34Z) - Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling [67.14942827452161]
Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations. In this work, we introduce REVERSE, a unified framework that integrates hallucination-aware training with on-the-fly self-verification.
arXiv Detail & Related papers (2025-04-17T17:59:22Z) - HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation-enhanced hallucination-detection model, coined HuDEx. The proposed model provides a novel approach to integrate detection with explanations and to enable both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z) - ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models [65.12177400764506]
Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and a wide range of applications. Current hallucination detection and mitigation datasets are limited in domain and size. This paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset.
arXiv Detail & Related papers (2024-07-05T17:56:38Z) - PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics [51.17512229589]
PoLLMgraph is a model-based white-box detection and forecasting approach for large language models.
We show that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics.
Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
arXiv Detail & Related papers (2024-04-06T20:02:20Z) - A New Benchmark and Reverse Validation Method for Passage-level
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)