Related papers: Semantic Reformulation Entropy for Robust Hallucination Detection in QA Tasks

Semantic Reformulation Entropy for Robust Hallucination Detection in QA Tasks

URL: http://arxiv.org/abs/2509.17445v2
Date: Thu, 25 Sep 2025 02:28:29 GMT
Title: Semantic Reformulation Entropy for Robust Hallucination Detection in QA Tasks
Authors: Chaodong Tong, Qi Zhang, Lei Jiang, Yanbing Liu, Nannan Sun, Wei Li,
Abstract summary: Existing entropy-based semantic-level uncertainty estimation methods are limited by sampling noise and unstable clustering of variable-length answers.<n>We propose Semantic Reformulation Entropy (SRE), which improves uncertainty estimation in two ways.
Score: 13.230578301939907
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliable question answering with large language models (LLMs) is challenged by hallucinations, fluent but factually incorrect outputs arising from epistemic uncertainty. Existing entropy-based semantic-level uncertainty estimation methods are limited by sampling noise and unstable clustering of variable-length answers. We propose Semantic Reformulation Entropy (SRE), which improves uncertainty estimation in two ways. First, input-side semantic reformulations produce faithful paraphrases, expand the estimation space, and reduce biases from superficial decoder tendencies. Second, progressive, energy-based hybrid clustering stabilizes semantic grouping. Experiments on SQuAD and TriviaQA show that SRE outperforms strong baselines, providing more robust and generalizable hallucination detection. These results demonstrate that combining input diversification with multi-signal clustering substantially enhances semantic-level uncertainty estimation.

Related papers

Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction [51.50282796099369]
This paper develops a multi-dimensional instruction uncertainty reduction framework to generate semantically constrained adversarial examples.<n>By predicting the language-guided sampling process, the optimization process will be stabilized by the designed ResAdv-DDIM sampler.<n>We realize the reference-free generation of semantically constrained 3D adversarial examples for the first time.
arXiv Detail & Related papers (2025-10-27T04:02:52Z)
Efficient semantic uncertainty quantification in language models via diversity-steered sampling [46.23327887393273]
We introduce a diversity-steered sampler that discourages semantically redundant outputs during decoding.<n>Key idea is to inject a continuous semantic-similarity penalty into the model's proposal distribution.<n>Being modular and requiring no gradient access to the base LLM, the framework promises to serve as a drop-in enhancement for uncertainty estimation.
arXiv Detail & Related papers (2025-10-24T10:06:21Z)
Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation [68.106428321492]
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding.<n>LLMs hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans.<n>We present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately.
arXiv Detail & Related papers (2025-10-09T10:26:58Z)
Semantic Energy: Detecting LLM Hallucination Beyond Entropy [106.92072182161712]
Large Language Models (LLMs) are being increasingly deployed in real-world applications, but they remain susceptible to hallucinations.<n>Uncertainty estimation is a feasible approach to detect such hallucinations.<n>We introduce Semantic Energy, a novel uncertainty estimation framework.
arXiv Detail & Related papers (2025-08-20T07:33:50Z)
Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models [12.743668975795144]
Large language models (LLMs) have transformed natural language processing, but their reliable deployment requires effective uncertainty quantification (UQ)<n>Existing UQ methods are often and lack a probabilistic interpretation.<n>We propose a fully probabilistic framework based on an inverse model, which quantifies uncertainty by evaluating the diversity of the input space conditioned on a given output through systematic perturbations.
arXiv Detail & Related papers (2025-06-11T13:02:17Z)
Enhancing Uncertainty Estimation and Interpretability via Bayesian Non-negative Decision Layer [55.66973223528494]
We develop a Bayesian Non-negative Decision Layer (BNDL), which reformulates deep neural networks as a conditional Bayesian non-negative factor analysis.<n>BNDL can model complex dependencies and provide robust uncertainty estimation.<n>We also offer theoretical guarantees that BNDL can achieve effective disentangled learning.
arXiv Detail & Related papers (2025-05-28T10:23:34Z)
RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection [26.186204911845866]
hallucinations remain a vital obstacle to large language models' trustworthy use.<n>We propose RePPL to recalibrate uncertainty measurement by these two aspects.<n>Our method achieves the best comprehensive detection performance across various QA datasets.
arXiv Detail & Related papers (2025-05-21T11:23:05Z)
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important. We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond [52.246494389096654]
This paper introduces Word-Sequence Entropy (WSE), a method that calibrates uncertainty at both the word and sequence levels. We compare WSE with six baseline methods on five free-form medical QA datasets, utilizing seven popular large language models (LLMs)
arXiv Detail & Related papers (2024-02-22T03:46:08Z)
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability. In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling. Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.