Token Level Hallucination Detection via Variance in Language Models
- URL: http://arxiv.org/abs/2507.04137v1
- Date: Sat, 05 Jul 2025 19:20:59 GMT
- Title: Token Level Hallucination Detection via Variance in Language Models
- Authors: Keshav Kumar
- Abstract summary: Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations. We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple generations. Our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations, confidently generated yet factually incorrect outputs. We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple stochastic generations. Unlike prior methods that require ground-truth references or sentence-level verification, our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis. We evaluate our method on unanswerable question prompts from the SQuAD v2 dataset and benchmark across three autoregressive models of varying scales: GPT-Neo 125M, Falcon 1B, and Mistral 7B. Through both quantitative metrics and visual diagnostics, we show that token-level variance reliably highlights instability in model outputs and correlates with hallucination patterns. Our framework is lightweight, reproducible, and adaptable to multiple domains, offering a valuable diagnostic tool for analyzing generative reliability in LLMs.
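The abstract describes the detector only at a high level, so the following is a minimal sketch of the idea under stated assumptions: sample several stochastic continuations of the same prompt, record the log-probability assigned to the token chosen at each generation step, and treat the per-step variance of those log-probabilities across samples as an instability score. The model name (GPT-Neo 125M) is taken from the paper's evaluation; the prompt, sampling hyperparameters, and flagging threshold are illustrative placeholders rather than the authors' settings.

```python
# Minimal sketch of variance-based token-level instability scoring.
# Assumptions (not from the paper's code): the variance is computed per
# generation step across N stochastic samples of the same prompt, and the
# sampling settings below are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125m"  # smallest model evaluated in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Illustrative unanswerable-style prompt (placeholder, not from SQuAD v2).
prompt = "Question: In what year was the fictional Treaty of Zenith signed? Answer:"
inputs = tok(prompt, return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

n_samples, max_new = 8, 20
with torch.no_grad():
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=1.0,
        top_p=0.95,
        num_return_sequences=n_samples,
        max_new_tokens=max_new,
        output_scores=True,
        return_dict_in_generate=True,
        pad_token_id=tok.eos_token_id,
    )

# out.scores is a tuple with one [n_samples, vocab] tensor per generated step.
# Sequences that finish early are padded; a fuller implementation would mask
# steps after their end-of-sequence token.
step_logps = []
for t, step_scores in enumerate(out.scores):
    logp = torch.log_softmax(step_scores, dim=-1)            # [n_samples, vocab]
    chosen = out.sequences[:, prompt_len + t].unsqueeze(-1)   # token sampled at step t
    step_logps.append(logp.gather(-1, chosen).squeeze(-1))    # [n_samples]

logp_matrix = torch.stack(step_logps, dim=1)   # [n_samples, n_steps]
variance_per_step = logp_matrix.var(dim=0)     # instability score per position
mean_var = variance_per_step.mean().item()     # crude illustrative threshold

for t, v in enumerate(variance_per_step.tolist()):
    token = tok.decode(out.sequences[0, prompt_len + t].item())
    flag = "  <-- unstable" if v > mean_var else ""
    print(f"step {t:2d}  token {token!r:15s}  logp variance {v:.3f}{flag}")
```

Positions where the log-probability swings widely across samples are the ones the abstract associates with unstable, potentially hallucinated content; a fuller implementation would aggregate these per-step scores into a sequence-level diagnostic.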
Related papers
- Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models [0.0]
We propose Counterfactual Probing, a novel approach for detecting and mitigating hallucinations in large language models. Our method dynamically generates counterfactual statements that appear plausible but contain subtle factual errors, then evaluates the model's sensitivity to these perturbations.
arXiv Detail & Related papers (2025-08-03T17:29:48Z) - HalluCounter: Reference-free LLM Hallucination Detection in the Wild! [6.5037356041929675]
HalluCounter is a reference-free hallucination detection method that utilizes both response-response and query-response consistency and alignment patterns. Our method outperforms state-of-the-art approaches by a significant margin, achieving over 90% average confidence in hallucination detection across datasets.
arXiv Detail & Related papers (2025-03-06T16:59:18Z) - Spatial Reasoning with Denoising Models [49.83744014336816]
We introduce a framework to perform reasoning over sets of continuous variables via denoising generative models. We show, for the first time, that the order of generation can be successfully predicted by the denoising network itself. Using these findings, we can increase the accuracy of specific reasoning tasks from 1% to >50%.
arXiv Detail & Related papers (2025-02-28T14:08:30Z) - HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation-enhanced hallucination-detection model, coined HuDEx. The proposed model offers a novel approach that integrates detection with explanations, enabling both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - MOSAIC: Multiple Observers Spotting AI Content [35.67613230687864]
Large Language Models (LLMs) are trained at scale and endowed with powerful text-generating abilities. We propose an approach to automatically discriminate artificially generated from human-written texts. Our experiments, conducted with various generator LLMs, indicate that this approach effectively leverages the strengths of each model.
arXiv Detail & Related papers (2024-09-11T20:55:12Z) - VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z) - PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics [51.17512229589]
PoLLMgraph is a model-based white-box detection and forecasting approach for large language models.
We show that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics.
Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
arXiv Detail & Related papers (2024-04-06T20:02:20Z) - SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency [11.056236593022978]
Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs).
We re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations, arising at 1) the question level and 2) the model level.
We propose a novel sampling-based method, semantic-aware cross-check consistency (SAC3), which expands on the principle of self-consistency checking.
arXiv Detail & Related papers (2023-11-03T06:32:43Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)