A novel hallucination classification framework
- URL: http://arxiv.org/abs/2510.05189v1
- Date: Mon, 06 Oct 2025 09:54:20 GMT
- Title: A novel hallucination classification framework
- Authors: Maksym Zavhorodnii, Dmytro Dehtiarov, Anna Konovalenko
- Abstract summary: This work introduces a novel methodology for the automatic detection of hallucinations generated during large language model (LLM) inference. The proposed approach is based on a systematic taxonomy and controlled reproduction of diverse hallucination types through prompt engineering.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work introduces a novel methodology for the automatic detection of hallucinations generated during large language model (LLM) inference. The proposed approach is based on a systematic taxonomy and controlled reproduction of diverse hallucination types through prompt engineering. A dedicated hallucination dataset is subsequently mapped into a vector space using an embedding model and analyzed with unsupervised learning techniques in a reduced-dimensional representation of hallucinations with veridical responses. Quantitative evaluation of inter-centroid distances reveals a consistent correlation between the severity of informational distortion in hallucinations and their spatial divergence from the cluster of correct outputs. These findings provide theoretical and empirical evidence that even simple classification algorithms can reliably distinguish hallucinations from accurate responses within a single LLM, thereby offering a lightweight yet effective framework for improving model reliability.
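The abstract's pipeline (embed responses, reduce dimensionality, compare cluster centroids) can be sketched as follows. This is an illustrative reconstruction with synthetic data, not the authors' code: the embedding values, cluster labels, and the use of PCA for the reduced-dimensional representation are all assumptions.

```python
# Sketch of the centroid-distance analysis: embed responses, project to a
# low-dimensional space, and measure each hallucination cluster's centroid
# distance from the centroid of correct (veridical) outputs.
import numpy as np

def pca_reduce(X, k=2):
    """Project rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def centroid_distances(embeddings, labels, reference_label="veridical"):
    """Distance from each cluster centroid to the veridical centroid."""
    reduced = pca_reduce(embeddings)
    centroids = {lab: reduced[labels == lab].mean(axis=0)
                 for lab in np.unique(labels)}
    ref = centroids[reference_label]
    return {lab: float(np.linalg.norm(c - ref))
            for lab, c in centroids.items() if lab != reference_label}

# Toy data: synthetic vectors standing in for a real embedding model.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.1, (20, 8)),   # veridical responses
                 rng.normal(0.5, 0.1, (20, 8)),   # mild distortion
                 rng.normal(2.0, 0.1, (20, 8))])  # severe distortion
labels = np.array(["veridical"] * 20 + ["mild"] * 20 + ["severe"] * 20)
dists = centroid_distances(emb, labels)
```

On this toy data, greater informational distortion yields a larger inter-centroid distance from the veridical cluster, mirroring the correlation the paper reports.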
Related papers
- Revisiting Hallucination Detection with Effective Rank-based Uncertainty [10.775061161282053]
We propose a simple yet powerful method that quantifies uncertainty by measuring the effective rank of hidden states. Grounded in the spectral analysis of representations, our approach provides interpretable insights into the model's internal reasoning process. Our method effectively detects hallucinations and generalizes robustly across various scenarios.
arXiv Detail & Related papers (2025-10-09T16:12:12Z)
- A Novel Differential Feature Learning for Effective Hallucination Detection and Classification [3.9060143123877844]
We propose a dual-model architecture integrating a Projected Fusion block for adaptive inter-layer feature weighting and a Differential Feature Learning mechanism. We demonstrate that hallucination signals concentrate in highly sparse feature subsets, achieving significant accuracy improvements on question answering and dialogue tasks.
arXiv Detail & Related papers (2025-09-20T06:48:22Z)
- Theoretical Foundations and Mitigation of Hallucination in Large Language Models [0.0]
Hallucination in Large Language Models (LLMs) refers to the generation of content that is not faithful to the input or to real-world facts. This paper provides a rigorous treatment of hallucination in LLMs, including formal definitions and theoretical analyses.
arXiv Detail & Related papers (2025-07-20T15:22:34Z)
- Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs [47.18623962083962]
We present a novel approach for detecting hallucinations in large language models. We find that hallucinated responses exhibit smaller deviations from their prompts compared to grounded responses. We propose a model-intrinsic detection method that uses distributional distances as principled hallucination scores.
arXiv Detail & Related papers (2025-06-11T15:59:15Z)
- MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM [58.2298313720146]
Multimodal hallucinations are multi-sourced and arise from diverse causes. Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations.
arXiv Detail & Related papers (2025-05-30T05:54:36Z)
- HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination". This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z)
- Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations. The key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
arXiv Detail & Related papers (2025-04-17T06:34:45Z)
- Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization [123.54980913741828]
Large Vision-Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data. However, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information. However, they struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations.
arXiv Detail & Related papers (2024-05-24T08:46:31Z)
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
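The zero-resource self-contradiction idea above can be sketched as: sample several responses to the same question and flag the answer when the samples disagree with one another. This is a toy stand-in, not the AutoHall implementation: real systems compare sampled responses with an entailment or contradiction check, whereas this sketch uses simple token overlap, and the threshold value is an assumption.

```python
# Toy sketch of zero-resource, black-box hallucination detection via
# self-contradiction: disagreement among sampled responses is the signal.
from itertools import combinations

def jaccard(a, b):
    """Token-level Jaccard similarity between two responses (a crude
    stand-in for a proper entailment check)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def self_consistency(responses):
    """Mean pairwise similarity; low values suggest self-contradiction."""
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def flag_hallucination(responses, threshold=0.5):
    """Flag the answer as suspect when sampled responses disagree.
    The 0.5 threshold is an arbitrary illustrative choice."""
    return self_consistency(responses) < threshold

# Hypothetical sampled responses standing in for repeated LLM generations.
consistent = ["the capital of france is paris"] * 3
contradictory = ["the capital of france is paris",
                 "i think it might be lyon",
                 "the answer is definitely marseille"]
```

Here `flag_hallucination(consistent)` stays quiet while `flag_hallucination(contradictory)` raises a flag, since the divergent samples share almost no tokens.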
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.