A Novel Differential Feature Learning for Effective Hallucination Detection and Classification
- URL: http://arxiv.org/abs/2509.21357v1
- Date: Sat, 20 Sep 2025 06:48:22 GMT
- Title: A Novel Differential Feature Learning for Effective Hallucination Detection and Classification
- Authors: Wenkai Wang, Vincent Lee, Yizhen Zheng
- Abstract summary: We propose a dual-model architecture integrating a Projected Fusion block for adaptive inter-layer feature weighting and a Differential Feature Learning mechanism. We demonstrate that hallucination signals concentrate in highly sparse feature subsets, achieving significant accuracy improvements on question answering and dialogue tasks.
- Score: 3.9060143123877844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model hallucination represents a critical challenge where outputs deviate from factual accuracy due to distributional biases in training data. While recent investigations establish that specific hidden layers exhibit differences between hallucinatory and factual content, the precise localization of hallucination signals within layers remains unclear, limiting the development of efficient detection methods. We propose a dual-model architecture integrating a Projected Fusion (PF) block for adaptive inter-layer feature weighting and a Differential Feature Learning (DFL) mechanism that identifies discriminative features by computing differences between parallel encoders learning complementary representations from identical inputs. Through systematic experiments across HaluEval's question answering, dialogue, and summarization datasets, we demonstrate that hallucination signals concentrate in highly sparse feature subsets, achieving significant accuracy improvements on question answering and dialogue tasks. Notably, our analysis reveals a hierarchical "funnel pattern" where shallow layers exhibit high feature diversity while deep layers demonstrate concentrated usage, enabling detection performance to be maintained with minimal degradation using only 1% of feature dimensions. These findings suggest that hallucination signals are more concentrated than previously assumed, offering a pathway toward computationally efficient detection systems that could reduce inference costs while maintaining accuracy.
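The differential idea described in the abstract can be sketched in a few lines of NumPy. Everything here is an illustrative assumption, not the paper's implementation: the one-layer tanh encoders, the 768/64 feature dimensions, and the top-1% selection by mean absolute difference all stand in for the actual PF block and learned DFL mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w):
    # Simple one-layer stand-in for a parallel encoder: tanh projection.
    return np.tanh(x @ w)

# Hypothetical shapes: 768-dim pooled hidden states, 64-dim feature space.
hidden = rng.standard_normal((8, 768))           # batch of hidden states
w_a = rng.standard_normal((768, 64)) * 0.05      # parallel encoder A
w_b = rng.standard_normal((768, 64)) * 0.05      # parallel encoder B

feat_a = encoder(hidden, w_a)
feat_b = encoder(hidden, w_b)

# Differential feature: the element-wise difference between the two
# parallel encoders is treated as the discriminative signal.
diff = feat_a - feat_b

# Sparsity: keep only the top 1% of dimensions, ranked by mean absolute
# difference across the batch (mirroring the "1% of feature dimensions"
# finding; the ranking rule is an assumption).
k = max(1, int(0.01 * diff.shape[1]))
top_dims = np.argsort(np.abs(diff).mean(axis=0))[-k:]
sparse_feat = diff[:, top_dims]

print(sparse_feat.shape)  # (8, 1)
```

A downstream classifier would then be trained on `sparse_feat` rather than the full hidden state, which is where the claimed inference savings would come from.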
Related papers
- Small Updates, Big Doubts: Does Parameter-Efficient Fine-tuning Enhance Hallucination Detection? [17.099852012707476]
We systematically investigate the impact of PEFT on hallucination detection through a comprehensive empirical study. Experiments show that PEFT consistently strengthens hallucination detection ability. Further analyses indicate that PEFT methods primarily reshape how uncertainty is encoded and surfaced.
arXiv Detail & Related papers (2026-01-17T21:39:24Z) - FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering [14.550872089352943]
FaithSCAN is a lightweight network that detects hallucinations by exploiting rich internal signals of vision-language models. We extend the LLM-as-a-Judge paradigm to VQA hallucination and propose a low-cost strategy to automatically generate model-dependent supervision signals. In-depth analysis shows hallucinations arise from systematic internal state variations in visual perception, cross-modal reasoning, and language decoding.
arXiv Detail & Related papers (2026-01-01T09:19:39Z) - Neural Probe-Based Hallucination Detection for Large Language Models [4.211691393530721]
Large language models excel at text generation and knowledge question-answering tasks. They are prone to generating hallucinated content, severely limiting their application in high-risk domains. We propose a neural network-based framework for token-level hallucination detection.
arXiv Detail & Related papers (2025-12-24T05:10:19Z) - A novel hallucination classification framework [0.0]
This work introduces a novel methodology for the automatic detection of hallucinations generated during large language model (LLM) inference. The proposed approach is based on a systematic taxonomy and controlled reproduction of diverse hallucination types through prompt engineering.
arXiv Detail & Related papers (2025-10-06T09:54:20Z) - ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs [50.18087419133284]
Hallucination detection methods leveraging hidden states predominantly focus on static and isolated representations. We introduce a novel metric, the ICR Score, which quantifies the contribution of modules to the hidden states' update. We propose a hallucination detection method, the ICR Probe, which captures the cross-layer evolution of hidden states.
arXiv Detail & Related papers (2025-07-22T11:44:26Z) - Physics-Guided Dual Implicit Neural Representations for Source Separation [70.38762322922211]
We develop a self-supervised machine-learning approach for source separation using a dual implicit neural representation framework. Our method learns directly from the raw data by minimizing a reconstruction-based loss function. Our method offers a versatile framework for addressing source separation problems across diverse domains.
arXiv Detail & Related papers (2025-07-07T17:56:31Z) - Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs [47.18623962083962]
We present a novel approach for detecting hallucinations in large language models. We find that hallucinated responses exhibit smaller deviations from their prompts compared to grounded responses. We propose a model-intrinsic detection method that uses distributional distances as principled hallucination scores.
arXiv Detail & Related papers (2025-06-11T15:59:15Z) - Robust Hallucination Detection in LLMs via Adaptive Token Selection [25.21763722332831]
Hallucinations in large language models (LLMs) pose significant safety concerns that impede their broader deployment. We propose HaMI, a novel approach that enables robust detection of hallucinations through adaptive selection and learning of critical tokens. We achieve this robustness through an innovative formulation of the hallucination detection task as multiple-instance learning (HaMI) over token-level representations within a sequence.
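The multiple-instance view in the HaMI blurb can be sketched roughly as follows. The linear sigmoid probe, the top-3 token selection, and the 0.5 decision threshold are all illustrative stand-ins for the paper's learned adaptive selection, not its actual method.

```python
import numpy as np

rng = np.random.default_rng(1)

def token_scores(token_reprs, w):
    # Per-token hallucination scores from a linear probe (sigmoid output).
    return 1.0 / (1.0 + np.exp(-(token_reprs @ w)))

# Hypothetical data: a sequence of 12 tokens with 32-dim representations.
tokens = rng.standard_normal((12, 32))
w = rng.standard_normal(32) * 0.1

scores = token_scores(tokens, w)

# Multiple-instance idea: a sequence (the "bag") is labeled by its most
# suspicious tokens (the "instances"). Here the mean of the top-3 token
# scores stands in for adaptive token selection.
top3 = np.sort(scores)[-3:]
sequence_score = top3.mean()
is_hallucinated = sequence_score > 0.5

print(round(float(sequence_score), 3))
```

The point of the formulation is that a sequence-level label supervises token-level scores without token-level annotations, since only the selected instances drive the bag's prediction.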
arXiv Detail & Related papers (2025-04-10T15:39:10Z) - CHAIR -- Classifier of Hallucination as Improver [1.397828249435483]
We introduce CHAIR (Classifier of Hallucination As ImproveR), a supervised framework for detecting hallucinations by analyzing internal logits from each layer of every token. Our method extracts a compact set of features, such as maximum, minimum, mean, standard deviation, and slope, from the token logits across all layers, enabling effective hallucination detection without overfitting.
arXiv Detail & Related papers (2025-01-05T12:15:02Z) - What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? [49.84679952948808]
Recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks. We conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. Our work culminates in the development of GenPercept, an effective deterministic one-step fine-tuning paradigm tailored for dense visual perception tasks.
arXiv Detail & Related papers (2024-03-10T04:23:24Z) - WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis [21.023001428704085]
We propose a novel feature-space OOD detection score based on class-specific and class-agnostic information.
The efficacy of our method, named WDiscOOD, is verified on the large-scale ImageNet-1k benchmark.
arXiv Detail & Related papers (2023-03-14T00:13:57Z) - Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z) - Capturing scattered discriminative information using a deep architecture in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
arXiv Detail & Related papers (2020-07-09T08:32:06Z)
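The max feature map activation mentioned in the last entry has a standard, compact form: split the channel dimension in half and take the element-wise maximum. The toy input below is illustrative.

```python
import numpy as np

def max_feature_map(x):
    # Max feature map (MFM): split channels in half and take the
    # element-wise maximum, halving the width while introducing
    # competition between paired units; commonly used as a drop-in
    # replacement for ReLU-style activations.
    half = x.shape[-1] // 2
    return np.maximum(x[..., :half], x[..., half:])

x = np.array([[1.0, -2.0, 0.5, 3.0]])   # one sample, 4 channels
out = max_feature_map(x)
print(out)  # [[1. 3.]]
```

Because only the stronger of each unit pair survives, MFM acts as a built-in feature selector, which is how it helps capture scattered discriminative information while curbing overfitting.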
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or summaries and is not responsible for any consequences of their use.