The Map of Misbelief: Tracing Intrinsic and Extrinsic Hallucinations Through Attention Patterns
- URL: http://arxiv.org/abs/2511.10837v1
- Date: Thu, 13 Nov 2025 22:42:18 GMT
- Title: The Map of Misbelief: Tracing Intrinsic and Extrinsic Hallucinations Through Attention Patterns
- Authors: Elyes Hajji, Aymen Bouguerra, Fabio Arnez,
- Abstract summary: Large Language Models (LLMs) are increasingly deployed in safety-critical domains, yet remain susceptible to hallucinations. We introduce a principled evaluation framework that differentiates between extrinsic and intrinsic hallucination categories. We propose novel attention aggregation strategies that improve both interpretability and hallucination detection performance.
- Score: 1.0896567381206717
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in safety-critical domains, yet remain susceptible to hallucinations. While prior works have proposed confidence representation methods for hallucination detection, most of these approaches rely on computationally expensive sampling strategies and often disregard the distinction between hallucination types. In this work, we introduce a principled evaluation framework that differentiates between extrinsic and intrinsic hallucination categories and evaluates detection performance across a suite of curated benchmarks. In addition, we leverage a recent attention-based uncertainty quantification algorithm and propose novel attention aggregation strategies that improve both interpretability and hallucination detection performance. Our experimental findings reveal that sampling-based methods like Semantic Entropy are effective for detecting extrinsic hallucinations but generally fail on intrinsic ones. In contrast, our method, which aggregates attention over input tokens, is better suited for intrinsic hallucinations. These insights provide new directions for aligning detection strategies with the nature of hallucination and highlight attention as a rich signal for quantifying model uncertainty.
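The abstract's core idea is that the attention mass a model places on its input (prompt) tokens can serve as an uncertainty signal, particularly for intrinsic hallucinations. Below is a minimal sketch of that general idea under stated assumptions, not the authors' exact aggregation strategy: the entropy-based aggregation, the function name, and the synthetic attention tensor are all illustrative.

```python
# Minimal sketch: turn per-token attention over the input into a scalar
# uncertainty score. Entropy of the attention that generated tokens place
# on the prompt is an assumed aggregation, not the paper's exact method.
import numpy as np

def attention_uncertainty(attn, n_input_tokens):
    """attn: (num_heads, seq_len, seq_len), each row a distribution over keys.
    Returns a scalar; more diffuse attention over the input tokens is read
    here as higher uncertainty."""
    mean_attn = attn.mean(axis=0)                               # average over heads
    gen_to_input = mean_attn[n_input_tokens:, :n_input_tokens]  # generated -> prompt
    probs = gen_to_input / (gen_to_input.sum(axis=-1, keepdims=True) + 1e-12)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)     # per generated token
    return float(entropy.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads, seq_len, n_input = 8, 24, 10
    logits = rng.normal(size=(heads, seq_len, seq_len))
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # synthetic maps
    print(f"uncertainty score: {attention_uncertainty(attn, n_input):.3f}")
```

In practice the attention tensor would come from the model itself (e.g. per layer via `output_attentions=True` in Hugging Face `transformers`), and the resulting score would be thresholded or fed to a lightweight probe rather than used raw.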
Related papers
- Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization [78.94590726578014]
Multimodal reasoning models (MLRMs) remain prone to hallucinations, and effective solutions are still underexplored. We propose C3PO, a training-based mitigation framework comprising CoT Compression and Contrastive Preference Optimization.
arXiv Detail & Related papers (2026-02-03T11:00:55Z) - HACK: Hallucinations Along Certainty and Knowledge Axes [66.66625343090743]
We propose a framework for categorizing hallucinations along two axes: knowledge and certainty. We identify a particularly concerning subset of hallucinations where models hallucinate with certainty despite having the correct knowledge internally.
arXiv Detail & Related papers (2025-10-28T09:34:31Z) - PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning [87.35309934860938]
Hallucinations in multi-modal large language models (MLLMs) are strongly associated with insufficient attention allocated to visual tokens. We propose PruneHal, a training-free, simple yet effective method that leverages adaptive KV cache pruning to enhance the model's focus on critical visual information.
arXiv Detail & Related papers (2025-10-22T02:41:07Z) - Revisiting Hallucination Detection with Effective Rank-based Uncertainty [10.775061161282053]
We propose a simple yet powerful method that quantifies uncertainty by measuring the effective rank of hidden states. Grounded in the spectral analysis of representations, our approach provides interpretable insights into the model's internal reasoning process. Our method effectively detects hallucinations and generalizes robustly across various scenarios.
arXiv Detail & Related papers (2025-10-09T16:12:12Z) - Semantic Energy: Detecting LLM Hallucination Beyond Entropy [106.92072182161712]
Large Language Models (LLMs) are being increasingly deployed in real-world applications, but they remain susceptible to hallucinations. Uncertainty estimation is a feasible approach to detect such hallucinations. We introduce Semantic Energy, a novel uncertainty estimation framework.
arXiv Detail & Related papers (2025-08-20T07:33:50Z) - ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs [50.18087419133284]
Hallucination detection methods leveraging hidden states predominantly focus on static and isolated representations. We introduce a novel metric, the ICR Score, which quantifies the contribution of modules to the hidden states' update. We propose a hallucination detection method, the ICR Probe, which captures the cross-layer evolution of hidden states.
arXiv Detail & Related papers (2025-07-22T11:44:26Z) - Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation [9.540386616651295]
Chain-of-Thought (CoT) prompting can mitigate hallucinations by encouraging step-by-step reasoning. Our study highlights an overlooked trade-off in the use of reasoning.
arXiv Detail & Related papers (2025-06-20T15:49:37Z) - Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs [47.18623962083962]
We present a novel approach for detecting hallucinations in large language models. We find that hallucinated responses exhibit smaller deviations from their prompts compared to grounded responses. We propose a model-intrinsic detection method that uses distributional distances as principled hallucination scores.
arXiv Detail & Related papers (2025-06-11T15:59:15Z) - Robust Hallucination Detection in LLMs via Adaptive Token Selection [35.06045656558144]
Hallucinations in large language models (LLMs) pose significant safety concerns that impede their broader deployment. We propose HaMI, a novel approach that enables robust detection of hallucinations through adaptive selection and learning of critical tokens. We achieve this robustness through an innovative formulation of the hallucination detection task as Multiple Instance learning (HaMI) over token-level representations within a sequence.
arXiv Detail & Related papers (2025-04-10T15:39:10Z) - Hallucination Detection in LLMs Using Spectral Features of Attention Maps [7.034766253049102]
Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Recent methods leverage attention map properties to this end, though their effectiveness remains limited. We propose the LapEigvals method, which uses the top-k eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes.
arXiv Detail & Related papers (2025-02-24T19:30:24Z)
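As a companion to the last entry above (LapEigvals), here is a minimal sketch of using the top-k eigenvalues of a graph Laplacian built from an attention map as features for a hallucination-detection probe. The symmetrisation step and the unnormalised Laplacian are illustrative assumptions and may differ from the paper's setup.

```python
# Sketch: spectral features of an attention map. Symmetrisation and the
# unnormalised Laplacian are illustrative assumptions.
import numpy as np

def laplacian_topk_eigvals(attn, k=5):
    """attn: (seq_len, seq_len) attention map for one head/layer.
    Returns the k largest eigenvalues of the Laplacian of the induced graph."""
    adjacency = 0.5 * (attn + attn.T)            # treat weights as undirected edges
    degree = np.diag(adjacency.sum(axis=-1))
    laplacian = degree - adjacency
    eigvals = np.linalg.eigvalsh(laplacian)      # ascending, real for symmetric L
    return eigvals[-k:][::-1]                    # top-k, largest first

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len = 16
    logits = rng.normal(size=(seq_len, seq_len))
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # synthetic map
    print("top-5 Laplacian eigenvalues:", np.round(laplacian_topk_eigvals(attn), 3))
```

Per-layer (or per-head) feature vectors like this would typically be concatenated and fed to a small classifier trained to separate hallucinated from grounded generations.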