Hallucination Detection in LLMs Using Spectral Features of Attention Maps
- URL: http://arxiv.org/abs/2502.17598v1
- Date: Mon, 24 Feb 2025 19:30:24 GMT
- Title: Hallucination Detection in LLMs Using Spectral Features of Attention Maps
- Authors: Jakub Binkowski, Denis Janiak, Albert Sawczyn, Bogdan Gabrys, Tomasz Kajdanowicz
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Recent methods leverage attention map properties to this end, though their effectiveness remains limited. We propose the $\text{LapEigvals}$ method, which uses the top-$k$ eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes.
- Score: 8.820670807424174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various tasks but remain prone to hallucinations. Detecting hallucinations is essential for safety-critical applications, and recent methods leverage attention map properties to this end, though their effectiveness remains limited. In this work, we investigate the spectral features of attention maps by interpreting them as adjacency matrices of graph structures. We propose the $\text{LapEigvals}$ method, which utilises the top-$k$ eigenvalues of the Laplacian matrix derived from the attention maps as an input to hallucination detection probes. Empirical evaluations demonstrate that our approach achieves state-of-the-art hallucination detection performance among attention-based methods. Extensive ablation studies further highlight the robustness and generalisation of $\text{LapEigvals}$, paving the way for future advancements in the hallucination detection domain.
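The abstract describes the full pipeline at a high level: read each attention map as a weighted adjacency matrix, form the graph Laplacian, keep its top-$k$ eigenvalues, and train a probe on those features. The following is a minimal sketch of that idea in NumPy/scikit-learn; the symmetrisation of the causal attention map, the use of the unnormalised Laplacian, and the logistic-regression probe are illustrative assumptions rather than details confirmed by the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lap_eigvals(attention: np.ndarray, k: int = 10) -> np.ndarray:
    """Top-k Laplacian eigenvalues of one (seq_len x seq_len) attention map."""
    A = (attention + attention.T) / 2.0   # assumption: symmetrise the causal map
    D = np.diag(A.sum(axis=1))            # degree matrix
    L = D - A                             # unnormalised graph Laplacian
    eigvals = np.linalg.eigvalsh(L)       # real eigenvalues, ascending order
    return eigvals[-k:]                   # keep the k largest

def fit_probe(attention_maps, labels, k: int = 10):
    """attention_maps: per-example lists of per-head attention matrices;
    labels: 1 for hallucinated, 0 for faithful (assumed to be given)."""
    X = np.stack([
        np.concatenate([lap_eigvals(a, k) for a in per_head_maps])
        for per_head_maps in attention_maps
    ])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```

Because the eigenvalue vector has a fixed length $k$ regardless of sequence length, the probe sees features of constant dimensionality across examples of different lengths.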
Related papers
- Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs [6.645440928271175]
Vision-Language Models (VLMs) have garnered increasing attention in the AI community due to their promising practical applications. Recent studies attribute these hallucinations to VLMs' over-reliance on linguistic priors and insufficient visual feature integration. We propose an adversarial parametric editing framework for hallucination mitigation in VLMs, which follows an Act-Locate-Edit Adversarially paradigm.
arXiv Detail & Related papers (2025-12-26T11:56:45Z) - The Map of Misbelief: Tracing Intrinsic and Extrinsic Hallucinations Through Attention Patterns [1.0896567381206717]
Large Language Models (LLMs) are increasingly deployed in safety-critical domains, yet remain susceptible to hallucinations. We introduce a principled evaluation framework that differentiates between extrinsic and intrinsic hallucination categories. We propose novel attention aggregation strategies that improve both interpretability and hallucination detection performance.
arXiv Detail & Related papers (2025-11-13T22:42:18Z) - PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning [87.35309934860938]
Hallucinations in multi-modal large language models (MLLMs) are strongly associated with insufficient attention allocated to visual tokens. We propose PruneHal, a training-free, simple yet effective method that leverages adaptive KV cache pruning to enhance the model's focus on critical visual information.
arXiv Detail & Related papers (2025-10-22T02:41:07Z) - Neural Message-Passing on Attention Graphs for Hallucination Detection [32.29963721910821]
CHARM casts hallucination detection as a graph learning task and tackles it by applying GNNs over attributed attention graphs. We show that CHARM provably subsumes prior attention-based traces and, experimentally, it consistently outperforms other approaches across diverse benchmarks.
arXiv Detail & Related papers (2025-09-29T13:37:12Z) - What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models? [95.46087552542998]
This paper introduces the Hallucination searching-based Object Probing Evaluation (HOPE) benchmark. It aims to generate the most misleading distractors that can trigger hallucination in Large Vision-Language Models. Experimental results show that HOPE leads to a precision drop of at least 9% and up to 23% across various state-of-the-art LVLMs.
arXiv Detail & Related papers (2025-08-03T03:11:48Z) - ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs [50.18087419133284]
Hallucination detection methods leveraging hidden states predominantly focus on static and isolated representations. We introduce a novel metric, the ICR Score, which quantifies the contribution of modules to the hidden states' update. We propose a hallucination detection method, the ICR Probe, which captures the cross-layer evolution of hidden states.
arXiv Detail & Related papers (2025-07-22T11:44:26Z) - Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations.
The key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones.
We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
arXiv Detail & Related papers (2025-04-17T06:34:45Z) - Hallucination Detection in LLMs via Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models.
We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - Robust Hallucination Detection in LLMs via Adaptive Token Selection [25.21763722332831]
Hallucinations in large language models (LLMs) pose significant safety concerns that impede their broader deployment.
We propose HaMI, a novel approach that enables robust detection of hallucinations through adaptive selection and learning of critical tokens.
We achieve this robustness through an innovative formulation of the hallucination detection task as Multiple Instance learning (HaMI) over token-level representations within a sequence.
arXiv Detail & Related papers (2025-04-10T15:39:10Z) - Hallucinated Span Detection with Multi-View Attention Features [8.747292152322578]
This study addresses the problem of hallucinated span detection in the outputs of large language models. It has received less attention than output-level hallucination detection despite its practical importance.
arXiv Detail & Related papers (2025-04-06T03:00:58Z) - CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base [29.477973983931083]
We propose CutPaste&Find, a lightweight and training-free framework for detecting hallucinations in LVLM-generated outputs. At the core of our framework is a Visual-aid Knowledge Base that encodes rich entity-attribute relationships and associated image representations. We introduce a scaling factor to refine similarity scores, mitigating the issue of suboptimal alignment values even for ground-truth image-text pairs.
arXiv Detail & Related papers (2025-02-18T07:06:36Z) - Can Your Uncertainty Scores Detect Hallucinated Entity? [14.432545893757677]
We propose a new dataset, HalluEntity, which annotates hallucination at the entity level. Based on the dataset, we evaluate uncertainty-based hallucination detection approaches across 17 modern LLMs. Our experimental results show that uncertainty estimation approaches focusing on individual token probabilities tend to over-predict hallucinations.
arXiv Detail & Related papers (2025-02-17T16:01:41Z) - CHAIR -- Classifier of Hallucination as Improver [1.397828249435483]
We introduce CHAIR (Classifier of Hallucination As ImproveR), a supervised framework for detecting hallucinations by analyzing the internal logits of every token at each layer. Our method extracts a compact set of features, such as maximum, minimum, mean, standard deviation, and slope, from the token logits across all layers, enabling effective hallucination detection without overfitting.
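As a rough illustration of the summary statistics listed above, here is a minimal NumPy sketch of per-token feature extraction; reading off the generated token's own logit at each layer is an assumption made for the example, not a detail taken from the paper.

```python
import numpy as np

def chair_features(layer_logits: np.ndarray, token_id: int) -> np.ndarray:
    """layer_logits: (num_layers, vocab_size) logits at one generated position.
    Returns [max, min, mean, std, slope] of the token's logit across layers."""
    per_layer = layer_logits[:, token_id]            # assumed per-layer readout
    layers = np.arange(per_layer.shape[0])
    slope = np.polyfit(layers, per_layer, deg=1)[0]  # linear trend across layers
    return np.array([per_layer.max(), per_layer.min(),
                     per_layer.mean(), per_layer.std(), slope])
```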
arXiv Detail & Related papers (2025-01-05T12:15:02Z) - ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models [65.12177400764506]
Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes. This paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset.
arXiv Detail & Related papers (2024-07-05T17:56:38Z) - Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback. We generate a small-scale hallucination annotation dataset using proprietary models. Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates hallucinations by 44.6% in relative terms and maintains competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z) - A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z) - Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input).
We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
arXiv Detail & Related papers (2020-11-05T00:18:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.