Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
- URL: http://arxiv.org/abs/2602.11167v1
- Date: Sun, 18 Jan 2026 22:51:40 GMT
- Title: Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
- Authors: Nathan Mao, Varun Kaushik, Shreya Shivkumar, Parham Sharafoleslami, Kevin Zhu, Sunishchal Dev
- Abstract summary: Large Language Models (LLMs) often hallucinate, generating nonsensical or false information that can be especially harmful in sensitive fields such as medicine or law. We introduce FalseCite, a curated dataset designed to capture and benchmark hallucinated responses induced by misleading or fabricated citations. Running GPT-4o-mini, Falcon-7B, and Mistral-7B through FalseCite, we observed a noticeable increase in hallucination activity for false claims with deceptive citations.
- Score: 2.357397994148727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) often hallucinate, generating nonsensical or false information that can be especially harmful in sensitive fields such as medicine or law. To study this phenomenon systematically, we introduce FalseCite, a curated dataset designed to capture and benchmark hallucinated responses induced by misleading or fabricated citations. Running GPT-4o-mini, Falcon-7B, and Mistral-7B through FalseCite, we observed a noticeable increase in hallucination activity for false claims with deceptive citations, especially in GPT-4o-mini. Using the responses from FalseCite, we can also analyze the internal states of hallucinating models, visualizing and clustering the hidden state vectors. From this analysis, we noticed that the hidden state vectors, regardless of hallucination or non-hallucination, tend to trace out a distinct horn-like shape. Our work underscores FalseCite's potential as a foundation for evaluating and mitigating hallucinations in future LLM research.
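The abstract outlines a pipeline of prompting a model, collecting its hidden-state vectors, and then visualizing and clustering them. The following is a minimal sketch of that kind of analysis, assuming a Hugging Face causal LM plus scikit-learn; the model name, prompts, mean pooling, and cluster count are illustrative placeholders, not the paper's exact FalseCite configuration.

```python
# A minimal sketch, assuming a Hugging Face causal LM plus scikit-learn:
# collect last-layer hidden states for claim/citation prompts, project them
# with PCA, and cluster them with k-means. The model name, prompts, mean
# pooling, and cluster count are illustrative placeholders, not the paper's
# exact FalseCite configuration.
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper evaluates GPT-4o-mini, Falcon-7B, and Mistral-7B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Hypothetical claim/citation prompts standing in for FalseCite items.
prompts = [
    "Claim: Vitamin C cures the common cold. Citation: Smith et al. (2019).",
    "Claim: Water boils at 100 C at sea level. Citation: any chemistry textbook.",
    "Claim: The Great Wall is visible from the Moon. Citation: Lee (2021).",
    "Claim: DNA has a double-helix structure. Citation: Watson & Crick (1953).",
]

vectors = []
with torch.no_grad():
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model(**inputs)
        # Mean-pool the final layer's hidden states into one vector per prompt.
        pooled = outputs.hidden_states[-1].mean(dim=1).squeeze(0)
        vectors.append(pooled.numpy())

X = np.stack(vectors)
coords = PCA(n_components=2).fit_transform(X)             # 2-D projection for plotting
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # coarse clustering of hidden states
for prompt, (x, y), label in zip(prompts, coords, labels):
    print(f"cluster={label} pca=({x:.2f}, {y:.2f}) :: {prompt[:45]}")
```

To approximate the analysis described above, one would swap the placeholder prompts for FalseCite items and the stand-in model for Falcon-7B or Mistral-7B, then plot the 2-D projections to inspect the horn-like structure the authors report.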
Related papers
- Two Causes, Not One: Rethinking Omission and Fabrication Hallucinations in MLLMs [31.601057368065877]
Existing methods, based on the flawed assumption that omission and fabrication hallucinations share a common cause, often reduce omissions only to trigger more fabrications. In this work, we overturn this view by demonstrating that omission hallucinations arise from insufficient confidence when mapping perceived visual features to linguistic expressions. We propose the Visual-Semantic Attention Potential Field, a conceptual framework that reveals how visual evidence is used to infer the presence or absence of objects.
arXiv Detail & Related papers (2025-08-30T05:47:41Z) - HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination." This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z) - Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer [51.7407540261676]
We investigate a distinct type of hallucination, where a model can consistently answer a question correctly, but a seemingly trivial perturbation causes it to produce a hallucinated response with high certainty. This phenomenon is particularly concerning in high-stakes domains such as medicine or law, where model certainty is often used as a proxy for reliability. We show that CHOKE examples are consistent across prompts, occur in different models and datasets, and are fundamentally distinct from other hallucinations.
arXiv Detail & Related papers (2025-02-18T15:46:31Z) - Valuable Hallucinations: Realizable Non-realistic Propositions [2.451326684641447]
This paper introduces the first formal definition of valuable hallucinations in large language models (LLMs). We focus on the potential value that certain types of hallucinations can offer in specific contexts. We present experiments using the Qwen2.5 model and HalluQA dataset, employing ReAct prompting to control and optimize hallucinations.
arXiv Detail & Related papers (2025-02-16T12:59:11Z) - Can Hallucinations Help? Boosting LLMs for Drug Discovery [8.960425754918974]
Hallucinations in large language models (LLMs) are often viewed as undesirable. We find that hallucinations significantly improve predictive accuracy for some models. We categorize over 18,000 beneficial hallucinations, with structural misdescriptions emerging as the most impactful type.
arXiv Detail & Related papers (2025-01-23T16:45:51Z) - ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models [65.12177400764506]
Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes. This paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset.
arXiv Detail & Related papers (2024-07-05T17:56:38Z) - On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on understanding the reasons for LLMs' hallucinations on their known facts and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z) - Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations [42.46721214112836]
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge.
We create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations.
arXiv Detail & Related papers (2024-03-27T00:23:03Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates hallucinations by a relative 44.6% and maintains competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)