H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
- URL: http://arxiv.org/abs/2512.01797v2
- Date: Tue, 02 Dec 2025 07:08:39 GMT
- Title: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
- Authors: Cheng Gao, Huimin Chen, Chaojun Xiao, Zhiyi Chen, Zhiyuan Liu, Maosong Sun
- Abstract summary: We identify hallucination-associated neurons (H-Neurons) in large language models (LLMs). In terms of identification, we demonstrate that a remarkably sparse subset of neurons can reliably predict hallucination occurrences. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors.
- Score: 56.31565301428888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability. While prior work has examined hallucinations from macroscopic perspectives such as training data and objectives, the underlying neuron-level mechanisms remain largely unexplored. In this paper, we conduct a systematic investigation into hallucination-associated neurons (H-Neurons) in LLMs from three perspectives: identification, behavioral impact, and origins. Regarding their identification, we demonstrate that a remarkably sparse subset of neurons (less than $0.1\%$ of total neurons) can reliably predict hallucination occurrences, with strong generalization across diverse scenarios. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors. Concerning their origins, we trace these neurons back to the pre-trained base models and find that these neurons remain predictive for hallucination detection, indicating they emerge during pre-training. Our findings bridge macroscopic behavioral patterns with microscopic neural mechanisms, offering insights for developing more reliable LLMs.
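As a rough illustration of the identification step described in the abstract, the sketch below fits a sparse (L1-regularized) logistic-regression probe over per-response neuron activations and reads off the few neurons that receive non-zero weights. This is only a plausible recipe under assumed inputs; the file names, feature layout, and hyperparameters are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: fit a sparse (L1-regularized) probe over MLP neuron
# activations to predict whether a response is hallucinated. The files,
# feature extraction, and regularization strength are assumptions, not the
# paper's exact setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: (num_examples, num_neurons) mean activation of each MLP neuron per response
# y: (num_examples,) 1 = hallucinated response, 0 = faithful response
X = np.load("neuron_activations.npy")      # assumed pre-extracted features
y = np.load("hallucination_labels.npy")    # assumed labels

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=1000)
probe.fit(X, y)

# Neurons with non-zero weights form the candidate H-Neuron set; with strong
# L1 pressure this is typically a tiny fraction of all neurons.
h_neurons = np.flatnonzero(probe.coef_[0])
print(f"{len(h_neurons)} / {X.shape[1]} neurons selected "
      f"({100 * len(h_neurons) / X.shape[1]:.3f}%)")
```

Held-out accuracy of such a probe, and its transfer to other datasets or to the pre-trained base model, would be the natural checks corresponding to the generalization and origin claims in the abstract.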
Related papers
- Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering [60.23509717784518]
Existing mitigation methods predominantly focus on output-level adjustments, leaving internal mechanisms that give rise to hallucinations largely unexplored. We propose Contrastive Neuron Steering (CNS), which identifies image-specific neurons via contrastive analysis between clean and noisy inputs. CNS selectively amplifies informative neurons while suppressing perturbation-induced activations, producing more robust and semantically grounded visual representations.
arXiv Detail & Related papers (2026-01-31T09:21:04Z) - Identifying Good and Bad Neurons for Task-Level Controllable LLMs [43.20582224913806]
Large Language Models have demonstrated remarkable capabilities on multiple-choice question answering benchmarks. The complex mechanisms underlying their large-scale neurons remain opaque, posing significant challenges for understanding and steering LLMs. We propose NeuronLLM, a novel task-level LLM understanding framework that adopts the biological principle of functional antagonism for LLM neuron identification.
arXiv Detail & Related papers (2026-01-08T03:24:18Z) - Disentangling Neurodegeneration with Brain Age Gap Prediction Models: A Graph Signal Processing Perspective [89.99666725996975]
Brain age gap prediction (BAGP) models estimate the difference between a person's brain age, as predicted from data, and their chronological age. This tutorial article provides an overview of BAGP and introduces a principled framework for this application based on recent advancements in graph signal processing (GSP). The resulting coVariance neural networks (VNNs) offer strong theoretical grounding and operational interpretability, enabling robust estimation of brain age gap predictions.
arXiv Detail & Related papers (2025-10-14T17:44:45Z) - The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities [16.20947034847556]
Large Language Models (LLMs) have become foundational tools in natural language processing. Recent research has found that a small subset of biological neurons in the human brain is crucial for core cognitive functions.
arXiv Detail & Related papers (2025-10-11T14:39:09Z) - NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models [63.592664795493725]
NOBLE is a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. It predicts distributions of neural dynamics that account for the intrinsic experimental variability. NOBLE is the first scaled-up deep learning framework that validates its generalization with real experimental data.
arXiv Detail & Related papers (2025-06-05T01:01:18Z) - Confidence Regulation Neurons in Language Models [91.90337752432075]
This study investigates the mechanisms by which large language models represent and regulate uncertainty in next-token predictions.
Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits.
Token frequency neurons, which we describe here for the first time, boost or suppress each token's logit in proportion to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution (a toy sketch of this mechanism appears after the related-papers list below).
arXiv Detail & Related papers (2024-06-24T01:31:03Z) - Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies [7.21603206617401]
We show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated in order to display a comparable magnitude of degradation.
These results suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve.
arXiv Detail & Related papers (2024-06-05T00:31:50Z) - Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data [3.46029409929709]
State-of-the-art systems neuroscience experiments yield large-scale multimodal data, and these data sets require new tools for analysis.
Inspired by the success of large pretrained models in vision and language domains, we reframe the analysis of large-scale, cellular-resolution neuronal spiking data as an autoregressive generation problem.
We first trained Neuroformer on simulated datasets, and found that it both accurately predicted intrinsically simulated neuronal circuit activity, and also inferred the underlying neural circuit connectivity, including direction.
arXiv Detail & Related papers (2023-10-31T20:17:32Z) - Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generated a new multimodal dataset consisting of the spontaneous behaviors of fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
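As referenced in the Confidence Regulation Neurons entry above, the sketch below is a toy illustration of how a token frequency neuron could shift next-token logits toward or away from the unigram distribution. The neuron's output direction, the token counts, and the logits are synthetic assumptions, not the paper's implementation.

```python
# Toy illustration (an assumption, not code from the paper): a single "token
# frequency neuron" whose effective output weights track centered log token
# frequency. A positive activation tilts next-token logits toward the unigram
# distribution; a negative activation pushes them away from it.
import torch

torch.manual_seed(0)
vocab_size = 50_000
token_counts = torch.randint(1, 1_000_000, (vocab_size,)).float()  # synthetic corpus counts
log_freq = torch.log(token_counts / token_counts.sum())            # unigram log-frequencies

w_out = log_freq - log_freq.mean()      # assumed output direction of the neuron
base_logits = torch.randn(vocab_size)   # logits from the rest of the model
activation = 2.0                        # the neuron's scalar activation

adjusted_logits = base_logits + activation * w_out

p_base = torch.softmax(base_logits, dim=-1)
p_adj = torch.softmax(adjusted_logits, dim=-1)

# Probability mass on the most frequent tokens grows when the neuron fires
# positively, i.e. the output distribution moves toward the unigram distribution.
top = torch.topk(log_freq, 100).indices
print(f"mass on 100 most frequent tokens: "
      f"{p_base[top].sum().item():.4f} -> {p_adj[top].sum().item():.4f}")
```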
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.