The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
- URL: http://arxiv.org/abs/2510.10238v1
- Date: Sat, 11 Oct 2025 14:39:09 GMT
- Title: The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
- Authors: Zixuan Qin, Kunlin Lyu, Qingchen Yu, Yifan Sun, Zhaoxin Fan
- Abstract summary: Large Language Models (LLMs) have become foundational tools in natural language processing. Recent neuroscience research has found that a small subset of biological neurons in the human brain is crucial for core cognitive functions, raising the question of whether LLMs likewise contain a small set of critical neurons.
- Score: 16.20947034847556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have become foundational tools in natural language processing, powering a wide range of applications and research. Many studies have shown that LLMs share significant similarities with the human brain. Recent neuroscience research has found that a small subset of biological neurons in the human brain is crucial for core cognitive functions, which raises a fundamental question: do LLMs also contain a small subset of critical neurons? In this paper, we investigate this question by proposing a Perturbation-based Causal Identification of Critical Neurons method to systematically locate such critical neurons in LLMs. Our findings reveal three key insights: (1) LLMs contain ultra-sparse critical neuron sets. Disrupting these critical neurons can cause a 72B-parameter model with over 1.1 billion neurons to collapse completely, with perplexity increasing by up to 20 orders of magnitude; (2) These critical neurons are not uniformly distributed, but tend to concentrate in the outer layers, particularly within the MLP down_proj components; (3) Performance degradation exhibits sharp phase transitions, rather than a gradual decline, when these critical neurons are disrupted. Through comprehensive experiments across diverse model architectures and scales, we provide deeper analysis of these phenomena and their implications for LLM robustness and interpretability. These findings offer guidance for developing more robust model architectures and improving deployment security in safety-critical applications.
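The abstract's core experiment lends itself to a small illustration: ablate a handful of neurons in one MLP down projection and watch perplexity. The sketch below is a minimal stand-in, not the paper's PCICN procedure; the model (`gpt2`), layer index, and neuron indices are arbitrary assumptions, and GPT-2 names its down projection `mlp.c_proj` rather than `down_proj`.

```python
# Minimal sketch: zero a few intermediate MLP neurons, compare perplexity.
# Model, layer, and neuron choices are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tok, text):
    """Token-level perplexity of `text` under `model`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return torch.exp(loss).item()

tok = AutoTokenizer.from_pretrained("gpt2")            # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
text = "Large language models have become foundational tools."

base = perplexity(model, tok, text)

# Ablate a handful of intermediate neurons by zeroing their rows in the
# down projection (GPT-2's `mlp.c_proj`; the paper studies `down_proj`).
layer, neurons = 5, [7, 42, 101]                       # arbitrary picks
with torch.no_grad():
    model.transformer.h[layer].mlp.c_proj.weight[neurons, :] = 0.0

print(f"perplexity before: {base:.2f}, after: {perplexity(model, tok, text):.2f}")
```

On genuinely critical neurons the paper reports perplexity blow-ups of up to 20 orders of magnitude; random picks like these will usually barely move it.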
Related papers
- Robust Spiking Neural Networks Against Adversarial Attacks [49.08210314590693]
Spiking Neural Networks (SNNs) represent a promising paradigm for energy-efficient neuromorphic computing. In this study, we theoretically demonstrate that threshold-neighboring spiking neurons are the key factors limiting the robustness of directly trained SNNs. We find that these neurons set the upper limit on the potential strength of adversarial attacks and are prone to state-flipping under minor disturbances.
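As a rough illustration of the "threshold-neighboring" idea, the numpy sketch below flags neurons whose membrane potential sits within a small band of the firing threshold, so a sub-threshold disturbance flips their spike state. All constants are illustrative assumptions, not the paper's formal criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
v_th, eps = 1.0, 0.05                  # firing threshold, neighborhood width
v = rng.uniform(0.0, 1.2, size=1000)   # membrane potentials of one SNN layer

spikes = v >= v_th
neighboring = np.abs(v - v_th) < eps   # neurons at risk of state-flipping

# A +eps input disturbance flips exactly the neurons just below threshold.
flipped = ((v + eps) >= v_th) != spikes
print(f"{neighboring.sum()} threshold-neighboring neurons; "
      f"{flipped.sum()} flipped by a +{eps} disturbance")
```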
arXiv Detail & Related papers (2026-02-24T05:06:12Z)
- Identifying Good and Bad Neurons for Task-Level Controllable LLMs [43.20582224913806]
Large Language Models have demonstrated remarkable capabilities on multiple-choice question answering benchmarks. The complex mechanisms underlying their vast neuron populations remain opaque, posing significant challenges for understanding and steering LLMs. We propose NeuronLLM, a novel task-level LLM understanding framework that adopts the biological principle of functional antagonism for LLM neuron identification.
arXiv Detail & Related papers (2026-01-08T03:24:18Z)
- H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs [56.31565301428888]
We identify hallucination-associated neurons (H-Neurons) in large language models (LLMs). In terms of identification, we demonstrate that a remarkably sparse subset of neurons can reliably predict hallucination occurrences. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors.
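A hedged sketch of the identification claim: an L1-regularized probe over per-example neuron activations selects a sparse predictive subset. The data below is synthetic; the real pipeline would use LLM activations paired with hallucination labels.

```python
# Synthetic stand-in for "sparse neurons predict hallucination": an L1 probe
# recovers the handful of neurons that actually carry the label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2048))            # activations: examples x neurons
w = np.zeros(2048)
w[:5] = 3.0                                 # only five neurons truly matter
y = (X @ w + rng.normal(size=500) > 0).astype(int)

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
candidates = np.flatnonzero(probe.coef_)    # the sparse "H-Neuron" candidates
print(f"probe kept {candidates.size} of 2048 neurons: {candidates[:10]}")
```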
arXiv Detail & Related papers (2025-12-01T15:32:14Z)
- Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models [17.186414423941482]
Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet their robustness is poorly understood. In this paper, we investigate the structural vulnerabilities of LVLMs to identify any critical neurons whose removal triggers catastrophic collapse.
arXiv Detail & Related papers (2025-11-30T14:52:11Z)
- Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models [49.52128963321304]
Large Language Models (LLMs) have shown promising results across various tasks, yet their reasoning capabilities remain a fundamental challenge. This paper comprehensively reviews recent developments in neuro-symbolic approaches for enhancing LLM reasoning.
arXiv Detail & Related papers (2025-08-19T09:27:46Z)
- NLP4Neuro: Sequence-to-sequence learning for neural population decoding [0.9086712846902969]
Delineating how animal behavior arises from neural activity is a foundational goal of neuroscience. Transformers, the backbone of modern large language models (LLMs), have become powerful tools for neural decoding from smaller neural populations.
arXiv Detail & Related papers (2025-07-03T03:14:55Z)
- Probing Neural Topology of Large Language Models [12.298921317333452]
We introduce graph probing, a method for uncovering the functional connectivity of large language models. By probing models across diverse LLM families and scales, we find that next-token prediction performance can be universally predicted from this connectivity. Strikingly, probing on topology outperforms probing on activation by up to 130.4%.
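A minimal sketch of the graph-probing idea under stated assumptions: build a functional-connectivity graph by correlating neuron activations across tokens, then extract simple topological features a probe could consume. The threshold and features are placeholders, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 64))      # activations: tokens x neurons

corr = np.corrcoef(acts.T)             # neuron-neuron functional connectivity
np.fill_diagonal(corr, 0.0)
adj = np.abs(corr) > 0.2               # threshold into a binary graph

# Simple topological features a downstream probe could be trained on.
degree = adj.sum(axis=1)
print(f"graph density {adj.mean():.3f}, mean degree {degree.mean():.1f}")
```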
arXiv Detail & Related papers (2025-06-01T14:57:03Z)
- Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models [53.91412558475662]
We use methods similar to those in the field of functional neuroimaging analysis to locate and identify functional networks in large language models (LLMs). Experimental results show that, similar to the human brain, LLMs contain functional networks that frequently recur during operation. Masking key functional networks significantly impairs the model's performance, while retaining just a subset is adequate to maintain effective operation.
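To make the neuroimaging analogy concrete, here is one plausible (assumed) realization: run ICA over activation timecourses, treat each component's high-loading neurons as a "functional network", and mask the neurons of one recovered network. The paper's actual localization procedure may differ; sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(500, 8))               # 8 latent network timecourses
A = rng.normal(size=(8, 256))                # mixing: networks -> neurons
acts = S @ A + 0.1 * rng.normal(size=(500, 256))

ica = FastICA(n_components=8, random_state=0, max_iter=1000)
ica.fit(acts)
loadings = np.abs(ica.components_)           # components x neurons

network = np.argsort(loadings[0])[-20:]      # top neurons of one "network"
mask = np.ones(256)
mask[network] = 0.0                          # multiply into activations to ablate
print(f"masking {network.size} neurons of one recovered functional network")
```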
arXiv Detail & Related papers (2025-02-13T04:42:39Z)
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI). Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli. In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs). This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [85.0284555835015]
Large language models (LLMs) have revolutionized the field of natural language processing (NLP). However, few studies have attempted to explore the internal workings of LLMs in multilingual settings. We classify neurons into four distinct categories based on their responses to a specific input across different languages.
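A toy version of that four-way categorization, with an assumed activation threshold and assumed category names (the paper's exact taxonomy and criterion are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_langs, n_neurons = 4, 1000
acts = rng.normal(size=(n_langs, n_neurons))   # mean activation per language
active = acts > 0.5                            # assumed activation threshold

n_active = active.sum(axis=0)                  # languages each neuron fires for
cats = np.select(
    [n_active == n_langs, n_active == 1, n_active == 0],
    ["all-shared", "specific", "inactive"],
    default="partial-shared",
)
for c in ("all-shared", "partial-shared", "specific", "inactive"):
    print(c, int((cats == c).sum()))
```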
arXiv Detail & Related papers (2024-06-13T16:04:11Z)
- Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies [7.21603206617401]
We show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated before they display a comparable magnitude of degradation.
These results suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve.
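The masking experiment is easy to approximate with Hugging Face's `head_mask` argument; the sketch below ablates a random 30% of GPT-2's attention heads and compares perplexity. The fraction and random selection are assumptions, not the paper's protocol.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids

# head_mask has shape (n_layer, n_head); 1 keeps a head, 0 ablates it.
n_layer, n_head = model.config.n_layer, model.config.n_head
head_mask = (torch.rand(n_layer, n_head) >= 0.3).float()  # drop ~30% of heads

with torch.no_grad():
    base = model(ids, labels=ids).loss
    masked = model(ids, labels=ids, head_mask=head_mask).loss
print(f"perplexity intact {base.exp().item():.2f} "
      f"vs masked {masked.exp().item():.2f}")
```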
arXiv Detail & Related papers (2024-06-05T00:31:50Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
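Studies of this kind typically rest on an encoding model: ridge-regress fMRI voxel timecourses onto language-model features and score held-out correlation. The sketch below uses synthetic data and omits the hemodynamic-delay handling a real analysis would include.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 768))            # LM features per fMRI timepoint
B = 0.1 * rng.normal(size=(768, 50))
Y = X @ B + rng.normal(size=(600, 50))     # synthetic voxel timecourses

Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.25, random_state=0)
pred = Ridge(alpha=10.0).fit(Xtr, Ytr).predict(Xte)

# Score each voxel by Pearson correlation on held-out timepoints.
r = [np.corrcoef(pred[:, v], Yte[:, v])[0, 1] for v in range(Y.shape[1])]
print(f"mean held-out voxel correlation: {np.mean(r):.3f}")
```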
arXiv Detail & Related papers (2022-07-07T15:37:17Z)