Identifying Good and Bad Neurons for Task-Level Controllable LLMs
- URL: http://arxiv.org/abs/2601.04548v1
- Date: Thu, 08 Jan 2026 03:24:18 GMT
- Title: Identifying Good and Bad Neurons for Task-Level Controllable LLMs
- Authors: Wenjie Li, Guansong Pang, Hezhe Qiao, Debin Gao, David Lo
- Abstract summary: Large Language Models have demonstrated remarkable capabilities on multiple-choice question answering benchmarks. The complex mechanisms underlying their large-scale neurons remain opaque, posing significant challenges for understanding and steering LLMs. We propose NeuronLLM, a novel task-level LLM understanding framework that adopts the biological principle of functional antagonism for LLM neuron identification.
- Score: 43.20582224913806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models have demonstrated remarkable capabilities on multiple-choice question answering benchmarks, but the complex mechanisms underlying their large-scale neurons remain opaque, posing significant challenges for understanding and steering LLMs. While recent studies have made progress on identifying the neurons responsible for certain abilities, these ability-specific methods are infeasible for task-focused scenarios requiring coordinated use of multiple abilities. Moreover, these approaches focus only on supportive neurons that correlate positively with task completion, while neglecting neurons with other roles, such as inhibitive roles, and suffering from misleading neuron attribution caused by fortuitous behaviors in LLMs (i.e., answering questions correctly by chance rather than through genuine understanding). To address these challenges, we propose NeuronLLM, a novel task-level LLM understanding framework that adopts the biological principle of functional antagonism for LLM neuron identification. The key insight is that task performance is jointly determined by neurons with two opposing roles: good neurons that facilitate task completion and bad neurons that inhibit it. NeuronLLM achieves a holistic modeling of neurons via contrastive learning of good and bad neurons, while leveraging augmented question sets to mitigate the fortuitous behaviors in LLMs. Comprehensive experiments on LLMs of different sizes and families show the superiority of NeuronLLM over existing methods in four NLP tasks, providing new insights into LLM functional organization.
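The good/bad-neuron idea can be made concrete with a toy sketch. The code below is not the paper's actual NeuronLLM method (which uses contrastive learning and augmented question sets); it is a minimal illustration of the underlying intuition, scoring each neuron by how differently it activates on correctly versus incorrectly answered questions. The function name, threshold, and synthetic data are all assumptions made for this example.

```python
import numpy as np

# Toy illustration (not the paper's actual method): score each neuron by the
# gap between its mean activation on correctly vs. incorrectly answered
# questions. Strongly positive scores are treated as "good" (supportive),
# strongly negative ones as "bad" (inhibitive); the rest are neutral.

def classify_neurons(acts, correct, threshold=0.5):
    """acts: (n_samples, n_neurons) activations; correct: (n_samples,) bool."""
    acts = np.asarray(acts, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    mu_success = acts[correct].mean(axis=0)   # mean activation on successes
    mu_failure = acts[~correct].mean(axis=0)  # mean activation on failures
    score = mu_success - mu_failure  # >0: fires more on success; <0: on failure
    labels = np.where(score > threshold, "good",
             np.where(score < -threshold, "bad", "neutral"))
    return score, labels

# Synthetic example: neuron 0 supports the task, neuron 1 inhibits it,
# neuron 2 is unrelated noise.
rng = np.random.default_rng(0)
correct = np.array([True] * 50 + [False] * 50)
acts = np.column_stack([
    np.where(correct, 2.0, 0.0) + rng.normal(0, 0.1, 100),  # "good" neuron
    np.where(correct, 0.0, 2.0) + rng.normal(0, 0.1, 100),  # "bad" neuron
    rng.normal(0, 0.1, 100),                                # neutral neuron
])
score, labels = classify_neurons(acts, correct)
print(labels)
```

A purely correlational score like this is exactly what the abstract warns can be misled by fortuitous behaviors, lucky guesses inflate a neuron's apparent supportiveness, which is why the paper pairs attribution with augmented question sets.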
Related papers
- H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs [56.31565301428888]
We identify hallucination-associated neurons (H-Neurons) in large language models (LLMs). In terms of identification, we demonstrate that a remarkably sparse subset of neurons can reliably predict hallucination occurrences. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors.
arXiv Detail & Related papers (2025-12-01T15:32:14Z)
- The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities [16.20947034847556]
Large Language Models (LLMs) have become foundational tools in natural language processing. Recent research has found that a small subset of biological neurons in the human brain are crucial for core cognitive functions.
arXiv Detail & Related papers (2025-10-11T14:39:09Z)
- NLP4Neuro: Sequence-to-sequence learning for neural population decoding [0.9086712846902969]
Delineating how animal behavior arises from neural activity is a foundational goal of neuroscience. Transformers, the backbone of modern large language models (LLMs), have become powerful tools for neural decoding from smaller neural populations.
arXiv Detail & Related papers (2025-07-03T03:14:55Z)
- Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models [53.91412558475662]
We use methods similar to those in the field of functional neuroimaging analysis to locate and identify functional networks in large language models (LLMs). Experimental results show that, similar to the human brain, LLMs contain functional networks that frequently recur during operation. Masking key functional networks significantly impairs the model's performance, while retaining just a subset is adequate to maintain effective operation.
arXiv Detail & Related papers (2025-02-13T04:42:39Z)
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI). Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli. In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs). This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
- Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons [45.04661608619081]
We detect task-sensitive neurons in large language models (LLMs) via gradient attribution on task-specific data. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. We propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning.
arXiv Detail & Related papers (2024-07-09T01:27:35Z)
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [85.0284555835015]
Large language models (LLMs) have revolutionized the field of natural language processing (NLP). Few studies have attempted to explore the internal workings of LLMs in multilingual settings. We classify neurons into four distinct categories based on their responses to a specific input across different languages.
arXiv Detail & Related papers (2024-06-13T16:04:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.