Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
- URL: http://arxiv.org/abs/2502.06809v2
- Date: Wed, 21 May 2025 03:16:45 GMT
- Title: Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
- Authors: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, Peizhong Ju, A. B. Siddique
- Abstract summary: We show that even highly salient neurons consistently exhibit polysemantic behavior. This observation motivates a shift from neuron attribution to range-based interpretation. We introduce NeuronLens, a novel range-based interpretation and manipulation framework.
- Score: 16.460751105639623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpreting the internal mechanisms of large language models (LLMs) is crucial for improving their trustworthiness and utility. Prior work has primarily focused on mapping individual neurons to discrete semantic concepts. However, such mappings struggle to handle the inherent polysemanticity in LLMs, where individual neurons encode multiple, distinct concepts. Through a comprehensive analysis of both encoder and decoder-based LLMs across diverse datasets, we observe that even highly salient neurons, identified via various attribution techniques for specific semantic concepts, consistently exhibit polysemantic behavior. Importantly, activation magnitudes for fine-grained concepts follow distinct, often Gaussian-like distributions with minimal overlap. This observation motivates a shift from neuron attribution to range-based interpretation. We hypothesize that interpreting and manipulating neuron activation ranges would enable more precise interpretability and targeted interventions in LLMs. To validate our hypothesis, we introduce NeuronLens, a novel range-based interpretation and manipulation framework that provides a finer view of neuron activation distributions to localize concept attribution within a neuron. Extensive empirical evaluations demonstrate that NeuronLens significantly reduces unintended interference, while maintaining precise manipulation of targeted concepts, outperforming neuron attribution.
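The abstract's core observation, that a single neuron's activation magnitudes for different fine-grained concepts fall into distinct, roughly Gaussian ranges with minimal overlap, can be illustrated with a minimal sketch. This is not the paper's actual NeuronLens implementation; the function names, the mu ± k·sigma interval, and the toy data below are illustrative assumptions.

```python
import statistics

def fit_ranges(activations_by_concept, k=2.0):
    """Fit a per-concept activation range for one neuron.

    activations_by_concept: dict mapping concept -> list of activation
    magnitudes observed for that concept on this neuron. Returns
    concept -> (low, high), a mu +/- k*sigma interval, following the
    paper's observation that per-concept activations are roughly
    Gaussian with minimal overlap.
    """
    ranges = {}
    for concept, acts in activations_by_concept.items():
        mu = statistics.fmean(acts)
        sigma = statistics.stdev(acts)
        ranges[concept] = (mu - k * sigma, mu + k * sigma)
    return ranges

def attribute(activation, ranges):
    """Return the concepts whose range contains this activation.

    Because the ranges barely overlap, this typically yields a single
    concept, whereas whole-neuron attribution would claim the entire
    neuron for one concept.
    """
    return [c for c, (lo, hi) in ranges.items() if lo <= activation <= hi]

# Toy example: one neuron whose low activations co-occur with "sports"
# and high activations with "politics" (hypothetical data).
ranges = fit_ranges({
    "sports":   [0.9, 1.1, 1.0, 0.95, 1.05],
    "politics": [3.8, 4.2, 4.0, 3.9, 4.1],
})
print(attribute(1.02, ranges))  # -> ['sports']
print(attribute(4.05, ranges))  # -> ['politics']
```

A targeted intervention in this setting would then clamp or shift activations only when they fall inside the range of the concept being manipulated, leaving the neuron's other concepts untouched.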
Related papers
- Do LLMs and VLMs Share Neurons for Inference? Evidence and Mechanisms of Cross-Modal Transfer [65.72553715508691]
We show that large vision-language models (LVLMs) lag behind strong text-only large language models (LLMs) on tasks that require multi-step inference and compositional decision-making. We propose Shared Neuron Low-Rank Fusion (SNRF), a parameter-efficient framework that transfers mature inference circuitry from LLMs to LVLMs. Our results demonstrate that shared neurons form an interpretable bridge between LLMs and LVLMs, enabling low-cost transfer of inference ability into multimodal models.
arXiv Detail & Related papers (2026-02-22T06:04:05Z) - Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering [60.23509717784518]
Existing mitigation methods predominantly focus on output-level adjustments, leaving internal mechanisms that give rise to hallucinations largely unexplored. We propose Contrastive Neuron Steering (CNS), which identifies image-specific neurons via contrastive analysis between clean and noisy inputs. CNS selectively amplifies informative neurons while suppressing perturbation-induced activations, producing more robust and semantically grounded visual representations.
arXiv Detail & Related papers (2026-01-31T09:21:04Z) - NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models [24.550940304055562]
NeuronScope is a multi-agent framework that reformulates neuron interpretation as an iterative, activation-guided process. We show that NeuronScope uncovers hidden polysemanticity and produces explanations with significantly higher activation correlation compared to single-pass baselines.
arXiv Detail & Related papers (2026-01-07T07:50:47Z) - H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs [56.31565301428888]
We identify hallucination-associated neurons (H-Neurons) in large language models (LLMs). In terms of identification, we demonstrate that a remarkably sparse subset of neurons can reliably predict hallucination occurrences. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors.
arXiv Detail & Related papers (2025-12-01T15:32:14Z) - Neuronal Group Communication for Efficient Neural representation [85.36421257648294]
This paper addresses the question of how to build large neural systems that learn efficient, modular, and interpretable representations. We propose Neuronal Group Communication (NGC), a theory-driven framework that reimagines a neural network as a dynamical system of interacting neuronal groups. NGC treats weights as transient interactions between embedding-like neuronal states, with neural computation unfolding through iterative communication among groups of neurons.
arXiv Detail & Related papers (2025-10-19T14:23:35Z) - NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models [68.89389652724378]
NOBLE is a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. It predicts distributions of neural dynamics accounting for the intrinsic experimental variability. NOBLE is the first scaled-up deep learning framework validated on real experimental data.
arXiv Detail & Related papers (2025-06-05T01:01:18Z) - Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z) - Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
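The idea of recurring embedding states can be sketched with a toy grouping of population-activity vectors. This is a hypothetical, minimal stand-in for the paper's extraction methods, not their actual algorithm: two states are assigned to the same "chunk" when their Euclidean distance falls below a tolerance, and each chunk is summarized by a running centroid.

```python
import math

def find_chunks(states, tol=0.5):
    """Group population-activity vectors into recurring 'chunks'.

    states: iterable of equal-length activity vectors (tuples/lists).
    Returns a list of (centroid, count) pairs, one per chunk.
    """
    chunks = []  # each entry is [component-wise sum, count]
    for s in states:
        for chunk in chunks:
            centroid = [x / chunk[1] for x in chunk[0]]
            if math.dist(s, centroid) < tol:
                chunk[0] = [a + b for a, b in zip(chunk[0], s)]
                chunk[1] += 1
                break
        else:
            chunks.append([list(s), 1])
    return [([x / n for x in vec], n) for vec, n in chunks]

# Toy data: a network repeatedly visiting two embedding states
states = [(0.0, 1.0), (0.1, 0.9), (5.0, 5.0), (0.05, 1.05), (5.1, 4.9)]
for centroid, count in find_chunks(states):
    print([round(c, 2) for c in centroid], count)
# -> [0.05, 0.98] 3
# -> [5.05, 4.95] 2
```

In the paper's framing, each recovered chunk would correspond to a concept in the input, and nudging the state toward or away from a chunk centroid would activate or inhibit that concept.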
arXiv Detail & Related papers (2025-02-03T20:30:46Z) - Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning [2.9539724161670167]
Deep reinforcement learning (DRL) has successfully addressed many complex control problems. Current DRL interpretability methods largely treat neural networks as black boxes. We propose a novel concept-based interpretability method that provides fine-grained explanations of DRL models at the neuron level.
arXiv Detail & Related papers (2025-02-02T06:05:49Z) - QuantFormer: Learning to Quantize for Neural Activity Forecasting in Mouse Visual Cortex [26.499583552980248]
QuantFormer is a transformer-based model specifically designed for forecasting neural activity from two-photon calcium imaging data. QuantFormer sets a new benchmark in forecasting mouse visual cortex activity. It demonstrates robust performance and generalization across various stimuli and individuals.
arXiv Detail & Related papers (2024-12-10T07:44:35Z) - Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units.
We show that this idea provides performance improvements across a wide spectrum of tasks.
We believe that these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation.
arXiv Detail & Related papers (2024-10-17T17:47:54Z) - Range, not Independence, Drives Modularity in Biologically Inspired Representations [52.48094670415497]
We develop a theory of when biologically inspired networks modularise their representation of source variables (sources). We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal linear autoencoder modularise. Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work.
arXiv Detail & Related papers (2024-10-08T17:41:37Z) - ConceptLens: from Pixels to Understanding [1.3466710708566176]
ConceptLens is an innovative tool designed to illuminate the intricate workings of deep neural networks (DNNs) by visualizing hidden neuron activations.
By integrating deep learning with symbolic methods, ConceptLens offers users a unique way to understand what triggers neuron activations.
arXiv Detail & Related papers (2024-10-04T20:49:12Z) - Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis [19.472889262384818]
We find arithmetic ability resides within a limited number of attention heads, with each head specializing in distinct operations.
We introduce the Comparative Neuron Analysis (CNA) method, which identifies an internal logic chain consisting of four distinct stages from input to prediction.
arXiv Detail & Related papers (2024-09-21T13:46:54Z) - Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text. We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output. Our results indicate that an automated interpretation of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Learning from Emergence: A Study on Proactively Inhibiting the Monosemantic Neurons of Artificial Neural Networks [10.390475063385756]
We propose a new metric to measure the monosemanticity of neurons with the guarantee of efficiency for online computation.
We validate our conjecture that monosemanticity brings about performance change at different model scales.
arXiv Detail & Related papers (2023-12-17T14:42:46Z) - Automated Natural Language Explanation of Deep Visual Neurons with Large Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, enabling automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z) - Cones: Concept Neurons in Diffusion Models for Customized Generation [41.212255848052514]
This paper finds a small cluster of neurons in a diffusion model corresponding to a particular subject.
The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results.
For large-scale applications, the concept neurons are environmentally friendly, as we only need to store a sparse cluster of integer indices instead of dense float32 values.
arXiv Detail & Related papers (2023-03-09T09:16:04Z) - Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generated a new multimodal dataset consisting of the spontaneous behaviors generated by fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
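One common formulation of this kind of procedure scores candidate logical formulas over binary concept masks by their intersection-over-union (IoU) with the neuron's activation mask, and keeps the best-scoring formula as the explanation. The sketch below is illustrative; the masks and concept names are hypothetical, and the real procedure searches a much larger formula space.

```python
def iou(a, b):
    """Intersection-over-union of two binary masks of equal length."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0

# Binary mask of inputs where the neuron fires above threshold.
neuron = [1, 1, 0, 0, 1, 0]

# Binary concept masks over the same inputs (hypothetical annotations).
concepts = {
    "water": [1, 0, 0, 0, 1, 0],
    "blue":  [1, 1, 0, 0, 0, 0],
    "river": [0, 1, 0, 0, 0, 1],
}

# Search single concepts plus pairwise ORs for the best explanation.
candidates = list(concepts.items()) + [
    (f"{a} OR {b}", [x | y for x, y in zip(ma, mb)])
    for a, ma in concepts.items()
    for b, mb in concepts.items()
    if a < b  # each unordered pair once
]
best = max(candidates, key=lambda nm: iou(neuron, nm[1]))
print(best[0], round(iou(neuron, best[1]), 2))  # -> blue OR water 1.0
```

Extending the candidate set with AND and NOT compositions, and with deeper formulas, follows the same pattern: generate a composed mask, score it by IoU, keep the maximum.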
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.