Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
- URL: http://arxiv.org/abs/2502.06809v1
- Date: Tue, 04 Feb 2025 03:33:55 GMT
- Title: Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
- Authors: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, Peizhong Ju, A. B. Siddique,
- Abstract summary: We introduce NeuronLens, a novel range-based interpretation and manipulation framework.
It provides a finer view of neuron activation distributions to localize concept attribution within a neuron.
- Score: 16.460751105639623
- License:
- Abstract: Interpreting and controlling the internal mechanisms of large language models (LLMs) is crucial for improving their trustworthiness and utility. Recent efforts have primarily focused on identifying and manipulating neurons by establishing discrete mappings between neurons and semantic concepts. However, such mappings struggle to handle the inherent polysemanticity in LLMs, where individual neurons encode multiple, distinct concepts. This makes precise control challenging and complicates downstream interventions. Through an in-depth analysis of both encoder and decoder-based LLMs across multiple text classification datasets, we uncover that while individual neurons encode multiple concepts, their activation magnitudes vary across concepts in distinct, Gaussian-like patterns. Building on this insight, we introduce NeuronLens, a novel range-based interpretation and manipulation framework that provides a finer view of neuron activation distributions to localize concept attribution within a neuron. Extensive empirical evaluations demonstrate that NeuronLens significantly reduces unintended interference, while maintaining precise control for manipulation of targeted concepts, outperforming existing methods.
Related papers
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities.
We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities.
We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z) - Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning [2.9539724161670167]
Deep reinforcement learning (DRL) has successfully addressed many complex control problems.
Current DRL interpretability methods largely treat neural networks as black boxes.
We propose a novel concept-based interpretability method that provides fine-grained explanations of DRL models at the neuron level.
arXiv Detail & Related papers (2025-02-02T06:05:49Z) - QuantFormer: Learning to Quantize for Neural Activity Forecasting in Mouse Visual Cortex [26.499583552980248]
QuantFormer is a transformer-based model specifically designed for forecasting neural activity from two-photon calcium imaging data.
QuantFormer sets a new benchmark in forecasting mouse visual cortex activity.
It demonstrates robust performance and generalization across various stimuli and individuals.
arXiv Detail & Related papers (2024-12-10T07:44:35Z) - Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
It has long been known in both neuroscience and AI that ''binding'' between neurons leads to a form of competitive learning.
We introduce Artificial rethinking together with arbitrary connectivity designs such as fully connected convolutional, or attentive mechanisms.
We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, uncertainty, and reasoning.
arXiv Detail & Related papers (2024-10-17T17:47:54Z) - ConceptLens: from Pixels to Understanding [1.3466710708566176]
ConceptLens is an innovative tool designed to illuminate the intricate workings of deep neural networks (DNNs) by visualizing hidden neuron activations.
By integrating deep learning with symbolic methods, ConceptLens offers users a unique way to understand what triggers neuron activations.
arXiv Detail & Related papers (2024-10-04T20:49:12Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Automated Natural Language Explanation of Deep Visual Neurons with Large
Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z) - Cones: Concept Neurons in Diffusion Models for Customized Generation [41.212255848052514]
This paper finds a small cluster of neurons in a diffusion model corresponding to a particular subject.
The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results.
For large-scale applications, the concept neurons are environmentally friendly as we only need to store a sparse cluster of int index instead of dense float32 values.
arXiv Detail & Related papers (2023-03-09T09:16:04Z) - Overcoming the Domain Gap in Contrastive Learning of Neural Action
Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generated a new multimodal dataset consisting of the spontaneous behaviors generated by fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z) - And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
Presence of sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.