Related papers: Disentangling Neuron Representations with Concept Vectors

Disentangling Neuron Representations with Concept Vectors

URL: http://arxiv.org/abs/2304.09707v1
Date: Wed, 19 Apr 2023 14:55:31 GMT
Title: Disentangling Neuron Representations with Concept Vectors
Authors: Laura O'Mahony, Vincent Andrearczyk, Henning Muller, Mara Graziani
Abstract summary: The main contribution of this paper is a method to disentangle polysemantic neurons into concept vectors encapsulating distinct features. Our evaluations show that the concept vectors found encode coherent, human-understandable features.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mechanistic interpretability aims to understand how models store representations by breaking down neural networks into interpretable units. However, the occurrence of polysemantic neurons, or neurons that respond to multiple unrelated features, makes interpreting individual neurons challenging. This has led to the search for meaningful vectors, known as concept vectors, in activation space instead of individual neurons. The main contribution of this paper is a method to disentangle polysemantic neurons into concept vectors encapsulating distinct features. Our method can search for fine-grained concepts according to the user's desired level of concept separation. The analysis shows that polysemantic neurons can be disentangled into directions consisting of linear combinations of neurons. Our evaluations show that the concept vectors found encode coherent, human-understandable features.

Related papers

Understanding Gated Neurons in Transformers from Their Input-Output Functionality [48.91500104957796]
We look at the cosine similarity between input and output weights of a neuron.<n>We find that enrichment neurons dominate in early-middle layers whereas later layers tend more towards depletion.
arXiv Detail & Related papers (2025-05-23T14:14:17Z)
Explaining Neural Networks with Reasons [0.0]
Our method computes a vector for each neuron, called its reasons vector.<n>We then can compute how strongly this reasons vector speaks for various propositions, e.g., the proposition that the input image depicts digit 2 or that the input prompt has a negative sentiment.
arXiv Detail & Related papers (2025-05-20T14:32:03Z)
Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
ConceptLens: from Pixels to Understanding [1.3466710708566176]
ConceptLens is an innovative tool designed to illuminate the intricate workings of deep neural networks (DNNs) by visualizing hidden neuron activations. By integrating deep learning with symbolic methods, ConceptLens offers users a unique way to understand what triggers neuron activations.
arXiv Detail & Related papers (2024-10-04T20:49:12Z)
Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text. We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output. Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z)
Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
Identifying Interpretable Visual Features in Artificial and Biological Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features. Many neurons exhibit $textitmixed selectivity$, i.e., they represent multiple unrelated features. We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z)
Multimodal Neurons in Pretrained Text-Only Transformers [52.20828443544296]
We identify "multimodal neurons" that convert visual representations into corresponding text. We show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.
arXiv Detail & Related papers (2023-08-03T05:27:12Z)
A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data. The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept. The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
Cones: Concept Neurons in Diffusion Models for Customized Generation [41.212255848052514]
This paper finds a small cluster of neurons in a diffusion model corresponding to a particular subject. The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results. For large-scale applications, the concept neurons are environmentally friendly as we only need to store a sparse cluster of int index instead of dense float32 values.
arXiv Detail & Related papers (2023-03-09T09:16:04Z)
Constraints on the design of neuromorphic circuits set by the properties of neural population codes [61.15277741147157]
In the brain, information is encoded, transmitted and used to inform behaviour. Neuromorphic circuits need to encode information in a way compatible to that used by populations of neuron in the brain.
arXiv Detail & Related papers (2022-12-08T15:16:04Z)
NeuroCartography: Scalable Automatic Visual Summarization of Concepts in Deep Neural Networks [18.62960153659548]
NeuroCartography is an interactive system that summarizes and visualizes concepts learned by neural networks. It automatically discovers and groups neurons that detect the same concepts. It describes how such neuron groups interact to form higher-level concepts and the subsequent predictions.
arXiv Detail & Related papers (2021-08-29T22:43:52Z)
Neuron-based explanations of neural networks sacrifice completeness and interpretability [67.53271920386851]
We show that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability. We show the most important principal components provide more complete and interpretable explanations than the most important neurons. Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings.
arXiv Detail & Related papers (2020-11-05T21:26:03Z)
Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts. We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.