Disentangling Neuron Representations with Concept Vectors
- URL: http://arxiv.org/abs/2304.09707v1
- Date: Wed, 19 Apr 2023 14:55:31 GMT
- Title: Disentangling Neuron Representations with Concept Vectors
- Authors: Laura O'Mahony, Vincent Andrearczyk, Henning Muller, Mara Graziani
- Abstract summary: The main contribution of this paper is a method to disentangle polysemantic neurons into concept vectors encapsulating distinct features.
Our evaluations show that the concept vectors found encode coherent, human-understandable features.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mechanistic interpretability aims to understand how models store
representations by breaking down neural networks into interpretable units.
However, the occurrence of polysemantic neurons, or neurons that respond to
multiple unrelated features, makes interpreting individual neurons challenging.
This has led to the search for meaningful vectors, known as concept vectors, in
activation space instead of individual neurons. The main contribution of this
paper is a method to disentangle polysemantic neurons into concept vectors
encapsulating distinct features. Our method can search for fine-grained
concepts according to the user's desired level of concept separation. The
analysis shows that polysemantic neurons can be disentangled into directions
consisting of linear combinations of neurons. Our evaluations show that the
concept vectors found encode coherent, human-understandable features.
Related papers
- ConceptLens: from Pixels to Understanding [1.3466710708566176]
ConceptLens is an innovative tool designed to illuminate the intricate workings of deep neural networks (DNNs) by visualizing hidden neuron activations.
By integrating deep learning with symbolic methods, ConceptLens offers users a unique way to understand what triggers neuron activations.
arXiv Detail & Related papers (2024-10-04T20:49:12Z) - Interpreting the Second-Order Effects of Neurons in CLIP [73.54377859089801]
We interpret the function of individual neurons in CLIP by automatically describing them using text.
We present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output.
Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.
arXiv Detail & Related papers (2024-06-06T17:59:52Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Identifying Interpretable Visual Features in Artificial and Biological
Neural Systems [3.604033202771937]
Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features.
Many neurons exhibit $textitmixed selectivity$, i.e., they represent multiple unrelated features.
We propose an automated method for quantifying visual interpretability and an approach for finding meaningful directions in network activation space.
arXiv Detail & Related papers (2023-10-17T17:41:28Z) - Multimodal Neurons in Pretrained Text-Only Transformers [52.20828443544296]
We identify "multimodal neurons" that convert visual representations into corresponding text.
We show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.
arXiv Detail & Related papers (2023-08-03T05:27:12Z) - A Recursive Bateson-Inspired Model for the Generation of Semantic Formal
Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z) - Cones: Concept Neurons in Diffusion Models for Customized Generation [41.212255848052514]
This paper finds a small cluster of neurons in a diffusion model corresponding to a particular subject.
The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results.
For large-scale applications, the concept neurons are environmentally friendly as we only need to store a sparse cluster of int index instead of dense float32 values.
arXiv Detail & Related papers (2023-03-09T09:16:04Z) - Constraints on the design of neuromorphic circuits set by the properties
of neural population codes [61.15277741147157]
In the brain, information is encoded, transmitted and used to inform behaviour.
Neuromorphic circuits need to encode information in a way compatible to that used by populations of neuron in the brain.
arXiv Detail & Related papers (2022-12-08T15:16:04Z) - NeuroCartography: Scalable Automatic Visual Summarization of Concepts in
Deep Neural Networks [18.62960153659548]
NeuroCartography is an interactive system that summarizes and visualizes concepts learned by neural networks.
It automatically discovers and groups neurons that detect the same concepts.
It describes how such neuron groups interact to form higher-level concepts and the subsequent predictions.
arXiv Detail & Related papers (2021-08-29T22:43:52Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.