Understanding Distributed Representations of Concepts in Deep Neural
Networks without Supervision
- URL: http://arxiv.org/abs/2312.17285v2
- Date: Tue, 5 Mar 2024 23:40:45 GMT
- Title: Understanding Distributed Representations of Concepts in Deep Neural
Networks without Supervision
- Authors: Wonjoon Chang, Dahee Kwon, Jaesik Choi
- Abstract summary: We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
- Score: 25.449397570387802
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding intermediate representations of the concepts learned by deep
learning classifiers is indispensable for interpreting general model behaviors.
Existing approaches to reveal learned concepts often rely on human supervision,
such as pre-defined concept sets or segmentation processes. In this paper, we
propose a novel unsupervised method for discovering distributed representations
of concepts by selecting a principal subset of neurons. Our empirical findings
demonstrate that instances with similar neuron activation states tend to share
coherent concepts. Based on the observations, the proposed method selects
principal neurons that construct an interpretable region, namely a Relaxed
Decision Region (RDR), encompassing instances with coherent concepts in the
feature space. It can be utilized to identify unlabeled subclasses within data
and to detect the causes of misclassifications. Furthermore, the applicability
of our method across various layers discloses distinct distributed
representations over the layers, which provides deeper insights into the
internal mechanisms of the deep learning model.
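The idea described above, selecting a principal subset of neurons whose shared on/off activation states carve out a Relaxed Decision Region (RDR) around an anchor instance, can be sketched roughly as follows. This is a minimal illustration on random toy activations, not the paper's actual selection algorithm: the per-neuron agreement heuristic, the binarization at zero, and all function names here are assumptions.

```python
import numpy as np

def select_principal_neurons(acts, anchor_idx, n_neurons=3):
    """Pick neurons whose on/off state agrees most with the anchor across
    all instances (illustrative heuristic, not the paper's procedure)."""
    states = (acts > 0).astype(int)          # binarize ReLU-style activations
    anchor = states[anchor_idx]
    agree = (states == anchor).mean(axis=0)  # fraction of instances agreeing, per neuron
    return np.argsort(-agree)[:n_neurons]    # neurons with the most consistent state

def relaxed_decision_region(acts, anchor_idx, neurons):
    """Indices of instances matching the anchor's on/off pattern
    on the selected principal neurons."""
    states = (acts > 0).astype(int)
    match = (states[:, neurons] == states[anchor_idx, neurons]).all(axis=1)
    return np.flatnonzero(match)

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 5))               # 8 toy instances, 5 neurons
neurons = select_principal_neurons(acts, anchor_idx=0)
region = relaxed_decision_region(acts, 0, neurons)
print(neurons, region)                       # the anchor always lies in its own region
```

Instances landing in the same region would, per the paper's observation, tend to share a coherent concept; inspecting a region that crosses class labels is how one would probe unlabeled subclasses or misclassification causes.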
Related papers
- CoLiDR: Concept Learning using Aggregated Disentangled Representations [29.932706137805713]
Interpretability of Deep Neural Networks using concept-based models offers a promising way to explain model behavior through human-understandable concepts.
A parallel line of research focuses on disentangling the data distribution into its underlying generative factors, in turn explaining the data generation process.
While both directions have received extensive attention, little work has been done on explaining concepts in terms of generative factors to unify mathematically disentangled representations and human-understandable concepts.
arXiv Detail & Related papers (2024-07-27T16:55:14Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models [0.0]
We present an extension to the method of concept detection, named "concept backpropagation", which provides a way of analysing how the information representing a given concept is internalised in a given neural network model.
arXiv Detail & Related papers (2023-07-24T08:21:13Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a more interpretable alternative to feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z)
- A Unified Concept-Based System for Local, Global, and Misclassification Explanations [13.321794212377949]
We present a unified concept-based system for unsupervised learning of both local and global concepts.
Our primary objective is to uncover the intrinsic concepts underlying each data category by training surrogate explainer networks.
Our approach facilitates the explanation of both accurate and erroneous predictions.
arXiv Detail & Related papers (2023-06-06T09:28:37Z)
- Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations [53.973055975918655]
We show that intermediate-layer representations become more interpretable when transformed to the bases extracted with our method.
We compare the bases extracted with our method against those derived with a supervised approach and find that, in one respect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one; we also give potential directions for future research.
arXiv Detail & Related papers (2023-03-19T00:37:19Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Cause and Effect: Concept-based Explanation of Neural Networks [3.883460584034766]
We take a step toward the interpretability of neural networks by examining their internal representations, i.e., neuron activations, against concepts.
We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes.
arXiv Detail & Related papers (2021-05-14T18:54:17Z)
- Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL).
It explores the characteristics of both conventional contrastive learning and deep clustering.
It can amplify the strengths of contrastive learning and deep clustering in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.