Understanding Distributed Representations of Concepts in Deep Neural
Networks without Supervision
- URL: http://arxiv.org/abs/2312.17285v2
- Date: Tue, 5 Mar 2024 23:40:45 GMT
- Title: Understanding Distributed Representations of Concepts in Deep Neural
Networks without Supervision
- Authors: Wonjoon Chang, Dahee Kwon, Jaesik Choi
- Abstract summary: We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
- Score: 25.449397570387802
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding intermediate representations of the concepts learned by deep
learning classifiers is indispensable for interpreting general model behaviors.
Existing approaches to reveal learned concepts often rely on human supervision,
such as pre-defined concept sets or segmentation processes. In this paper, we
propose a novel unsupervised method for discovering distributed representations
of concepts by selecting a principal subset of neurons. Our empirical findings
demonstrate that instances with similar neuron activation states tend to share
coherent concepts. Based on the observations, the proposed method selects
principal neurons that construct an interpretable region, namely a Relaxed
Decision Region (RDR), encompassing instances with coherent concepts in the
feature space. It can be utilized to identify unlabeled subclasses within data
and to detect the causes of misclassifications. Furthermore, the applicability
of our method across various layers discloses distinct distributed
representations over the layers, which provides deeper insights into the
internal mechanisms of the deep learning model.
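The idea described above, selecting a principal subset of neurons whose shared on/off activation states carve out a Relaxed Decision Region (RDR) around an anchor instance, can be sketched roughly as follows. This is a minimal illustration on random toy activations, not the paper's actual selection algorithm: the per-neuron agreement heuristic, the binarization at zero, and all function names here are assumptions.

```python
import numpy as np

def select_principal_neurons(acts, anchor_idx, n_neurons=3):
    """Pick neurons whose on/off state agrees most with the anchor across
    all instances (illustrative heuristic, not the paper's procedure)."""
    states = (acts > 0).astype(int)          # binarize ReLU-style activations
    anchor = states[anchor_idx]
    agree = (states == anchor).mean(axis=0)  # fraction of instances agreeing, per neuron
    return np.argsort(-agree)[:n_neurons]    # neurons with the most consistent state

def relaxed_decision_region(acts, anchor_idx, neurons):
    """Indices of instances matching the anchor's on/off pattern
    on the selected principal neurons."""
    states = (acts > 0).astype(int)
    match = (states[:, neurons] == states[anchor_idx, neurons]).all(axis=1)
    return np.flatnonzero(match)

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 5))               # 8 toy instances, 5 neurons
neurons = select_principal_neurons(acts, anchor_idx=0)
region = relaxed_decision_region(acts, 0, neurons)
print(neurons, region)                       # the anchor always lies in its own region
```

Instances landing in the same region would, per the paper's observation, tend to share a coherent concept; inspecting a region that crosses class labels is how one would probe unlabeled subclasses or misclassification causes.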
Related papers
- CoLiDR: Concept Learning using Aggregated Disentangled Representations [29.932706137805713]
Interpretability of Deep Neural Networks using concept-based models offers a promising way to explain model behavior through human-understandable concepts.
A parallel line of research focuses on disentangling the data distribution into its underlying generative factors, in turn explaining the data generation process.
While both directions have received extensive attention, little work has been done on explaining concepts in terms of generative factors to unify mathematically disentangled representations and human-understandable concepts.
arXiv Detail & Related papers (2024-07-27T16:55:14Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models [0.0]
We present an extension to the method of concept detection, named "concept backpropagation", which provides a way of analysing how the information representing a given concept is internalised in a given neural network model.
arXiv Detail & Related papers (2023-07-24T08:21:13Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a more interpretable alternative to feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z)
- A Unified Concept-Based System for Local, Global, and Misclassification Explanations [13.321794212377949]
We present a unified concept-based system for unsupervised learning of both local and global concepts.
Our primary objective is to uncover the intrinsic concepts underlying each data category by training surrogate explainer networks.
Our approach facilitates the explanation of both accurate and erroneous predictions.
arXiv Detail & Related papers (2023-06-06T09:28:37Z)
- Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations [53.973055975918655]
We show that intermediate-layer representations become more interpretable when transformed to the bases extracted with our method.
We compare the bases extracted with our method against those derived with a supervised approach and find that, in one respect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one; we also give potential directions for future research.
arXiv Detail & Related papers (2023-03-19T00:37:19Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Cause and Effect: Concept-based Explanation of Neural Networks [3.883460584034766]
We take a step toward the interpretability of neural networks by examining their internal representations, i.e., neuron activations, against concepts.
We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes.
arXiv Detail & Related papers (2021-05-14T18:54:17Z)
- Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL).
It explores the characteristics of both conventional contrastive learning and deep clustering.
It can amplify the strengths of contrastive learning and deep clustering in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.