Cause and Effect: Concept-based Explanation of Neural Networks
- URL: http://arxiv.org/abs/2105.07033v1
- Date: Fri, 14 May 2021 18:54:17 GMT
- Title: Cause and Effect: Concept-based Explanation of Neural Networks
- Authors: Mohammad Nokhbeh Zaeem and Majid Komeili
- Abstract summary: We take a step toward the interpretability of neural networks by examining their internal representations, i.e., neurons' activations, against concepts.
We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes.
- Score: 3.883460584034766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many scenarios, human decisions are explained based on some high-level
concepts. In this work, we take a step toward the interpretability of neural
networks by examining their internal representations, i.e., neurons' activations,
against concepts. A concept is characterized by a set of samples that have
specific features in common. We propose a framework to check the existence of a
causal relationship between a concept (or its negation) and task classes. While
the previous methods focus on the importance of a concept to a task class, we
go further and introduce four measures to quantitatively determine the order of
causality. Through experiments, we demonstrate the effectiveness of the
proposed method in explaining the relationship between a concept and the
predictive behaviour of a neural network.
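The abstract does not spell out the four measures. As a rough illustration only, assuming they resemble contingency-style sufficiency/necessity statistics computed over binary concept labels and class predictions (the function and measure names below are hypothetical, not the paper's), a minimal sketch could look like:

```python
import numpy as np

def causal_order_measures(concept, prediction):
    """Illustrative contingency-style measures relating a binary concept
    label to a binary class prediction (names are hypothetical)."""
    c = np.asarray(concept, dtype=bool)
    y = np.asarray(prediction, dtype=bool)
    eps = 1e-12  # avoid division by zero on empty groups
    return {
        "sufficiency":     (c & y).sum() / (c.sum() + eps),      # P(y | c)
        "necessity":       (~c & ~y).sum() / ((~c).sum() + eps), # P(~y | ~c)
        "neg_sufficiency": (~c & y).sum() / ((~c).sum() + eps),  # P(y | ~c)
        "neg_necessity":   (c & ~y).sum() / (c.sum() + eps),     # P(~y | c)
    }

# toy example: the concept almost always implies the class
c = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
y = np.array([1, 1, 1, 0, 0, 1, 1, 0], dtype=bool)
print(causal_order_measures(c, y))
```

Comparing which of these quantities dominates is one plausible way to rank a concept, or its negation, as closer to a cause of the class prediction.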
Related papers
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities.
We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities.
We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning [2.9539724161670167]
Deep reinforcement learning (DRL) has successfully addressed many complex control problems.
Current DRL interpretability methods largely treat neural networks as black boxes.
We propose a novel concept-based interpretability method that provides fine-grained explanations of DRL models at the neuron level.
arXiv Detail & Related papers (2025-02-02T06:05:49Z)
- LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions [15.381209058506078]
Prior works have associated concepts with neurons based on examples of concepts or a pre-defined set of concepts.
We propose to leverage multimodal large language models for automatic and open-ended concept discovery.
We validate each concept by generating examples and counterexamples and evaluating the neuron's response on this new set of images.
arXiv Detail & Related papers (2024-06-12T18:19:37Z)
- Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
The method can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z)
- Unit Testing for Concepts in Neural Networks [20.86261546611472]
We propose unit tests for evaluating whether a system's behavior is consistent with Fodor's criteria.
We find that models succeed on tests of groundedness, modularity, and reusability of concepts, but that important questions about causality remain open.
arXiv Detail & Related papers (2022-07-28T08:49:32Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
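The compositional procedure in the last entry can be illustrated, under the assumption that it works roughly like a search over small logical formulas of concept masks scored by overlap with a neuron's binarized activation mask (function names and the formula grammar below are illustrative, not the paper's):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    return (a & b).sum() / ((a | b).sum() + 1e-12)

def best_composition(neuron_mask, concepts):
    """Exhaustively score small logical formulas over concept masks
    (c, NOT c, c1 AND c2, c1 OR c2) against a neuron's binarized
    activation mask; return the best (formula, IoU) pair."""
    candidates = []
    names = list(concepts)
    for n1 in names:
        candidates.append((n1, concepts[n1]))
        candidates.append((f"NOT {n1}", ~concepts[n1]))
        for n2 in names:
            if n2 <= n1:  # each unordered pair once
                continue
            candidates.append((f"{n1} AND {n2}", concepts[n1] & concepts[n2]))
            candidates.append((f"{n1} OR {n2}", concepts[n1] | concepts[n2]))
    form, mask = max(candidates, key=lambda c: iou(neuron_mask, c[1]))
    return form, iou(neuron_mask, mask)

# toy example: a neuron that fires exactly where both concepts hold
water = np.array([1, 1, 1, 0, 0], dtype=bool)
blue = np.array([1, 1, 0, 1, 0], dtype=bool)
print(best_composition(water & blue, {"water": water, "blue": blue}))
```

In practice the search runs over spatially aligned segmentation masks from an annotated dataset, and deeper formulas are grown greedily rather than enumerated.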
This list is automatically generated from the titles and abstracts of the papers in this site.