Cause and Effect: Concept-based Explanation of Neural Networks
- URL: http://arxiv.org/abs/2105.07033v1
- Date: Fri, 14 May 2021 18:54:17 GMT
- Title: Cause and Effect: Concept-based Explanation of Neural Networks
- Authors: Mohammad Nokhbeh Zaeem and Majid Komeili
- Abstract summary: We take a step in the interpretability of neural networks by examining their internal representation or neuron's activations against concepts.
We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes.
- Score: 3.883460584034766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many scenarios, human decisions are explained based on some high-level
concepts. In this work, we take a step in the interpretability of neural
networks by examining their internal representation or neuron's activations
against concepts. A concept is characterized by a set of samples that have
specific features in common. We propose a framework to check the existence of a
causal relationship between a concept (or its negation) and task classes. While
the previous methods focus on the importance of a concept to a task class, we
go further and introduce four measures to quantitatively determine the order of
causality. Through experiments, we demonstrate the effectiveness of the
proposed method in explaining the relationship between a concept and the
predictive behaviour of a neural network.
Related papers
- LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions [15.381209058506078]
Prior works have associated concepts with neurons based on examples of concepts or a pre-defined set of concepts.
We propose to leverage multimodal large language models for automatic and open-ended concept discovery.
We validate each concept by generating examples and counterexamples and evaluating the neuron's response on this new set of images.
arXiv Detail & Related papers (2024-06-12T18:19:37Z) - Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Understanding Distributed Representations of Concepts in Deep Neural
Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z) - From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks [15.837316393474403]
Concepts can act as a natural link between learning and reasoning.
Knowledge can not only be extracted from neural networks but concept knowledge can also be inserted into neural network architectures.
arXiv Detail & Related papers (2023-10-18T11:08:02Z) - Unit Testing for Concepts in Neural Networks [20.86261546611472]
We propose unit tests for evaluating whether a system's behavior is consistent with Fodor's criteria.
We find that models succeed on tests of groundedness, modularlity, and reusability of concepts, but that important questions about causality remain open.
arXiv Detail & Related papers (2022-07-28T08:49:32Z) - Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations including the class of Concept Activation Vectors (CAV)
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z) - Towards Interpretable Reasoning over Paragraph Effects in Situation [126.65672196760345]
We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect.
We propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules.
In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model.
arXiv Detail & Related papers (2020-10-03T04:03:52Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.