Cause and Effect: Concept-based Explanation of Neural Networks
- URL: http://arxiv.org/abs/2105.07033v1
- Date: Fri, 14 May 2021 18:54:17 GMT
- Title: Cause and Effect: Concept-based Explanation of Neural Networks
- Authors: Mohammad Nokhbeh Zaeem and Majid Komeili
- Abstract summary: We take a step toward the interpretability of neural networks by examining their internal representations, i.e., neurons' activations, against concepts.
We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes.
- Score: 3.883460584034766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many scenarios, human decisions are explained based on some high-level
concepts. In this work, we take a step toward the interpretability of neural
networks by examining their internal representations, i.e., neurons' activations,
against concepts. A concept is characterized by a set of samples that have
specific features in common. We propose a framework to check the existence of a
causal relationship between a concept (or its negation) and task classes. While
the previous methods focus on the importance of a concept to a task class, we
go further and introduce four measures to quantitatively determine the order of
causality. Through experiments, we demonstrate the effectiveness of the
proposed method in explaining the relationship between a concept and the
predictive behaviour of a neural network.
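The abstract does not spell out the four measures. As a rough illustration only, assuming they resemble contingency-style sufficiency/necessity statistics computed over binary concept labels and class predictions (the function and measure names below are hypothetical, not the paper's), a minimal sketch could look like:

```python
import numpy as np

def causal_order_measures(concept, prediction):
    """Illustrative contingency-style measures relating a binary concept
    label to a binary class prediction (names are hypothetical)."""
    c = np.asarray(concept, dtype=bool)
    y = np.asarray(prediction, dtype=bool)
    eps = 1e-12  # avoid division by zero on empty groups
    return {
        "sufficiency":     (c & y).sum() / (c.sum() + eps),      # P(y | c)
        "necessity":       (~c & ~y).sum() / ((~c).sum() + eps), # P(~y | ~c)
        "neg_sufficiency": (~c & y).sum() / ((~c).sum() + eps),  # P(y | ~c)
        "neg_necessity":   (c & ~y).sum() / (c.sum() + eps),     # P(~y | c)
    }

# toy example: the concept almost always implies the class
c = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
y = np.array([1, 1, 1, 0, 0, 1, 1, 0], dtype=bool)
print(causal_order_measures(c, y))
```

Comparing which of these quantities dominates is one plausible way to rank a concept, or its negation, as closer to a cause of the class prediction.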
Related papers
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities.
We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities.
We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Compositional Concept-Based Neuron-Level Interpretability for Deep Reinforcement Learning [2.9539724161670167]
Deep reinforcement learning (DRL) has successfully addressed many complex control problems.
Current DRL interpretability methods largely treat neural networks as black boxes.
We propose a novel concept-based interpretability method that provides fine-grained explanations of DRL models at the neuron level.
arXiv Detail & Related papers (2025-02-02T06:05:49Z)
- LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions [15.381209058506078]
Prior works have associated concepts with neurons based on examples of concepts or a pre-defined set of concepts.
We propose to leverage multimodal large language models for automatic and open-ended concept discovery.
We validate each concept by generating examples and counterexamples and evaluating the neuron's response on this new set of images.
arXiv Detail & Related papers (2024-06-12T18:19:37Z)
- Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions.
We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks.
We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
The method can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z)
- Unit Testing for Concepts in Neural Networks [20.86261546611472]
We propose unit tests for evaluating whether a system's behavior is consistent with Fodor's criteria.
We find that models succeed on tests of groundedness, modularity, and reusability of concepts, but that important questions about causality remain open.
arXiv Detail & Related papers (2022-07-28T08:49:32Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
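The compositional procedure in the last entry can be illustrated, under the assumption that it works roughly like a search over small logical formulas of concept masks scored by overlap with a neuron's binarized activation mask (function names and the formula grammar below are illustrative, not the paper's):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    return (a & b).sum() / ((a | b).sum() + 1e-12)

def best_composition(neuron_mask, concepts):
    """Exhaustively score small logical formulas over concept masks
    (c, NOT c, c1 AND c2, c1 OR c2) against a neuron's binarized
    activation mask; return the best (formula, IoU) pair."""
    candidates = []
    names = list(concepts)
    for n1 in names:
        candidates.append((n1, concepts[n1]))
        candidates.append((f"NOT {n1}", ~concepts[n1]))
        for n2 in names:
            if n2 <= n1:  # each unordered pair once
                continue
            candidates.append((f"{n1} AND {n2}", concepts[n1] & concepts[n2]))
            candidates.append((f"{n1} OR {n2}", concepts[n1] | concepts[n2]))
    form, mask = max(candidates, key=lambda c: iou(neuron_mask, c[1]))
    return form, iou(neuron_mask, mask)

# toy example: a neuron that fires exactly where both concepts hold
water = np.array([1, 1, 1, 0, 0], dtype=bool)
blue = np.array([1, 1, 0, 1, 0], dtype=bool)
print(best_composition(water & blue, {"water": water, "blue": blue}))
```

In practice the search runs over spatially aligned segmentation masks from an annotated dataset, and deeper formulas are grown greedily rather than enumerated.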
This list is automatically generated from the titles and abstracts of the papers in this site.