Concept backpropagation: An Explainable AI approach for visualising
learned concepts in neural network models
- URL: http://arxiv.org/abs/2307.12601v1
- Date: Mon, 24 Jul 2023 08:21:13 GMT
- Title: Concept backpropagation: An Explainable AI approach for visualising
learned concepts in neural network models
- Authors: Patrik Hammersborg and Inga Strümke
- Abstract summary: We present an extension to the method of concept detection, named \emph{concept backpropagation}, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network models are widely used in a variety of domains, often as
black-box solutions, since they are not directly interpretable for humans. The
field of explainable artificial intelligence aims at developing explanation
methods to address this challenge, and several approaches have been developed
over the recent years, including methods for investigating what type of
knowledge these models internalise during the training process. Among these,
the method of concept detection investigates which \emph{concepts} neural
network models learn to represent in order to complete their tasks. In this
work, we present an extension to the method of concept detection, named
\emph{concept backpropagation}, which provides a way of analysing how the
information representing a given concept is internalised in a given neural
network model. In this approach, the model input is perturbed in a manner
guided by a trained concept probe for the described model, such that the
concept of interest is maximised. This allows for the visualisation of the
detected concept directly in the input space of the model, which in turn makes
it possible to see what information the model depends on for representing the
described concept. We present results for this method applied to a variety of
input modalities, and discuss how our proposed method can be used to
visualise what information trained concept probes use, and the degree to
which the representation of the probed concept is entangled within the neural
network model itself.
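The abstract describes an optimisation over the input: starting from a real example, the input is perturbed so that a trained concept probe's score for the concept increases. The following is a minimal sketch of that idea, not the authors' implementation; the PyTorch framing and the names model, probe, layer, steps and lr are assumptions made purely for illustration.

    import torch

    def concept_backpropagation(model, probe, x, layer, steps=200, lr=0.05):
        # Sketch only: perturb input x so that a trained linear concept probe,
        # applied to the activations of `layer`, assigns a higher concept score.
        x = x.clone().detach().requires_grad_(True)
        cache = {}
        handle = layer.register_forward_hook(lambda m, i, o: cache.update(h=o))
        optimiser = torch.optim.Adam([x], lr=lr)
        try:
            for _ in range(steps):
                optimiser.zero_grad()
                model(x)                                  # forward pass fills cache["h"]
                feats = cache["h"].flatten(start_dim=1)   # probe assumed to take flat features
                concept_score = probe(feats).mean()       # higher = concept more present
                (-concept_score).backward()               # gradient ascent on the input
                optimiser.step()
        finally:
            handle.remove()
        return x.detach()   # perturbed input, viewable directly in input space

Comparing the returned input with the original shows which input features the probe, and hence the model's internal representation, relies on for the probed concept.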
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
- Understanding Multimodal Deep Neural Networks: A Concept Selection View [29.08342307127578]
Concept-based models map the black-box visual representations extracted by deep neural networks onto a set of human-understandable concepts.
We propose a two-stage Concept Selection Model (CSM) to mine core concepts without introducing any human priors.
Our approach achieves comparable performance to end-to-end black-box models.
arXiv Detail & Related papers (2024-04-13T11:06:49Z)
- A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are widely regarded as the basis of human thinking.
We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs).
We also provide details on concept-based model improvement literature marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z)
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- On Modifying a Neural Network's Perception [3.42658286826597]
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts.
We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs); see the probe-fitting sketch after this list.
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
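Several of the papers above, like the main paper's concept probes, rely on linear probes or CAVs fitted to intermediate activations. As a point of reference, here is a hypothetical sketch of fitting such a probe with scikit-learn; the arrays acts_concept and acts_random stand in for activations collected from concept-positive and random examples, and none of this is taken from the cited papers' code.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_concept_probe(acts_concept: np.ndarray, acts_random: np.ndarray):
        # Activations have shape (n_examples, n_features), taken from one chosen layer.
        X = np.concatenate([acts_concept, acts_random])
        y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])   # unit-norm concept direction
        return clf, cav

A new activation's concept score is then its projection onto the returned direction (activation @ cav), which is the kind of quantity a method such as concept backpropagation maximises.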
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.