CoSy: Evaluating Textual Explanations of Neurons
- URL: http://arxiv.org/abs/2405.20331v1
- Date: Thu, 30 May 2024 17:59:04 GMT
- Title: CoSy: Evaluating Textual Explanations of Neurons
- Authors: Laura Kopf, Philine Lou Bommer, Anna Hedström, Sebastian Lapuschkin, Marina M.-C. Höhne, Kirill Bykov
- Abstract summary: A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within latent representations.
We introduce CoSy -- a novel framework to evaluate the quality of textual explanations for latent neurons.
- Score: 5.696573924249008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While various methods exist to connect neurons to textual descriptions of human-understandable concepts, evaluating the quality of these explanation methods presents a major challenge in the field due to a lack of unified, general-purpose quantitative evaluation. In this work, we introduce CoSy (Concept Synthesis) -- a novel, architecture-agnostic framework to evaluate the quality of textual explanations for latent neurons. Given textual explanations, our proposed framework leverages a generative model conditioned on textual input to create data points representing the textual explanation. Then, the neuron's response to these explanation data points is compared with the response to control data points, providing a quality estimate of the given explanation. We ensure the reliability of our proposed framework in a series of meta-evaluation experiments and demonstrate practical value through insights from benchmarking various concept-based textual explanation methods for Computer Vision tasks, showing that tested explanation methods significantly differ in quality.
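The abstract outlines the core procedure: synthesize inputs that depict the textual explanation, then compare the neuron's response on those inputs against its response on control data. Below is a minimal sketch of that comparison, not the authors' reference implementation: `generate_images` stands in for any text-conditioned generative model, `neuron_activation` for a hook returning the target neuron's scalar response, and the two summary statistics are illustrative choices rather than the exact metrics used in the paper.
```python
import numpy as np

def cosy_score(explanation, control_images, generate_images, neuron_activation,
               n_synthetic=50):
    """Compare a neuron's response to explanation-derived images vs. controls.

    Hypothetical helpers (assumptions, not part of the paper's API):
      generate_images(prompt, n)  -> iterable of n synthetic images for the prompt
      neuron_activation(image)    -> scalar activation of the target neuron
    """
    # 1. Synthesize data points representing the textual explanation.
    concept_images = generate_images(explanation, n=n_synthetic)

    # 2. Record the neuron's activations on concept and control inputs.
    a_concept = np.array([neuron_activation(x) for x in concept_images])
    a_control = np.array([neuron_activation(x) for x in control_images])

    # 3. Summarize how much more strongly the neuron fires on concept images:
    #    a mean activation difference scaled by the control spread, and a
    #    simple AUC-style ranking statistic (fraction of concept/control pairs
    #    where the concept activation is larger).
    mad = (a_concept.mean() - a_control.mean()) / (a_control.std() + 1e-8)
    auc = (a_concept[:, None] > a_control[None, :]).mean()
    return {"mean_activation_difference": float(mad), "auc": float(auc)}
```
Run over many neurons and several explanation methods, relative scores of this kind support the sort of method benchmarking the abstract refers to; a higher score indicates that the explanation text better captures what drives the neuron.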
Related papers
- CoLiDR: Concept Learning using Aggregated Disentangled Representations [29.932706137805713]
Interpretability of Deep Neural Networks using concept-based models offers a promising way to explain model behavior through human-understandable concepts.
A parallel line of research focuses on disentangling the data distribution into its underlying generative factors, in turn explaining the data generation process.
While both directions have received extensive attention, little work has been done on explaining concepts in terms of generative factors to unify mathematically disentangled representations and human-understandable concepts.
arXiv Detail & Related papers (2024-07-27T16:55:14Z)
- Less is More: Discovering Concise Network Explanations [26.126343100127936]
We introduce Discovering Conceptual Network Explanations (DCNE), a new approach for generating human-comprehensible visual explanations.
Our method automatically finds visual explanations that are critical for discriminating between classes.
DCNE represents a step forward in making neural network decisions accessible and interpretable to humans.
arXiv Detail & Related papers (2024-05-24T06:10:23Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Mapping Knowledge Representations to Concepts: A Review and New Perspectives [0.6875312133832078]
This review focuses on research that aims to associate internal representations with human-understandable concepts.
We find this taxonomy, together with theories of causality, useful for understanding what can and cannot be expected from neural network explanations.
The analysis additionally uncovers an ambiguity in the reviewed literature related to the goal of model explainability.
arXiv Detail & Related papers (2022-12-31T12:56:12Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Generalizable Neuro-symbolic Systems for Commonsense Question Answering [67.72218865519493]
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks.
Different methods for integrating neural language models and knowledge graphs are discussed.
arXiv Detail & Related papers (2022-01-17T06:13:37Z)
- Cause and Effect: Concept-based Explanation of Neural Networks [3.883460584034766]
We take a step toward the interpretability of neural networks by examining their internal representations, i.e., neuron activations, against concepts.
We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes.
arXiv Detail & Related papers (2021-05-14T18:54:17Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize existing work on evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Visual Question Answering based on Local-Scene-Aware Referring Expression Generation [27.080830480999527]
We propose the use of text expressions generated for images to represent complex scenes and explain decisions.
The generated expressions can be combined with visual features and question embeddings to obtain the question-relevant answer.
A joint-embedding multi-head attention network is also proposed to model three different information modalities with co-attention.
arXiv Detail & Related papers (2021-01-22T07:28:28Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)