CoSy: Evaluating Textual Explanations of Neurons
- URL: http://arxiv.org/abs/2405.20331v2
- Date: Thu, 05 Dec 2024 15:48:24 GMT
- Title: CoSy: Evaluating Textual Explanations of Neurons
- Authors: Laura Kopf, Philine Lou Bommer, Anna Hedström, Sebastian Lapuschkin, Marina M.-C. Höhne, Kirill Bykov
- Abstract summary: We introduce CoSy, a framework for evaluating textual explanations of latent neurons.
By comparing the neuron's response to generated data points and control data points, we can estimate the quality of the explanation.
We validate our framework through sanity checks and benchmark various neuron description methods for Computer Vision tasks.
- Score: 5.696573924249008
- Abstract: A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While methods exist to connect neurons to human-understandable textual descriptions, evaluating the quality of these explanations is challenging due to the lack of a unified quantitative approach. We introduce CoSy (Concept Synthesis), a novel, architecture-agnostic framework for evaluating textual explanations of latent neurons. Given textual explanations, our proposed framework uses a generative model conditioned on textual input to create data points representing the explanations. By comparing the neuron's response to these generated data points and control data points, we can estimate the quality of the explanation. We validate our framework through sanity checks and benchmark various neuron description methods for Computer Vision tasks, revealing significant differences in quality.
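The evaluation loop the abstract describes — generate images from the textual explanation, then compare the neuron's activations on those images against activations on control images — can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name `cosy_score` and the two scores (a rank-based AUC and a normalized mean activation difference) are assumptions, and the activation arrays stand in for a real generative model and neuron probe.

```python
import numpy as np

def cosy_score(act_generated, act_control):
    """Score a textual explanation of a neuron by comparing the neuron's
    activations on explanation-conditioned synthetic images (act_generated)
    against activations on control images (act_control).

    Returns (auc, mad):
      auc - probability that a generated image activates the neuron more
            strongly than a control image (pairwise rank-based AUC);
            close to 1.0 means the explanation matches the neuron well,
            close to 0.5 means it is no better than chance.
      mad - difference in mean activation, normalized by the spread of
            the control activations.
    """
    gen = np.asarray(act_generated, dtype=float)
    ctl = np.asarray(act_control, dtype=float)
    # Pairwise comparison: fraction of (generated, control) pairs where
    # the generated image wins.
    auc = (gen[:, None] > ctl[None, :]).mean()
    # Mean activation difference in units of control standard deviation.
    mad = (gen.mean() - ctl.mean()) / (ctl.std() + 1e-9)
    return auc, mad
```

In a full pipeline, `act_generated` would come from running the network on images produced by a text-to-image model conditioned on the explanation, and `act_control` from a generic image set; a faithful explanation should push the AUC well above 0.5.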
Related papers
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities.
We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities.
We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- Less is More: Discovering Concise Network Explanations [26.126343100127936]
We introduce Discovering Conceptual Network Explanations (DCNE), a new approach for generating human-comprehensible visual explanations.
Our method automatically finds visual explanations that are critical for discriminating between classes.
DCNE represents a step forward in making neural network decisions accessible and interpretable to humans.
arXiv Detail & Related papers (2024-05-24T06:10:23Z)
- Towards Generating Informative Textual Description for Neurons in Language Models [6.884227665279812]
We propose a framework that ties textual descriptions to neurons.
In particular, our experiments show that the proposed approach achieves 75% precision@2 and 50% recall@2.
arXiv Detail & Related papers (2024-01-30T04:06:25Z)
- NeuroExplainer: Fine-Grained Attention Decoding to Uncover Cortical Development Patterns of Preterm Infants [73.85768093666582]
We propose an explainable geometric deep network dubbed NeuroExplainer.
NeuroExplainer is used to uncover altered infant cortical development patterns associated with preterm birth.
arXiv Detail & Related papers (2023-01-01T12:48:12Z)
- Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks [107.8565143456161]
We investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks.
Results show that synergy increases as neural networks learn multiple diverse tasks.
Randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness.
arXiv Detail & Related papers (2022-10-06T15:36:27Z)
- Formal Conceptual Views in Neural Networks [0.0]
We introduce two notions for conceptual views of a neural network, specifically a many-valued and a symbolic view.
We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets.
We demonstrate how conceptual views can be applied for abductive learning of human comprehensible rules from neurons.
arXiv Detail & Related papers (2022-09-27T16:38:24Z)
- Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis [0.0]
Graph neural networks (GNNs) are highly effective on a variety of graph-related tasks, but they lack interpretability and transparency.
Current explainability approaches are typically local and treat GNNs as black boxes.
We propose a novel approach for producing global explanations for GNNs using neuron-level concepts.
arXiv Detail & Related papers (2022-08-22T21:30:55Z)
- Natural Language Descriptions of Deep Visual Features [50.270035018478666]
We introduce MILAN, a procedure that automatically labels neurons with open-ended, compositional, natural language descriptions.
We use MILAN for analysis, characterizing the distribution and importance of neurons selective for attribute, category, and relational information in vision models.
We also use MILAN for auditing, surfacing neurons sensitive to protected categories like race and gender in models trained on datasets intended to obscure these features.
arXiv Detail & Related papers (2022-01-26T18:48:02Z)
- Generalizable Neuro-symbolic Systems for Commonsense Question Answering [67.72218865519493]
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks.
Different methods for integrating neural language models and knowledge graphs are discussed.
arXiv Detail & Related papers (2022-01-17T06:13:37Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
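The "Compositional Explanations of Neurons" entry above describes identifying logical concept formulas that explain a neuron's behavior. One plausible realization is a greedy search over AND/OR/AND-NOT compositions of binary concept masks, scored by intersection-over-union (IoU) against the neuron's binarized activation mask. This sketch is illustrative: the greedy strategy and the function names are assumptions, not that paper's exact algorithm.

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boolean masks.
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def compositional_explanation(neuron_mask, concepts, max_len=3):
    """Greedily build a logical formula over named concept masks
    (via AND / OR / AND NOT) that maximizes IoU with the binarized
    neuron activation mask. Returns (formula, mask, iou_score)."""
    # Start from the single best-matching concept.
    name, mask = max(concepts.items(), key=lambda kv: iou(neuron_mask, kv[1]))
    formula, best = name, iou(neuron_mask, mask)
    for _ in range(max_len - 1):
        # Try extending the current formula with every concept and operator.
        candidates = []
        for cname, cmask in concepts.items():
            candidates += [
                (f"({formula} AND {cname})", np.logical_and(mask, cmask)),
                (f"({formula} OR {cname})", np.logical_or(mask, cmask)),
                (f"({formula} AND NOT {cname})", np.logical_and(mask, ~cmask)),
            ]
        cformula, cmask = max(candidates, key=lambda c: iou(neuron_mask, c[1]))
        if iou(neuron_mask, cmask) <= best:
            break  # no extension improves the fit
        formula, mask, best = cformula, cmask, iou(neuron_mask, cmask)
    return formula, mask, best
```

With concepts such as "water", "blue", and "river" as spatial masks over a probing dataset, a neuron might be explained by a formula like `(water OR river) AND NOT blue`, with the IoU score quantifying how faithful that formula is.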
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.