Invertible Concept-based Explanations for CNN Models with Non-negative
Concept Activation Vectors
- URL: http://arxiv.org/abs/2006.15417v4
- Date: Thu, 17 Jun 2021 12:31:21 GMT
- Title: Invertible Concept-based Explanations for CNN Models with Non-negative
Concept Activation Vectors
- Authors: Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, Benjamin
I. P. Rubinstein
- Abstract summary: Convolutional neural network (CNN) models for computer vision are powerful but lack explainability in their most basic form.
Recent work on explanations through feature importance of approximate linear models has moved from input-level features to features from mid-layer feature maps in the form of concept activation vectors (CAVs).
In this work, we rethink the ACE algorithm of Ghorbani et al., proposing an alternative invertible concept-based explanation (ICE) framework to overcome its shortcomings.
- Score: 24.581839689833572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural network (CNN) models for computer vision are powerful
but lack explainability in their most basic form. This deficiency remains a key
challenge when applying CNNs in important domains. Recent work on explanations
through feature importance of approximate linear models has moved from
input-level features (pixels or segments) to features from mid-layer feature
maps in the form of concept activation vectors (CAVs). CAVs contain
concept-level information and could be learned via clustering. In this work, we
rethink the ACE algorithm of Ghorbani et al., proposing an alternative
invertible concept-based explanation (ICE) framework to overcome its
shortcomings. Based on the requirements of fidelity (approximate models to
target models) and interpretability (being meaningful to people), we design
measurements and evaluate a range of matrix factorization methods with our
framework. We find that non-negative concept activation vectors (NCAVs) from
non-negative matrix factorization provide superior performance in
interpretability and fidelity based on computational and human subject
experiments. Our framework provides both local and global concept-level
explanations for pre-trained CNN models.
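To make the factorization step concrete, below is a minimal sketch (not the authors' released implementation) of how NCAVs could be obtained by applying non-negative matrix factorization to mid-layer CNN feature maps, with the reconstruction error serving as a rough fidelity check. It assumes PyTorch, torchvision (>= 0.13 for the weights API), and scikit-learn; the layer choice, n_concepts, and helper names such as extract_feature_maps are illustrative assumptions, not values from the paper.
```python
# Minimal sketch (assumed setup, not the paper's reference code) of deriving
# non-negative concept activation vectors (NCAVs) via NMF on CNN feature maps.
import numpy as np
import torch
from sklearn.decomposition import NMF
from torchvision import models


def extract_feature_maps(model, images, layer="layer4"):
    """Collect mid-layer activations of shape (N, C, H, W) via a forward hook."""
    store = {}
    hook = dict(model.named_modules())[layer].register_forward_hook(
        lambda module, inputs, output: store.setdefault("acts", output.detach())
    )
    with torch.no_grad():
        model(images)
    hook.remove()
    return store["acts"]


# Pre-trained CNN and a stand-in batch (replace with real, normalized images).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
images = torch.rand(8, 3, 224, 224)

acts = extract_feature_maps(model, images)           # (N, C, H, W), >= 0 after ReLU
n, c, h, w = acts.shape
V = acts.permute(0, 2, 3, 1).reshape(-1, c).numpy()  # (N*H*W, C)

# Factorize V ~= S @ P. Rows of P are the NCAVs (concept directions in channel
# space); S holds per-location concept scores usable for local explanations.
n_concepts = 10  # illustrative choice
nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500, random_state=0)
S = nmf.fit_transform(V)   # (N*H*W, n_concepts)
P = nmf.components_        # (n_concepts, C)

# Fidelity check: invert the factorization and compare with the original
# feature maps; a lower relative error means a more faithful approximation.
rel_err = np.linalg.norm(V - S @ P) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")

# Local explanations: reshape concept scores back into spatial heatmaps.
concept_maps = S.reshape(n, h, w, n_concepts)
```
Global, class-level concept weights could then be estimated, as the abstract describes, by fitting an approximate linear model from pooled concept scores to the target model's outputs; that step is omitted here for brevity.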
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- CAT: Interpretable Concept-based Taylor Additive Models [17.73885202930879]
Generalized Additive Models (GAMs) can explain deep neural networks (DNNs) at the feature level.
GAMs require large numbers of model parameters and are prone to overfitting, making them hard to train and scale.
We propose CAT, a novel interpretable Concept-bAsed Taylor additive model to simplify this process.
arXiv Detail & Related papers (2024-06-25T20:43:15Z)
- Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning [86.15009879251386]
We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBM).
CBMs require an additional set of concepts to leverage.
We show a significant increase in accuracy using sparse hidden layers in CLIP-based bottleneck models.
arXiv Detail & Related papers (2024-04-04T09:43:43Z)
- Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? [8.391254800873599]
We introduce a method to perform concept-based interventions on pretrained neural networks, which are not interpretable by design.
We formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes.
arXiv Detail & Related papers (2024-01-24T16:02:14Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Local Concept Embeddings for Analysis of Concept Distributions in DNN Feature Spaces [1.0923877073891446]
We propose a novel concept analysis framework for deep neural networks (DNNs).
Instead of optimizing a single global concept vector on the complete dataset, it generates a local concept embedding (LoCE) vector for each individual sample.
Despite its context sensitivity, our method's concept segmentation performance is competitive to global baselines.
arXiv Detail & Related papers (2023-11-24T12:22:00Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable-by-design model that factors model decisions into intermediate human-legible explanations.
We show that our inherently interpretable system can improve by 4.64% over a comparable black-box system on reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z)
- Concept-based Explanations using Non-negative Concept Activation Vectors and Decision Tree for CNN Models [4.452019519213712]
This paper evaluates whether training a decision tree based on concepts extracted from a concept-based explainer can increase interpretability for Convolutional Neural Network (CNN) models.
arXiv Detail & Related papers (2022-11-19T21:42:55Z)
- Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence [14.071950294953005]
Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space.
In this paper we show that such a separability-oriented approach leads to solutions that may diverge from the actual goal of precisely modeling the concept direction.
We introduce pattern-based CAVs, solely focussing on concept signals, thereby providing more accurate concept directions.
arXiv Detail & Related papers (2022-02-07T19:40:20Z)
- Neural Networks with Recurrent Generative Feedback [61.90658210112138]
We instantiate this design on convolutional neural networks (CNNs).
In the experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.
arXiv Detail & Related papers (2020-07-17T19:32:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.