Overlooked factors in concept-based explanations: Dataset choice,
concept learnability, and human capability
- URL: http://arxiv.org/abs/2207.09615v2
- Date: Fri, 12 May 2023 15:48:51 GMT
- Title: Overlooked factors in concept-based explanations: Dataset choice,
concept learnability, and human capability
- Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong and Olga Russakovsky
- Abstract summary: Concept-based interpretability methods aim to explain deep neural network model predictions using a predefined set of semantic concepts.
Despite their popularity, they suffer from limitations that are not well understood or clearly articulated in the literature.
We analyze three commonly overlooked factors in concept-based explanations.
- Score: 25.545486537295144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept-based interpretability methods aim to explain deep neural network
model predictions using a predefined set of semantic concepts. These methods
evaluate a trained model on a new, "probe" dataset and correlate model
predictions with the visual concepts labeled in that dataset. Despite their
popularity, they suffer from limitations that are not well understood or
clearly articulated in the literature. In this work, we analyze three commonly
overlooked factors in concept-based explanations. First, the choice of the
probe dataset has a profound impact on the generated explanations. Our analysis
reveals that different probe datasets may lead to very different explanations,
and suggests that the explanations are not generalizable outside the probe
dataset. Second, we find that concepts in the probe dataset are often less
salient and harder to learn than the classes they claim to explain, calling
into question the correctness of the explanations. We argue that only visually
salient concepts should be used in concept-based explanations. Finally, while
existing methods use hundreds or even thousands of concepts, our human studies
reveal a much stricter upper bound of 32 concepts or fewer, beyond which the
explanations are much less practically useful. We make suggestions for future
development and analysis of concept-based interpretability methods. Code for
our analysis and user interface can be found at
https://github.com/princetonvisualai/OverlookedFactors
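For readers unfamiliar with this setup, the probe-dataset workflow described in the abstract can be sketched roughly as below (an illustrative sketch only: the sparse linear fit, the synthetic concept annotations, and the logits are placeholders, not the paper's exact procedure):

```python
# Sketch of a probe-dataset concept explanation: fit a sparse linear model that
# predicts a class logit from binary concept annotations on a probe dataset, and
# read off the largest weights as the "concept explanation" for that class.
# All data and names here are hypothetical.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical probe dataset: n images annotated with k binary visual concepts,
# plus the trained model's logit for one target class on each image.
n_images, n_concepts = 1000, 50
concept_labels = rng.integers(0, 2, size=(n_images, n_concepts)).astype(float)
class_logits = concept_labels @ rng.normal(size=n_concepts) + rng.normal(scale=0.1, size=n_images)

# Correlate model predictions with the labeled concepts via a sparse linear fit.
explainer = Lasso(alpha=0.01)
explainer.fit(concept_labels, class_logits)

# Concepts with the largest-magnitude weights form the explanation for the class.
top = np.argsort(-np.abs(explainer.coef_))[:5]
print("Top concept indices:", top, "weights:", explainer.coef_[top])
```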
Related papers
- CoLiDR: Concept Learning using Aggregated Disentangled Representations [29.932706137805713]
Interpretability of Deep Neural Networks using concept-based models offers a promising way to explain model behavior through human-understandable concepts.
A parallel line of research focuses on disentangling the data distribution into its underlying generative factors, in turn explaining the data generation process.
While both directions have received extensive attention, little work has been done on explaining concepts in terms of generative factors to unify mathematically disentangled representations and human-understandable concepts.
arXiv Detail & Related papers (2024-07-27T16:55:14Z)
- Explaining Explainability: Understanding Concept Activation Vectors [35.37586279472797]
Recent interpretability methods propose using concept-based explanations to translate internal representations of deep learning models into a language that humans are familiar with: concepts.
This requires understanding which concepts are present in the representation space of a neural network.
In this work, we investigate three properties of Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars.
We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact.
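As a concrete reference for this entry, a CAV is commonly learnt as a linear direction in activation space (a minimal sketch with synthetic activations; the layer width, exemplar counts, and classifier choice are assumptions):

```python
# A CAV is typically obtained by training a linear classifier that separates
# activations of concept exemplars from activations of random images, then
# taking the unit-norm normal of its decision boundary. The activations below
# are synthetic stand-ins for a real layer's features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
d = 512                                            # assumed layer width
concept_acts = rng.normal(loc=0.5, size=(200, d))  # activations of concept exemplars
random_acts = rng.normal(loc=0.0, size=(200, d))   # activations of random images

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 200 + [0] * 200)

clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])  # the concept activation vector
print("CAV shape:", cav.shape)
```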
arXiv Detail & Related papers (2024-04-04T17:46:20Z)
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address this issue, which readily improves the quality of the explanations.
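For context, one simple way to make such concept-importance estimates uncertainty aware is a Bayesian linear fit with a per-weight posterior spread (an illustrative sketch only, not the paper's exact estimator; the data are synthetic):

```python
# Illustrative only: a Bayesian linear fit of class logits on binary concept
# annotations, reporting a posterior standard deviation for each concept weight.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(2)
concept_labels = rng.integers(0, 2, size=(500, 20)).astype(float)   # probe annotations
class_logits = concept_labels @ rng.normal(size=20) + rng.normal(scale=0.5, size=500)

model = BayesianRidge().fit(concept_labels, class_logits)
weight_std = np.sqrt(np.diag(model.sigma_))        # posterior std of each concept weight
for i in np.argsort(-np.abs(model.coef_))[:3]:
    print(f"concept {i}: weight {model.coef_[i]:+.2f} +/- {weight_std[i]:.2f}")
```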
arXiv Detail & Related papers (2023-12-13T11:17:27Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets.
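The linear CAV-style score that CG generalises can be read as a directional derivative: how much a class logit changes when an activation moves along a learnt concept direction. Below is a minimal numerical sketch with a toy nonlinear head (the CAV, weights, and activation are synthetic placeholders):

```python
# Directional derivative of a class logit along a learnt concept direction,
# estimated with a central difference on a toy nonlinear head.
import numpy as np

rng = np.random.default_rng(3)
d = 64
cav = rng.normal(size=d)
cav /= np.linalg.norm(cav)                         # unit-norm concept direction (assumed learnt)
W1, W2 = rng.normal(size=(16, d)), rng.normal(size=16)

def class_logit(h):
    return W2 @ np.tanh(W1 @ h)                    # toy head mapping activations to a logit

h = rng.normal(size=d)                             # activation of one probe image
eps = 1e-4
sensitivity = (class_logit(h + eps * cav) - class_logit(h - eps * cav)) / (2 * eps)
print("concept sensitivity of the class logit:", sensitivity)
```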
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
- From Attribution Maps to Human-Understandable Explanations through Concept Relevance Propagation [16.783836191022445]
The field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models.
While local XAI methods explain individual predictions in the form of attribution maps, global explanation techniques visualize what concepts a model has generally learned to encode.
arXiv Detail & Related papers (2022-06-07T12:05:58Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
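To make the contrast drawn in the last entry concrete, here is a minimal sketch of an exact Shapley value computation on a toy model by enumerating feature coalitions (the model, instance, and baseline are made up; absent features are replaced by a baseline value, a standard simplification):

```python
# Exact Shapley values for a tiny model by enumerating all feature coalitions.
import itertools
import math
import numpy as np

def model(x):
    return 2.0 * x[0] + 1.0 * x[1] * x[2]          # toy model with an interaction term

x = np.array([1.0, 1.0, 1.0])                      # instance to explain
baseline = np.zeros_like(x)                        # value used for "absent" features
n = len(x)

def value(subset):
    z = baseline.copy()
    z[list(subset)] = x[list(subset)]
    return model(z)

shapley = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for r in range(n):
        for S in itertools.combinations(others, r):
            weight = math.factorial(len(S)) * math.factorial(n - len(S) - 1) / math.factorial(n)
            shapley[i] += weight * (value(S + (i,)) - value(S))

print("Shapley values:", shapley)                  # they sum to model(x) - model(baseline)
```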
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.