An Axiomatic Approach to Model-Agnostic Concept Explanations
- URL: http://arxiv.org/abs/2401.06890v1
- Date: Fri, 12 Jan 2024 20:53:35 GMT
- Title: An Axiomatic Approach to Model-Agnostic Concept Explanations
- Authors: Zhili Feng, Michal Moshkovitz, Dotan Di Castro, J. Zico Kolter
- Abstract summary: We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
- Score: 67.84000759813435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept explanation is a popular approach for examining how
human-interpretable concepts impact the predictions of a model. However, most
existing methods for concept explanations are tailored to specific models. To
address this issue, this paper focuses on model-agnostic measures.
Specifically, we propose an approach to concept explanations that satisfy three
natural axioms: linearity, recursivity, and similarity. We then establish
connections with previous concept explanation methods, offering insight into
their varying semantic meanings. Experimentally, we demonstrate the utility of
the new method by applying it in different scenarios: for model selection,
optimizer selection, and model improvement using a kind of prompt editing for
zero-shot vision language models.
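As a rough illustration of the model-agnostic setting (a minimal sketch under assumptions, not the paper's actual measure; the function and variable names are hypothetical), a linear concept score can be computed from model outputs and concept labels alone, with no access to the model's internals:

```python
import numpy as np

def linear_concept_score(outputs: np.ndarray, concept: np.ndarray) -> float:
    """R^2 of a least-squares fit from model outputs to a concept label.

    outputs: (n_samples, n_dims) model predictions or logits.
    concept: (n_samples,) binary or real-valued concept annotations.
    Only the model's outputs are used, so the score is model-agnostic.
    """
    X = np.hstack([outputs, np.ones((outputs.shape[0], 1))])  # bias column
    coef, *_ = np.linalg.lstsq(X, concept, rcond=None)
    residuals = concept - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((concept - concept.mean()) ** 2))  # assumes non-constant concept
    return 1.0 - ss_res / ss_tot
```

Because such a score depends on the model only through its outputs, any two models can be compared on it, which is what enables applications like the model and optimizer selection mentioned in the abstract.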
Related papers
- Fast Explainability via Feasible Concept Sets Generator [7.011763596804071]
We bridge the gap between the universality of model-agnostic approaches and the efficiency of model-specific approaches.
We first define explanations through a set of human-comprehensible concepts.
Second, we show that a minimal feasible set generator can be learned as a companion explainer to the prediction model.
arXiv Detail & Related papers (2024-05-29T00:01:40Z)
- Evaluating Readability and Faithfulness of Concept-based Explanations [35.48852504832633]
Concept-based explanations have emerged as a promising avenue for explaining the high-level patterns learned by Large Language Models.
Current methods approach concepts from different perspectives and lack a unified formalization.
This makes evaluating the core measures of concepts, namely faithfulness and readability, challenging.
arXiv Detail & Related papers (2024-04-29T09:20:25Z)
- A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are understood to be the building blocks of human reasoning.
We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs).
We also provide details on the concept-based model improvement literature, marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of next-token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
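As a hedged sketch of what a robustness check like the one in the entry above could look like (the callables `explain` and `model` and the set of group actions are assumptions, not the paper's API): when the model is invariant under a symmetry group, a faithful explanation should be unchanged when a group action is applied to the input.

```python
import numpy as np

def invariance_score(explain, model, x, group_actions, tol=1e-6):
    """Fraction of group actions under which the explanation is unchanged.

    explain: callable (model, input array) -> explanation array
    group_actions: callables mapping an input array to a transformed input
    """
    actions = list(group_actions)
    base = explain(model, x)  # explanation for the untransformed input
    hits = sum(np.allclose(base, explain(model, g(x)), atol=tol)
               for g in actions)
    return hits / len(actions)
```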
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV in both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
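For reference, a minimal sketch of the CAV construction discussed in the entry above (a standard recipe in the TCAV line of work; array names are assumptions): fit a linear classifier separating activations of concept examples from non-concept examples and take its normalized weight vector as the concept direction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Unit vector in activation space pointing toward the concept."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
    return w / np.linalg.norm(w)

def concept_sensitivity(logit_grad: np.ndarray, cav: np.ndarray) -> float:
    """Directional derivative of a class logit along the CAV direction."""
    return float(logit_grad @ cav)
```

Concept Gradient, per the entry above, replaces the implicit linear concept model in this recipe with general nonlinear concept functions.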
- ConceptDistil: Model-Agnostic Distillation of Concept Explanations [4.462334751640166]
Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop.
We propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation.
We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks.
arXiv Detail & Related papers (2022-05-07T08:58:54Z)
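As an illustrative sketch of the distillation idea in the entry above (not ConceptDistil's actual architecture; both stages are simplified to ridge regressions and all names are assumptions): a surrogate learns to predict concepts from inputs, and a second stage maps the predicted concepts to the black box's outputs, so the black box is queried only as a label source.

```python
import numpy as np

def fit_ridge(X: np.ndarray, Y: np.ndarray, l2: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression: argmin_W ||X W - Y||^2 + l2 ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ Y)

def distill_concepts(inputs, concept_labels, blackbox_outputs):
    """Two-stage surrogate: inputs -> concept scores -> mimic of black box."""
    W_concept = fit_ridge(inputs, concept_labels)              # concept predictor
    predicted_concepts = inputs @ W_concept
    W_mimic = fit_ridge(predicted_concepts, blackbox_outputs)  # distillation head
    return W_concept, W_mimic
```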
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality, valuable explanations compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
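A hedged sketch of the loss structure behind such diverse latent counterfactuals (illustrative only; `decoder` and `classifier` are assumed callables and the weighting is arbitrary): each latent perturbation is pushed to change the prediction toward a target class, while a pairwise term keeps the perturbations mutually dissimilar.

```python
import numpy as np

def counterfactual_loss(perturbations, z, target_class, classifier, decoder,
                        diversity_weight=0.1):
    """Loss for a set of latent perturbations: flip the prediction, stay diverse.

    perturbations: list of latent-space offsets dz (same shape as z)
    classifier: callable image -> class-probability vector
    decoder: callable latent -> image
    """
    # Prediction-flipping term: negative log-probability of the target class
    flip = sum(-np.log(classifier(decoder(z + dz))[target_class] + 1e-12)
               for dz in perturbations)
    # Diversity-enforcing term: penalize pairwise cosine similarity of offsets
    diversity = 0.0
    for i in range(len(perturbations)):
        for j in range(i + 1, len(perturbations)):
            a, b = perturbations[i], perturbations[j]
            diversity += abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return flip + diversity_weight * diversity
```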
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.