An Axiomatic Approach to Model-Agnostic Concept Explanations
- URL: http://arxiv.org/abs/2401.06890v1
- Date: Fri, 12 Jan 2024 20:53:35 GMT
- Title: An Axiomatic Approach to Model-Agnostic Concept Explanations
- Authors: Zhili Feng, Michal Moshkovitz, Dotan Di Castro, J. Zico Kolter
- Abstract summary: We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
- Score: 67.84000759813435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept explanation is a popular approach for examining how
human-interpretable concepts impact the predictions of a model. However, most
existing methods for concept explanations are tailored to specific models. To
address this issue, this paper focuses on model-agnostic measures.
Specifically, we propose an approach to concept explanations that satisfy three
natural axioms: linearity, recursivity, and similarity. We then establish
connections with previous concept explanation methods, offering insight into
their varying semantic meanings. Experimentally, we demonstrate the utility of
the new method by applying it in different scenarios: for model selection,
optimizer selection, and model improvement using a kind of prompt editing for
zero-shot vision language models.
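As a rough illustration of the model-agnostic setting (a minimal sketch under assumptions, not the paper's actual measure; the function and variable names are hypothetical), a linear concept score can be computed from model outputs and concept labels alone, with no access to the model's internals:

```python
import numpy as np

def linear_concept_score(outputs: np.ndarray, concept: np.ndarray) -> float:
    """R^2 of a least-squares fit from model outputs to a concept label.

    outputs: (n_samples, n_dims) model predictions or logits.
    concept: (n_samples,) binary or real-valued concept annotations.
    Only the model's outputs are used, so the score is model-agnostic.
    """
    X = np.hstack([outputs, np.ones((outputs.shape[0], 1))])  # bias column
    coef, *_ = np.linalg.lstsq(X, concept, rcond=None)
    residuals = concept - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((concept - concept.mean()) ** 2))  # assumes non-constant concept
    return 1.0 - ss_res / ss_tot
```

Because such a score depends on the model only through its outputs, any two models can be compared on it, which is what enables applications like the model and optimizer selection mentioned in the abstract.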
Related papers
- Fast Explainability via Feasible Concept Sets Generator [7.011763596804071]
We bridge the gap between the universality of model-agnostic approaches and the efficiency of model-specific approaches.
We first define explanations through a set of human-comprehensible concepts.
Second, we show that a minimal feasible set generator can be learned as a companion explainer to the prediction model.
arXiv Detail & Related papers (2024-05-29T00:01:40Z)
- Evaluating Readability and Faithfulness of Concept-based Explanations [35.48852504832633]
Concept-based explanations have emerged as a promising avenue for explaining the high-level patterns learned by Large Language Models.
Current methods approach concepts from different perspectives and lack a unified formalization.
This makes evaluating the core measures of concepts, namely faithfulness and readability, challenging.
arXiv Detail & Related papers (2024-04-29T09:20:25Z)
- A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are understood to be the building blocks of human reasoning.
We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs).
We also provide details on the concept-based model improvement literature, marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of next-token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
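As a hedged sketch of what a robustness check like the one in the entry above could look like (the callables `explain` and `model` and the set of group actions are assumptions, not the paper's API): when the model is invariant under a symmetry group, a faithful explanation should be unchanged when a group action is applied to the input.

```python
import numpy as np

def invariance_score(explain, model, x, group_actions, tol=1e-6):
    """Fraction of group actions under which the explanation is unchanged.

    explain: callable (model, input array) -> explanation array
    group_actions: callables mapping an input array to a transformed input
    """
    actions = list(group_actions)
    base = explain(model, x)  # explanation for the untransformed input
    hits = sum(np.allclose(base, explain(model, g(x)), atol=tol)
               for g in actions)
    return hits / len(actions)
```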
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV in both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
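For reference, a minimal sketch of the CAV construction discussed in the entry above (a standard recipe in the TCAV line of work; array names are assumptions): fit a linear classifier separating activations of concept examples from non-concept examples and take its normalized weight vector as the concept direction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Unit vector in activation space pointing toward the concept."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
    return w / np.linalg.norm(w)

def concept_sensitivity(logit_grad: np.ndarray, cav: np.ndarray) -> float:
    """Directional derivative of a class logit along the CAV direction."""
    return float(logit_grad @ cav)
```

Concept Gradient, per the entry above, replaces the implicit linear concept model in this recipe with general nonlinear concept functions.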
- ConceptDistil: Model-Agnostic Distillation of Concept Explanations [4.462334751640166]
Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop.
We propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation.
We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks.
arXiv Detail & Related papers (2022-05-07T08:58:54Z)
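As an illustrative sketch of the distillation idea in the entry above (not ConceptDistil's actual architecture; both stages are simplified to ridge regressions and all names are assumptions): a surrogate learns to predict concepts from inputs, and a second stage maps the predicted concepts to the black box's outputs, so the black box is queried only as a label source.

```python
import numpy as np

def fit_ridge(X: np.ndarray, Y: np.ndarray, l2: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression: argmin_W ||X W - Y||^2 + l2 ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ Y)

def distill_concepts(inputs, concept_labels, blackbox_outputs):
    """Two-stage surrogate: inputs -> concept scores -> mimic of black box."""
    W_concept = fit_ridge(inputs, concept_labels)              # concept predictor
    predicted_concepts = inputs @ W_concept
    W_mimic = fit_ridge(predicted_concepts, blackbox_outputs)  # distillation head
    return W_concept, W_mimic
```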
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality, valuable explanations compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
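A hedged sketch of the loss structure behind such diverse latent counterfactuals (illustrative only; `decoder` and `classifier` are assumed callables and the weighting is arbitrary): each latent perturbation is pushed to change the prediction toward a target class, while a pairwise term keeps the perturbations mutually dissimilar.

```python
import numpy as np

def counterfactual_loss(perturbations, z, target_class, classifier, decoder,
                        diversity_weight=0.1):
    """Loss for a set of latent perturbations: flip the prediction, stay diverse.

    perturbations: list of latent-space offsets dz (same shape as z)
    classifier: callable image -> class-probability vector
    decoder: callable latent -> image
    """
    # Prediction-flipping term: negative log-probability of the target class
    flip = sum(-np.log(classifier(decoder(z + dz))[target_class] + 1e-12)
               for dz in perturbations)
    # Diversity-enforcing term: penalize pairwise cosine similarity of offsets
    diversity = 0.0
    for i in range(len(perturbations)):
        for j in range(i + 1, len(perturbations)):
            a, b = perturbations[i], perturbations[j]
            diversity += abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return flip + diversity_weight * diversity
```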
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.