Interpreting Interpretations: Organizing Attribution Methods by Criteria
- URL: http://arxiv.org/abs/2002.07985v2
- Date: Sat, 4 Apr 2020 17:29:09 GMT
- Title: Interpreting Interpretations: Organizing Attribution Methods by Criteria
- Authors: Zifan Wang and Piotr Mardziel and Anupam Datta and Matt Fredrikson
- Abstract summary: In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted.
We incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.
We evaluate our measures on a collection of methods explaining convolutional neural networks (CNN) for image classification.
- Score: 40.812424038838984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by distinct, though related, criteria, a growing number of
attribution methods have been developed to interpret deep learning. While each
relies on the interpretability of the concept of "importance" and our ability
to visualize patterns, explanations produced by the methods often differ. As a
result, input attribution for vision models fails to provide any level of human
understanding of model behaviour. In this work we expand the foundations of
human-understandable concepts with which attributions can be interpreted beyond
"importance" and its visualization; we incorporate the logical concepts of
necessity and sufficiency, and the concept of proportionality. We define metrics
to represent these concepts as quantitative aspects of an attribution. This
allows us to compare attributions produced by different methods and interpret
them in novel ways: to what extent does this attribution (or this method)
represent the necessity or sufficiency of the highlighted inputs, and to what
extent is it proportional? We evaluate our measures on a collection of methods
explaining convolutional neural networks (CNN) for image classification. We
conclude that some attribution methods are more appropriate for interpretation
in terms of necessity while others are more appropriate in terms of
sufficiency, and no method is always the most appropriate in terms of both.
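The paper's exact metric definitions are not reproduced in this summary. As an illustration only, the sketch below shows one common way necessity- and sufficiency-style scores are probed for an attribution map: remove, or keep only, the top-attributed inputs and measure the change in the model's class score. The `topk_mask` helper, the toy model, and the baseline value are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def topk_mask(attribution, k):
    """Boolean mask selecting the k positions with the largest attribution values."""
    attribution = np.asarray(attribution)
    flat = attribution.ravel()
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(attribution.shape)


def necessity_score(model, x, attribution, k, baseline=0.0):
    """Drop in the model's score when the top-k attributed inputs are replaced
    by the baseline. A large drop suggests the highlighted inputs were necessary."""
    mask = topk_mask(attribution, k)
    x_removed = np.where(mask, baseline, x)
    return model(x) - model(x_removed)


def sufficiency_score(model, x, attribution, k, baseline=0.0):
    """Gap between the original score and the score when only the top-k
    attributed inputs are kept. A small gap suggests the highlighted inputs
    alone are close to sufficient."""
    mask = topk_mask(attribution, k)
    x_kept = np.where(mask, x, baseline)
    return model(x) - model(x_kept)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 8))              # toy "image"
    attribution = np.abs(x)                  # toy attribution map
    model = lambda inp: float(inp.sum())     # stand-in for a class score
    print(necessity_score(model, x, attribution, k=16))
    print(sufficiency_score(model, x, attribution, k=16))
```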
Related papers
- Identifying and interpreting non-aligned human conceptual representations using language modeling [0.0]
We show that congenital blindness induces conceptual reorganization in both a-modal and sensory-related verbal domains.
We find that blind individuals more strongly associate social and cognitive meanings to verbs related to motion.
For some verbs, representations of blind and sighted are highly similar.
arXiv Detail & Related papers (2024-03-10T13:02:27Z)
- Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning [22.201878275784246]
Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data.
How to reliably acquire such concepts is, however, still fundamentally unclear.
We propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks.
arXiv Detail & Related papers (2023-09-14T14:26:20Z)
- Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification [19.306487616731765]
Post-hoc analysis can only discover the patterns or rules that naturally exist in models.
We proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers.
Our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
arXiv Detail & Related papers (2023-07-10T04:54:05Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Translational Concept Embedding for Generalized Compositional Zero-shot Learning [73.60639796305415]
Generalized compositional zero-shot learning refers to learning composed concepts of attribute-object pairs in a zero-shot fashion.
This paper introduces a new approach, termed translational concept embedding, to solve these two difficulties in a unified framework.
arXiv Detail & Related papers (2021-12-20T21:27:51Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Separating Skills and Concepts for Novel Visual Question Answering [66.46070380927372]
Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models.
"Skills" are visual tasks, such as counting or attribute recognition, and are applied to "concepts" mentioned in the question.
We present a novel method for learning to compose skills and concepts that separates these two factors implicitly within a model.
arXiv Detail & Related papers (2021-07-19T18:55:10Z)
- Is Disentanglement all you need? Comparing Concept-based & Disentanglement Approaches [24.786152654589067]
We give an overview of concept-based explanations and disentanglement approaches.
We show that state-of-the-art approaches from both classes can be data inefficient, sensitive to the specific nature of the classification/regression task, or sensitive to the employed concept representation.
arXiv Detail & Related papers (2021-04-14T15:06:34Z)
- Robust Semantic Interpretability: Revisiting Concept Activation Vectors [0.0]
Interpretability methods for image classification attempt to expose whether the model is systematically biased or attending to the same cues as a human would.
Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole.
arXiv Detail & Related papers (2021-04-06T20:14:59Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.