A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
- URL: http://arxiv.org/abs/2401.12631v1
- Date: Tue, 23 Jan 2024 10:27:42 GMT
- Title: A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
- Authors: Zhengxuan Wu and Atticus Geiger and Jing Huang and Aryaman Arora and
Thomas Icard and Christopher Potts and Noah D. Goodman
- Abstract summary: We argue that the illusions Makelov et al. (2023) see in practice are artifacts of their training and evaluation paradigms.
Though we disagree with their core characterization, Makelov et al. (2023)'s examples and discussion have undoubtedly pushed the field of interpretability forward.
- Score: 59.87080148922358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We respond to the recent paper by Makelov et al. (2023), which reviews
subspace interchange intervention methods like distributed alignment search
(DAS; Geiger et al. 2023) and claims that these methods potentially cause
"interpretability illusions". We first review Makelov et al. (2023)'s technical
notion of what an "interpretability illusion" is, and then we show that even
intuitive and desirable explanations can qualify as illusions in this sense. As
a result, their method of discovering "illusions" can reject explanations they
consider "non-illusory". We then argue that the illusions Makelov et al. (2023)
see in practice are artifacts of their training and evaluation paradigms. We
close by emphasizing that, though we disagree with their core characterization,
Makelov et al. (2023)'s examples and discussion have undoubtedly pushed the
field of interpretability forward.
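For context, the following is a minimal sketch of the kind of subspace interchange intervention that methods like DAS perform, assuming a single hidden vector per input and a learned orthogonal basis; the function and variable names are illustrative and are not taken from the authors' implementation.
```python
# Hypothetical sketch of a DAS-style subspace interchange intervention.
# Names and dimensions are illustrative, not the authors' code.
import torch

def interchange_intervention(h_base, h_source, rotation, k):
    """Swap a k-dimensional learned subspace of h_base with h_source.

    h_base, h_source: hidden vectors of shape (d,) taken from the same
        layer/position on two different inputs.
    rotation: an orthogonal (d, d) matrix defining the learned basis.
    k: dimensionality of the subspace being intervened on.
    """
    # Project both representations into the learned basis.
    r_base = rotation @ h_base
    r_source = rotation @ h_source
    # Replace the first k coordinates (the candidate causal subspace)
    # of the base representation with those of the source.
    r_new = torch.cat([r_source[:k], r_base[k:]])
    # Map back to the model's original coordinate system.
    return rotation.T @ r_new

# Toy usage with random vectors; in practice h_base and h_source come from
# a trained model, and the rotation is optimized so that the intervened
# model matches the counterfactual behavior of a hypothesized causal variable.
d, k = 16, 4
rotation, _ = torch.linalg.qr(torch.randn(d, d))  # random orthogonal matrix
h_base, h_source = torch.randn(d), torch.randn(d)
h_intervened = interchange_intervention(h_base, h_source, rotation, k)
```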
Related papers
- Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance [4.962171160815189]
We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy.
arXiv Detail & Related papers (2023-07-14T21:27:00Z)
- Counterfactual Explanations for Misclassified Images: How Human and Machine Explanations Differ [11.508304497344637]
Counterfactual explanations have emerged as a popular solution for the eXplainable AI (XAI) problem of elucidating predictions of black-box deep-learning systems.
While over 100 counterfactual methods exist, claiming to generate plausible explanations akin to those preferred by people, few have actually been tested on users.
This issue is addressed here using a novel methodology that gathers ground truth human-generated counterfactual explanations for misclassified images.
arXiv Detail & Related papers (2022-12-16T22:05:38Z)
- Reply to "Comment on 'Why interference phenomena do not capture the essence of quantum theory' " [0.0]
We argue that the phenomenology of interference that is traditionally regarded as problematic does not, in fact, capture the essence of quantum theory.
We do so by demonstrating the existence of a physical theory, which we term the "toy field theory", that reproduces this phenomenology without sacrificing the classical worldview.
arXiv Detail & Related papers (2022-07-24T18:59:35Z)
- Does Science need Intersubjectivity? The Problem of Confirmation in Orthodox Interpretations of Quantum Mechanics [0.0]
We argue that any successful interpretation of quantum mechanics must explain how our empirical evidence allows us to come to know about quantum mechanics.
We take a detailed look at the way in which belief-updating might work in the kind of universe postulated by an orthodox interpretation.
We argue that in some versions of these interpretations it is not even possible to use one's own relative frequencies for empirical confirmation.
arXiv Detail & Related papers (2022-03-30T13:14:34Z)
- The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning [113.25016899663191]
Humans have a remarkable capacity to reason abductively and to hypothesize about what lies beyond the literal content of an image.
We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.
arXiv Detail & Related papers (2022-02-10T02:26:45Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- The Who in XAI: How AI Background Shapes Perceptions of AI Explanations [61.49776160925216]
We conduct a mixed-methods study of how two different groups--people with and without AI background--perceive different types of AI explanations.
We find that (1) both groups showed unwarranted faith in numbers for different reasons and (2) each group found value in different explanations beyond their intended design.
arXiv Detail & Related papers (2021-07-28T17:32:04Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
- Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for Post-Hoc Interpretability [54.85658598523915]
We propose that a concrete definition of interpretation is needed before the faithfulness of an interpretation can be evaluated.
We find that although interpretation methods perform differently under a certain evaluation metric, such a difference may not result from interpretation quality or faithfulness.
arXiv Detail & Related papers (2020-09-16T06:38:03Z)
- Quantum Theory Needs No 'Interpretation' But 'Theoretical Formal-Conceptual Unity' (Or: Escaping Adan Cabello's "Map of Madness" With the Help of David Deutsch's Explanations) [0.0]
We argue that there are reasons to believe that the creation of 'interpretations' for the theory of quanta has functioned as a trap designed by anti-realists.
We will argue that the key to escape the anti-realist trap of interpretation is to recognize that --as Einstein told Heisenberg almost one century ago-- it is only the theory which can tell you what can be observed.
arXiv Detail & Related papers (2020-08-01T19:10:06Z)