Semantics and explanation: why counterfactual explanations produce adversarial examples in deep neural networks
- URL: http://arxiv.org/abs/2012.10076v1
- Date: Fri, 18 Dec 2020 07:04:04 GMT
- Title: Semantics and explanation: why counterfactual explanations produce adversarial examples in deep neural networks
- Authors: Kieran Browne, Ben Swift
- Abstract summary: Recent papers in explainable AI have made a compelling case for counterfactual modes of explanation.
While counterfactual explanations appear to be extremely effective in some instances, they are formally equivalent to adversarial examples.
This presents an apparent paradox for explainability researchers: if these two procedures are formally equivalent, what accounts for the explanatory divide apparent between counterfactual explanations and adversarial examples?
We resolve this paradox by placing emphasis back on the semantics of counterfactual expressions.
- Score: 15.102346715690759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent papers in explainable AI have made a compelling case for
counterfactual modes of explanation. While counterfactual explanations appear
to be extremely effective in some instances, they are formally equivalent to
adversarial examples. This presents an apparent paradox for explainability
researchers: if these two procedures are formally equivalent, what accounts for
the explanatory divide apparent between counterfactual explanations and
adversarial examples? We resolve this paradox by placing emphasis back on the
semantics of counterfactual expressions. Producing satisfactory explanations
for deep learning systems will require that we find ways to interpret the
semantics of hidden layer representations in deep neural networks.
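The formal equivalence the abstract invokes is easy to exhibit in code: the optimization that yields a counterfactual explanation ("what is the smallest change to this input that flips the decision?") is the same one that yields an adversarial example. Below is a minimal, hedged sketch in PyTorch; the model, input, and hyperparameters are illustrative stand-ins, not the authors' setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in classifier; in practice this is the trained network being explained.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

x = torch.randn(1, 20)                      # input whose decision we query
orig_class = model(x).argmax(dim=1).item()
target_class = 1 - orig_class               # the counterfactual class

delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)
lam = 0.1                                   # weight on perturbation size

for _ in range(500):
    opt.zero_grad()
    logits = model(x + delta)
    # Drive the prediction toward the counterfactual class while keeping
    # the perturbation small -- exactly the adversarial-example recipe.
    loss = nn.functional.cross_entropy(
        logits, torch.tensor([target_class])
    ) + lam * delta.norm(p=2)
    loss.backward()
    opt.step()

x_cf = (x + delta).detach()
print("original class:   ", orig_class)
print("perturbed class:  ", model(x_cf).argmax(dim=1).item())
print("perturbation norm:", float(delta.norm()))
```

Read one way, `x_cf` is a counterfactual explanation; read another, it is an adversarial attack on the same classifier. On the paper's account, what separates the two readings is the semantics we can attach to the perturbed features, not the procedure that produced them.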
Related papers
- Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance [4.962171160815189]
We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions without reducing overall accuracy; a toy sketch of the model-multiplicity setting follows this entry.
arXiv Detail & Related papers (2023-07-14T21:27:00Z)
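To make the model-multiplicity setting of the entry above concrete, here is a minimal, hypothetical sketch: two comparably accurate classifiers that nevertheless disagree on individual test points, which is the raw material for dissenting explanations. The dataset and models are arbitrary stand-ins, not the paper's experimental setup.

```python
# Hedged sketch of model multiplicity: two classifiers with near-identical
# accuracy can still disagree on individual inputs. Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

m1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
m2 = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

print("accuracy m1:", m1.score(X_te, y_te))
print("accuracy m2:", m2.score(X_te, y_te))

# Inputs where the two comparably accurate models dissent; each model's
# explanation of its own prediction is then a "dissenting explanation"
# relative to the other's.
disagree = m1.predict(X_te) != m2.predict(X_te)
print("disagreement rate:", disagree.mean())
```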
- Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations [118.0818807474809]
Abductive reasoning aims to find plausible explanations for an event.
Existing approaches for abductive reasoning in natural language processing often rely on manually generated annotations for supervision.
This work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context.
arXiv Detail & Related papers (2023-05-24T01:35:10Z)
- Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
- Explanatory Paradigms in Neural Networks [18.32369721322249]
We present a leap-forward expansion to the study of explainability in neural networks by considering explanations as answers to reasoning-based questions.
The answers to the questions "Why P?", "What if not P?", and "Why P, rather than Q?" are observed correlations, observed counterfactuals, and observed contrastive explanations, respectively.
The term observed refers to the specific case of post-hoc explainability, when an explanatory technique explains the decision $P$ after a trained neural network has made the decision $P$.
arXiv Detail & Related papers (2022-02-24T00:22:11Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Towards Relatable Explainable AI with the Perceptual Process [5.581885362337179]
We argue that explanations must be more relatable to other concepts, hypotheticals, and associations.
Inspired by cognitive psychology, we propose the XAI Perceptual Processing Framework and RexNet model for relatable explainable AI.
arXiv Detail & Related papers (2021-12-28T05:48:53Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretation of a model's decision; a hedged sketch of one such projection follows this entry.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
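The projection method itself is not spelled out in the summary above, so the following is a loudly labeled assumption rather than the paper's actual construction: one simple way to realize a label-contrastive latent projection is to project hidden representations onto the direction separating two class means. All data and names here are hypothetical.

```python
# Hedged sketch of a label-contrastive projection: score representations
# along the direction separating two class means. Illustrative stand-in,
# not necessarily the paper's construction.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "model representations" for two classes, e.g. penultimate-layer
# activations collected from a trained network.
reps_a = rng.normal(loc=0.0, size=(100, 64))
reps_b = rng.normal(loc=1.0, size=(100, 64))

# Direction that contrasts label A with label B.
contrast_dir = reps_b.mean(axis=0) - reps_a.mean(axis=0)
contrast_dir /= np.linalg.norm(contrast_dir)

def contrastive_score(rep: np.ndarray) -> float:
    """Position of a representation along the A-vs-B contrastive axis."""
    return float(rep @ contrast_dir)

x_rep = rng.normal(loc=0.9, size=64)  # representation of a new input
print("contrastive score (larger means more B-like than A-like):",
      contrastive_score(x_rep))
```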
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal-sufficient-subset explainers, target fundamentally different types of ground-truth explanations; a worked toy example follows this entry.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
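The divergence the entry above describes can be reproduced on a genuinely trivial model. In this illustrative sketch (not the paper's experiments), f(x) = x1 OR x2: exact Shapley values, with absent features ablated to 0, split credit between the two disjuncts, while a minimal sufficient subset names just one feature.

```python
# Toy comparison: Shapley values vs. minimal sufficient subsets on a
# trivial model. Illustrative sketch only.
from itertools import combinations
from math import factorial

FEATURES = (0, 1, 2)
x = (1, 1, 0)  # instance to explain

def f(inp):
    return int(inp[0] or inp[1])  # feature 2 is irrelevant

def v(subset):
    # Coalition value: features outside `subset` are ablated to 0.
    masked = tuple(x[i] if i in subset else 0 for i in FEATURES)
    return f(masked)

def shapley(i):
    n = len(FEATURES)
    others = [j for j in FEATURES if j != i]
    total = 0.0
    for k in range(n):
        for S in combinations(others, k):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += w * (v(set(S) | {i}) - v(set(S)))
    return total

print("Shapley values:", [round(shapley(i), 3) for i in FEATURES])
# -> [0.5, 0.5, 0.0]: credit is shared between x1 and x2.

# Minimal sufficient subsets: smallest coalitions that already fix f(x).
for k in range(len(FEATURES) + 1):
    hits = [S for S in combinations(FEATURES, k) if v(S) == f(x)]
    if hits:
        print("minimal sufficient subsets:", hits)  # -> [(0,), (1,)]
        break
```

Both outputs are defensible explanations of the same decision; they simply answer different questions about the model, which is the divergence the entry points to.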
- Bayesian Interpolants as Explanations for Neural Inferences [0.0]
The notion of Craig interpolant is adapted from logical inference to statistical inference and used to explain inferences made by neural networks.
The method produces explanations that are concise, understandable and precise.
arXiv Detail & Related papers (2020-04-08T18:45:06Z)
- Adequate and fair explanations [12.33259114006129]
Of two schools of explanation methods, we focus on the second: exact explanations with a rigorous logical foundation.
With counterfactual explanations, many of the assumptions needed to provide a complete explanation are left implicit.
We explore how to move from local partial explanations to what we call complete local explanations and then to global ones.
arXiv Detail & Related papers (2020-01-21T14:42:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.