Dissenting Explanations: Leveraging Disagreement to Reduce Model
Overreliance
- URL: http://arxiv.org/abs/2307.07636v2
- Date: Thu, 22 Feb 2024 17:47:23 GMT
- Title: Dissenting Explanations: Leveraging Disagreement to Reduce Model
Overreliance
- Authors: Omer Reingold, Judy Hanwen Shen, Aditi Talati
- Abstract summary: We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy.
- Score: 5.5769831014164675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While explainability is a desirable characteristic of increasingly complex
black-box models, modern explanation methods have been shown to be inconsistent
and contradictory. The semantics of explanations is not always fully understood
- to what extent do explanations "explain" a decision and to what extent do
they merely advocate for a decision? Can we help humans gain insights from
explanations accompanying correct predictions and not over-rely on incorrect
predictions advocated for by explanations? With this perspective in mind, we
introduce the notion of dissenting explanations: conflicting predictions with
accompanying explanations. We first explore the advantage of dissenting
explanations in the setting of model multiplicity, where multiple models with
similar performance may have different predictions. In such cases, providing
dissenting explanations could be done by invoking the explanations of
disagreeing models. Through a pilot study, we demonstrate that dissenting
explanations reduce overreliance on model predictions, without reducing overall
accuracy. Motivated by the utility of dissenting explanations, we present both
global and local methods for their generation.
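To make the model-multiplicity recipe concrete, here is a minimal sketch (not the authors' pilot-study setup; the scikit-learn models, the synthetic data, and the occlusion-style attribution are illustrative assumptions). It trains two similarly accurate classifiers, finds a test instance on which they disagree, and surfaces both models' local attributions, with the second serving as the dissenting explanation:

```python
# Minimal sketch of a dissenting explanation via model multiplicity.
# Illustrative only: the data, models, and occlusion-style attribution below
# are assumptions, not the paper's pilot-study setup or explanation method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model_a = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
model_b = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy A:", model_a.score(X_te, y_te), "| accuracy B:", model_b.score(X_te, y_te))

def local_attribution(model, x, background):
    """Occlusion-style local attribution: the drop in the predicted probability
    of class 1 when each feature is replaced by its training-set mean."""
    base = model.predict_proba(x[None, :])[0, 1]
    attributions = np.zeros(len(x))
    for j in range(len(x)):
        x_masked = x.copy()
        x_masked[j] = background[j]
        attributions[j] = base - model.predict_proba(x_masked[None, :])[0, 1]
    return attributions

background = X_tr.mean(axis=0)
# Instances where the two (similarly accurate) models disagree.
disagree = np.where(model_a.predict(X_te) != model_b.predict(X_te))[0]
i = disagree[0]
print("prediction A:", model_a.predict(X_te[i][None, :])[0],
      "| prediction B:", model_b.predict(X_te[i][None, :])[0])
print("explanation A:           ", local_attribution(model_a, X_te[i], background).round(3))
print("dissenting explanation B:", local_attribution(model_b, X_te[i], background).round(3))
```

Presenting the second, conflicting attribution alongside the first is the sense in which the abstract speaks of providing a dissenting explanation by invoking the explanation of a disagreeing model.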
Related papers
- DiConStruct: Causal Concept-based Explanations through Black-Box Distillation [9.735426765564474]
We present DiConStruct, an explanation method that is both concept-based and causal.
Our explainer works as a distillation model for any black-box machine learning model, approximating its predictions while producing the respective explanations.
arXiv Detail & Related papers (2024-01-16T17:54:02Z)
- Counterfactual Explanations for Predictive Business Process Monitoring [0.90238471756546]
We propose LORELEY, a counterfactual explanation technique for predictive process monitoring.
LORELEY can approximate prediction models with an average fidelity of 97.69% and generate realistic counterfactual explanations.
arXiv Detail & Related papers (2022-02-24T11:01:20Z)
- Causal Explanations and XAI [8.909115457491522]
An important goal of Explainable Artificial Intelligence (XAI) is to compensate for the mismatch between what models predict and the decisions those predictions are meant to guide by offering explanations.
I take a step further by formally defining the causal notions of sufficient explanations and counterfactual explanations.
I also touch upon the significance of this work for fairness in AI by showing how actual causation can be used to improve the idea of path-specific counterfactual fairness.
arXiv Detail & Related papers (2022-01-31T12:32:10Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting the model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide more accurate and finer-grained interpretability of a model's decisions.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [146.83882632854485]
We develop a framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model.
arXiv Detail & Related papers (2020-10-24T02:05:56Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
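As a toy illustration of the contrast drawn in the last entry above (a hypothetical two-feature example, not the cited paper's construction): for the trivial model f(x1, x2) = x1 OR x2 evaluated at (1, 1), Shapley values split the credit evenly, whereas each feature on its own is already a minimal sufficient subset.

```python
# Toy contrast between Shapley values and minimal sufficient subsets for the
# trivial model f(x1, x2) = x1 OR x2 at input (1, 1) with baseline (0, 0).
# Illustrative only; not the cited paper's formalization.
from itertools import chain, combinations, permutations, product

def f(x):
    return int(x[0] or x[1])

x, baseline, n = (1, 1), (0, 0), 2

def value(subset):
    # Coalition value: features in `subset` keep their values from x,
    # the remaining features are set to the baseline.
    return f(tuple(x[i] if i in subset else baseline[i] for i in range(n)))

# Exact Shapley values: average marginal contribution over all orderings.
shapley = [0.0] * n
orderings = list(permutations(range(n)))
for order in orderings:
    coalition = set()
    for i in order:
        before = value(coalition)
        coalition.add(i)
        shapley[i] += (value(coalition) - before) / len(orderings)

# A subset S is sufficient if fixing x_S forces the prediction for every
# assignment of the remaining (binary) features; minimal if no proper
# subset of S is also sufficient.
def sufficient(subset):
    others = [i for i in range(n) if i not in subset]
    outputs = set()
    for assignment in product([0, 1], repeat=len(others)):
        point = list(x)
        for i, v in zip(others, assignment):
            point[i] = v
        outputs.add(f(tuple(point)))
    return len(outputs) == 1

all_subsets = chain.from_iterable(combinations(range(n), k) for k in range(n + 1))
sufficient_sets = [set(s) for s in all_subsets if sufficient(s)]
minimal = [s for s in sufficient_sets if not any(t < s for t in sufficient_sets)]

print("Shapley values:", shapley)              # [0.5, 0.5] -- credit is split
print("minimal sufficient subsets:", minimal)  # [{0}, {1}] -- either feature alone suffices
```

The even 0.5/0.5 split hides the fact that either feature alone already forces the positive prediction, which is the kind of divergence between the two explanation targets that the entry describes.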