The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
- URL: http://arxiv.org/abs/2202.01602v4
- Date: Mon, 8 Jul 2024 12:11:38 GMT
- Title: The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
- Authors: Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju
- Abstract summary: We study the disagreement problem in explainable machine learning.
We first conduct interviews with data scientists to understand what constitutes disagreement between explanations and introduce a quantitative framework to formalize this understanding.
We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six post hoc explanation methods, and six predictive models.
- Score: 36.434727068776965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and six different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that (1) state-of-the-art explanation methods often disagree in terms of the explanations they output, and (2) machine learning practitioners often employ ad hoc heuristics when resolving such disagreements. These findings suggest that practitioners may be relying on misleading explanations when making consequential decisions. They also underscore the importance of developing principled frameworks for effectively evaluating and comparing explanations output by various explanation techniques.
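To make the notion of disagreement concrete, the sketch below (a minimal illustration, not the authors' released code) compares two attribution vectors produced by different post hoc explainers for the same prediction using two simple metrics in the spirit of the paper's quantitative framework: top-k feature agreement and rank correlation. The attribution values, feature count, and choice of k are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def top_k_feature_agreement(attr_a, attr_b, k=5):
    """Fraction of overlap between the top-k most important features
    (ranked by absolute attribution) of two explanations."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

def rank_correlation(attr_a, attr_b):
    """Spearman rank correlation between two full attribution vectors."""
    return spearmanr(attr_a, attr_b).correlation

# Hypothetical attributions for one prediction of an 8-feature model,
# e.g. from two different post hoc explainers such as LIME and SHAP.
attr_method_1 = np.array([0.42, -0.10, 0.05, 0.31, -0.27, 0.02, 0.18, -0.01])
attr_method_2 = np.array([0.05, -0.38, 0.29, 0.12, -0.03, 0.22, 0.01, -0.16])

print(top_k_feature_agreement(attr_method_1, attr_method_2, k=3))  # 0.0 -> strong disagreement
print(rank_correlation(attr_method_1, attr_method_2))
```

Low top-k agreement or a weak rank correlation between two methods is exactly the kind of disagreement the paper's empirical analysis measures at scale.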
Related papers
- EXAGREE: Towards Explanation Agreement in Explainable Machine Learning [0.0]
Explanations in machine learning are critical for trust, transparency, and fairness.
We introduce a novel framework, EXplanation AGREEment, to bridge diverse interpretations in explainable machine learning.
arXiv Detail & Related papers (2024-11-04T10:28:38Z)
- Visualizing and Understanding Contrastive Learning [22.553990823550784]
We design visual explanation methods that contribute towards understanding similarity learning tasks from pairs of images.
We also adapt existing metrics, used to evaluate visual explanations of image classification systems, to suit pairs of explanations.
arXiv Detail & Related papers (2022-06-20T13:01:46Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- On the overlooked issue of defining explanation objectives for local-surrogate explainers [5.094061357656677]
Local surrogate approaches for explaining machine learning model predictions have appealing properties.
Several methods exist that fit this description and share the broad goal of explaining individual predictions locally.
We discuss the implications of the lack of agreement, and clarity, amongst the methods' objectives on the research and practice of explainability.
arXiv Detail & Related papers (2021-06-10T15:24:49Z)
- Individual Explanations in Machine Learning Models: A Case Study on Poverty Estimation [63.18666008322476]
Machine learning methods are being increasingly applied in sensitive societal contexts.
The present case study has two main objectives: first, to expose the challenges that arise in such contexts and how they affect the use of relevant and novel explanation methods; and second, to present a set of strategies that mitigate such challenges when implementing explanation methods in a relevant application domain.
arXiv Detail & Related papers (2021-04-09T01:54:58Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using a perceptual distance in the surrogate explainer creates more coherent explanations for the distorted and reference images (a minimal sketch of this idea follows this entry).
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
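To illustrate the perceptual-distance idea in "Explainers in the Wild" above, here is a minimal LIME-style surrogate sketch in which the proximity weighting uses a pluggable distance function, so a perceptual distance (e.g. LPIPS for images) can be swapped in for a plain Euclidean one. The sampling scheme, kernel, and function names are assumptions, not the paper's exact methodology.

```python
import numpy as np

def surrogate_explanation(model_predict, x, distance_fn, n_samples=500, kernel_width=0.75, seed=0):
    """LIME-style sketch: fit a weighted linear surrogate around x, where the
    sample weights come from a pluggable distance function."""
    rng = np.random.default_rng(seed)
    X = x + rng.normal(scale=0.1, size=(n_samples, len(x)))   # local perturbations of x
    y = np.array([model_predict(z) for z in X])
    d = np.array([distance_fn(x, z) for z in X])
    w = np.exp(-(d ** 2) / kernel_width ** 2)                  # proximity kernel
    # Weighted least squares for the surrogate's coefficients (last column = intercept).
    A = np.c_[X, np.ones(n_samples)]
    coef, *_ = np.linalg.lstsq(A * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)
    return coef[:-1]                                           # per-feature attributions

euclidean = lambda a, b: float(np.linalg.norm(a - b))
# A perceptual distance (e.g. LPIPS for images) could be passed in place of `euclidean`.

model_predict = lambda z: float(z @ np.array([2.0, -1.0, 0.5]))  # toy model for illustration
print(surrogate_explanation(model_predict, np.array([1.0, 2.0, 3.0]), euclidean))
# approximately [ 2.  -1.   0.5] for this linear toy model
```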
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Explaining by Removing: A Unified Framework for Model Explanation [14.50261153230204]
Removal-based explanations are based on the principle of simulating feature removal to quantify each feature's influence.
We develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence.
This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature (a minimal sketch of the removal principle follows this entry).
arXiv Detail & Related papers (2020-11-21T00:47:48Z)
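As a minimal illustration of the removal principle described in "Explaining by Removing" above, the sketch below scores each feature by how much the model's output changes when that feature is replaced by a baseline value. The toy linear model, the zero baseline, and the function names are assumptions for illustration; the unified framework distinguishes many other removal and summarization choices.

```python
import numpy as np

def removal_attribution(model_predict, x, baseline):
    """Toy removal-based attribution: a feature's influence is the change in
    the model's output when that feature is replaced by a baseline value.
    (Replacing with a baseline is just one removal strategy; others
    marginalize the feature over a data distribution.)"""
    full_output = model_predict(x)
    attributions = np.zeros(len(x))
    for i in range(len(x)):
        x_removed = x.copy()
        x_removed[i] = baseline[i]          # "remove" feature i
        attributions[i] = full_output - model_predict(x_removed)
    return attributions

# Hypothetical linear model and inputs, purely for illustration.
weights = np.array([2.0, -1.0, 0.5])
model_predict = lambda x: float(weights @ x)

x = np.array([1.0, 3.0, -2.0])
baseline = np.zeros(3)                      # e.g. the training-set mean
print(removal_attribution(model_predict, x, baseline))  # [ 2. -3. -1.]
```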
- Towards Interpretable Reasoning over Paragraph Effects in Situation [126.65672196760345]
We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand cause and effect.
We propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules.
In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model.
arXiv Detail & Related papers (2020-10-03T04:03:52Z)
- Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach [11.871523410051527]
We consider an explanation as a set of the system's data inputs that causally drives the decision.
We show that features that have a large importance weight for a model prediction may not affect the corresponding decision (a minimal sketch of this counterfactual check follows this entry).
arXiv Detail & Related papers (2020-01-21T09:58:58Z)
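A small sketch of the point above, that a feature with a large importance weight need not change the actual decision: under a hypothetical linear scorer and decision threshold (both made up for illustration, not the paper's method), zeroing out the largest contribution leaves the thresholded decision unchanged.

```python
import numpy as np

# Hypothetical scenario: a linear scorer with a decision threshold.
weights = np.array([3.0, 1.5, 0.5])
threshold = 2.0
decide = lambda x: float(weights @ x) >= threshold   # True -> e.g. "approve"

x = np.array([1.0, 1.0, 1.0])             # score = 5.0 -> approve
importance = weights * x                   # naive importance: each feature's contribution

# Counterfactual check: does zeroing out the most "important" feature
# actually change the decision?
i = int(np.argmax(np.abs(importance)))     # feature 0, contribution 3.0
x_cf = x.copy()
x_cf[i] = 0.0
print(decide(x), decide(x_cf))             # True True: large weight, decision unchanged
```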
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.