Meaningfully Explaining a Model's Mistakes
- URL: http://arxiv.org/abs/2106.12723v1
- Date: Thu, 24 Jun 2021 01:49:55 GMT
- Title: Meaningfully Explaining a Model's Mistakes
- Authors: Abubakar Abid, James Zou
- Abstract summary: We propose a systematic approach, conceptual explanation scores (CES), which explains why a classifier makes a mistake on a particular test sample (or samples) in terms of human-understandable concepts.
We also train new models with intentional and known spurious correlations, which CES successfully identifies from a single misclassified test sample.
- Score: 16.521189362225996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding and explaining the mistakes made by trained models is critical
to many machine learning objectives, such as improving robustness, addressing
concept drift, and mitigating biases. However, this is often an ad hoc process
that involves manually looking at the model's mistakes on many test samples and
guessing at the underlying reasons for those incorrect predictions. In this
paper, we propose a systematic approach, conceptual explanation scores (CES),
that explains why a classifier makes a mistake on a particular test sample(s)
in terms of human-understandable concepts (e.g. this zebra is misclassified as
a dog because of faint stripes). We base CES on two prior ideas: counterfactual
explanations and concept activation vectors, and validate our approach on
well-known pretrained models, showing that it explains the models' mistakes
meaningfully. We also train new models with intentional and known spurious
correlations, which CES successfully identifies from a single misclassified
test sample. The code for CES is publicly available and can easily be applied
to new models.
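As a rough illustration of the two ingredients the abstract names (concept activation vectors and counterfactual-style perturbations), the sketch below fits a linear concept classifier on intermediate activations and scores how nudging a misclassified sample's activation along the concept direction changes the true-class logit. The activations, head weights, and shift size are hypothetical placeholders, not the authors' released CES code.

```python
# Minimal sketch of a concept-based mistake score, assuming access to
# intermediate-layer activations and a head that maps activations to logits.
# Everything below (activations, head weights, shift size) is synthetic and
# only illustrates the idea; it is not the CES implementation itself.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_classes = 64, 5  # hypothetical activation width and number of classes

# Activations of examples that contain the concept vs. random counterexamples.
concept_acts = rng.normal(1.0, 1.0, size=(100, d))
random_acts = rng.normal(0.0, 1.0, size=(100, d))

# Concept activation vector (CAV): the normal of a linear separator fit in
# activation space, pointing toward the concept-present examples.
cav_clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([concept_acts, random_acts]),
    np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)]),
)
cav = cav_clf.coef_[0] / np.linalg.norm(cav_clf.coef_[0])

# Stand-in classifier head (replace with the real model's final layer).
W = rng.normal(size=(d, n_classes))

def head_logits(act):
    return act @ W

def conceptual_explanation_score(act, true_label, cav, eps=1.0):
    """Change in the true-class logit when the activation is nudged along the
    concept direction; a large positive value suggests the missing concept
    helps explain the misclassification."""
    return head_logits(act + eps * cav)[true_label] - head_logits(act)[true_label]

misclassified_act = rng.normal(0.0, 1.0, size=d)  # one misclassified sample's activation
print(conceptual_explanation_score(misclassified_act, true_label=2, cav=cav))
```

In practice one would repeat this over a library of concepts (and possibly several layers) and rank concepts by how strongly the counterfactual shift moves the prediction toward the correct class, which is the spirit of the counterfactual side of CES.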
Related papers
- CLIMAX: An exploration of Classifier-Based Contrastive Explanations [5.381004207943597]
We propose a novel post-hoc XAI technique that provides contrastive explanations justifying a black-box classifier's prediction.
Our method, which we refer to as CLIMAX, is based on local classifiers.
We show that we achieve better consistency as compared to baselines such as LIME, BayLIME, and SLIME.
arXiv Detail & Related papers (2023-07-02T22:52:58Z)
- Explaining Reject Options of Learning Vector Quantization Classifiers [6.125017875330933]
We propose to use counterfactual explanations for explaining rejects in machine learning models.
We investigate how to efficiently compute counterfactual explanations of different reject options for an important class of models.
arXiv Detail & Related papers (2022-02-15T08:16:10Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality, valuable explanations compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting.
arXiv Detail & Related papers (2020-10-04T10:05:27Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article, a new kind of interpretable machine learning method is presented.
It helps to understand how a classification model partitions the feature space into predicted classes, using quantile shifts.
Real data points (or specific points of interest) are perturbed by slightly raising or lowering individual features, and the resulting changes in the prediction are observed (see the sketch after this list).
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
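The feature-perturbation idea summarized above for "Deducing neighborhoods of classes from a fitted model" can be sketched as follows; the classifier, dataset, and fixed shift size are illustrative stand-ins rather than the paper's quantile-based shifts.

```python
# Minimal sketch of probing class neighborhoods by raising or lowering one
# feature at a time and watching the predicted class. The model, data, and
# constant shift are placeholders, not the paper's quantile-shift procedure.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

def prediction_changes(model, x, shift=0.5):
    """For one data point, shift each feature up and down and record any
    change in the predicted class."""
    base = model.predict(x.reshape(1, -1))[0]
    changes = {}
    for j in range(len(x)):
        for direction in (+shift, -shift):
            x_shifted = x.copy()
            x_shifted[j] += direction
            new = model.predict(x_shifted.reshape(1, -1))[0]
            if new != base:
                changes[(j, direction)] = (base, new)
    return changes

print(prediction_changes(model, X[0]))
```

Features whose small shifts flip the predicted class indicate that the point sits near a class boundary along those dimensions.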