Debugging Tests for Model Explanations
- URL: http://arxiv.org/abs/2011.05429v1
- Date: Tue, 10 Nov 2020 22:23:25 GMT
- Title: Debugging Tests for Model Explanations
- Authors: Julius Adebayo, Michael Muelly, Ilaria Liccardi, Been Kim
- Abstract summary: Methods tested are able to diagnose a spurious background bug, but not conclusively identify mislabeled training examples.
We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, but instead rely, primarily, on model predictions.
- Score: 18.073554618753395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate whether post-hoc model explanations are effective for
diagnosing model errors--model debugging. In response to the challenge of
explaining a model's prediction, a vast array of explanation methods have been
proposed. Despite increasing use, it is unclear if they are effective. To
start, we categorize \textit{bugs}, based on their source, into:~\textit{data,
model, and test-time} contamination bugs. For several explanation methods, we
assess their ability to: detect spurious correlation artifacts (data
contamination), diagnose mislabeled training examples (data contamination),
differentiate between a (partially) re-initialized model and a trained one
(model contamination), and detect out-of-distribution inputs (test-time
contamination). We find that the methods tested are able to diagnose a spurious
background bug, but not conclusively identify mislabeled training examples. In
addition, a class of methods that modify the back-propagation algorithm is
invariant to the higher-layer parameters of a deep network and is hence
ineffective for diagnosing model contamination. We complement our analysis with a human
subject study, and find that subjects fail to identify defective models using
attributions, but instead rely, primarily, on model predictions. Taken
together, our results provide guidance for practitioners and researchers
turning to explanations as tools for model debugging.
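The model-contamination test described above (comparing attributions from a trained model against a partially re-initialized copy) is straightforward to prototype. Below is a minimal sketch, not the authors' code: it uses plain PyTorch input gradients as the attribution method, and the toy MLP, input tensor, and target class are placeholder assumptions to be swapped for a real trained model and data.

```python
# Sketch of a model-contamination check: compare input-gradient saliency maps
# from a trained model against a copy whose top layer has been re-initialized.
import copy
import torch
import torch.nn as nn

def saliency(model: nn.Module, x: torch.Tensor, target: int) -> torch.Tensor:
    """Gradient of the target logit w.r.t. the input (a simple attribution map)."""
    model.eval()
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target]
    score.backward()
    return x.grad.detach().abs()

# Hypothetical stand-in classifier and input; substitute a real trained model and data.
trained = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(1, 1, 28, 28)

# Copy the model and re-initialize only its top (output) layer,
# mimicking a "model contamination" bug.
randomized = copy.deepcopy(trained)
randomized[-1].reset_parameters()

attr_trained = saliency(trained, x, target=3)
attr_random = saliency(randomized, x, target=3)

# If the two attribution maps are nearly identical (similarity close to 1),
# the attribution method is insensitive to the higher-layer parameters it
# is supposed to reflect.
sim = torch.nn.functional.cosine_similarity(
    attr_trained.flatten(), attr_random.flatten(), dim=0
)
print(f"Cosine similarity between attributions: {sim.item():.3f}")
```

An explanation method that is invariant to the higher-layer parameters would report a similarity near 1 even though the randomized model's predictions are no longer meaningful, which is exactly the failure mode the paper reports for backprop-modifying methods.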
Related papers
- Demystifying amortized causal discovery with transformers [21.058343547918053]
Supervised learning approaches for causal discovery from observational data often achieve competitive performance.
In this work, we investigate CSIvA, a transformer-based model promising to train on synthetic data and transfer to real data.
We bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations.
arXiv Detail & Related papers (2024-05-27T08:17:49Z)
- Revealing Model Biases: Assessing Deep Neural Networks via Recovered Sample Analysis [9.05607520128194]
This paper proposes a straightforward and cost-effective approach to assess whether a deep neural network (DNN) relies on the primary concepts of training samples.
The proposed method does not require any test or generalization samples, only the parameters of the trained model and the training data that lie on the margin.
arXiv Detail & Related papers (2023-06-10T11:20:04Z)
- Sanity Checks for Saliency Methods Explaining Object Detectors [5.735035463793008]
Saliency methods are frequently used to explain Deep Neural Network-based models.
We perform sanity checks for object detection and define new qualitative criteria to evaluate the saliency explanations.
We find that EfficientDet-D0 is the most interpretable detector, independent of the saliency method used.
arXiv Detail & Related papers (2023-06-04T17:57:51Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models exhibit strong prediction bias across labels when given only a few examples.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models achieve the performance improvement by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Defuse: Harnessing Unrestricted Adversarial Examples for Debugging Models Beyond Test Accuracy [11.265020351747916]
Defuse is a method to automatically discover and correct model errors beyond those available in test data.
We propose an algorithm inspired by adversarial machine learning techniques that uses a generative model to find naturally occurring instances misclassified by a model.
Defuse corrects the error after fine-tuning while maintaining generalization on the test set.
arXiv Detail & Related papers (2021-02-11T18:08:42Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.