Shortcomings of Top-Down Randomization-Based Sanity Checks for
Evaluations of Deep Neural Network Explanations
- URL: http://arxiv.org/abs/2211.12486v1
- Date: Tue, 22 Nov 2022 18:52:38 GMT
- Title: Shortcomings of Top-Down Randomization-Based Sanity Checks for
Evaluations of Deep Neural Network Explanations
- Authors: Alexander Binder, Leander Weber, Sebastian Lapuschkin, Grégoire
Montavon, Klaus-Robert Müller, Wojciech Samek
- Abstract summary: We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations.
Top-down model randomization preserves scales of forward pass activations with high probability.
- Score: 67.40641255908443
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: While the evaluation of explanations is an important step towards trustworthy
models, it needs to be done carefully, and the employed metrics need to be
well-understood. Specifically, model randomization testing is often
overestimated and regarded as a sole criterion for selecting or discarding
certain explanation methods. To address shortcomings of this test, we start by
observing an experimental gap in the ranking of explanation methods between
randomization-based sanity checks [1] and model output faithfulness measures
(e.g. [25]). We identify limitations of model-randomization-based sanity checks
for the purpose of evaluating explanations. Firstly, we show that uninformative
attribution maps created with zero pixel-wise covariance easily achieve high
scores in this type of check. Secondly, we show that top-down model
randomization preserves scales of forward pass activations with high
probability. That is, channels with large activations have a high probability
of contributing strongly to the output, even after randomization of the network
on top of them. Hence, explanations after randomization can only be expected to
differ to a certain extent. This explains the observed experimental gap. In
summary, these results demonstrate the inadequacy of model-randomization-based
sanity checks as a criterion to rank attribution methods.
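Both findings can be made concrete with a few lines of numpy. The sketch below is purely illustrative and is not code from the paper: the first part shows that an attribution method returning i.i.d. noise, hence zero pixel-wise covariance between calls, trivially "passes" a similarity-based randomization check; the second shows that a dominant channel keeps a dominant output contribution after the layer above it is randomized.
```python
import numpy as np

rng = np.random.default_rng(0)

# --- Claim 1: uninformative maps score well on randomization checks ---
# Such checks reward LOW similarity between attributions computed on the
# original and on the randomized model. Two independent noise maps have
# near-zero correlation by construction, so a noise "explanation" passes.
map_original   = rng.standard_normal((224, 224))  # noise map, original model
map_randomized = rng.standard_normal((224, 224))  # noise map, randomized model
corr = np.corrcoef(map_original.ravel(), map_randomized.ravel())[0, 1]
print(f"correlation of two noise maps: {corr:+.4f}")  # ~0 -> perfect score

# --- Claim 2: top-down randomization preserves activation scales ---
# Push an activation vector with one dominant channel through a freshly
# randomized linear layer; the dominant channel's contribution to the
# output remains dominant with high probability.
n_out, n_channels = 10, 512
h = np.abs(rng.standard_normal(n_channels))      # typical activations
h[7] = 50.0                                      # channel 7 dominates
W = rng.standard_normal((n_out, n_channels))     # randomized layer on top
contrib = np.linalg.norm(W * h[None, :], axis=0) # per-channel contribution
print("dominant channel after randomization:", contrib.argmax())  # 7
```
Because faithful explanations track such preserved contributions, they cannot change arbitrarily much under randomization, while noise maps change maximally; this is the experimental gap described above.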
Related papers
- Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering [55.15192437680943]
Generative models lack rigorous statistical guarantees for their outputs.
We propose a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee.
This guarantee states that with high probability, the prediction sets contain at least one admissible (or valid) example.
arXiv Detail & Related papers (2024-10-02T15:26:52Z)
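As a rough illustration of the type of guarantee meant here, the sketch below implements generic split conformal prediction, not the paper's sequential greedy filtering; the function name and the exponential toy scores are assumptions made for the example.
```python
import numpy as np

def conformal_prediction_set(cal_scores, test_scores, alpha=0.1):
    """Generic split-conformal set: keep candidates whose nonconformity
    score is at most the finite-sample-corrected (1 - alpha) quantile
    of the calibration scores."""
    n = len(cal_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_scores, level)
    return [i for i, s in enumerate(test_scores) if s <= q]

rng = np.random.default_rng(1)
cal = rng.exponential(size=500)        # nonconformity scores, held-out data
candidates = rng.exponential(size=20)  # scores for 20 generated samples
print(conformal_prediction_set(cal, candidates))  # retained at the 90% level
```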
- Deep Evidential Learning for Bayesian Quantile Regression [3.6294895527930504]
It is desirable to have accurate uncertainty estimation from a single deterministic forward-pass model.
This paper proposes a deep Bayesian quantile regression model that can estimate the quantiles of a continuous target distribution without the Gaussian assumption.
arXiv Detail & Related papers (2023-08-21T11:42:16Z)
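The standard building block for quantile estimation without a Gaussian assumption is the pinball (quantile) loss; the sketch below shows the generic loss, not the paper's evidential Bayesian formulation.
```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss: minimized in expectation when y_pred is
    the tau-quantile of the target distribution, Gaussian or not."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

y = np.random.default_rng(2).standard_normal(10_000)
# The loss is lower at the empirical 0.9-quantile than at a wrong guess.
print(pinball_loss(y, np.quantile(y, 0.9), tau=0.9))  # smaller
print(pinball_loss(y, 0.0, tau=0.9))                  # larger
```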
- Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z)
- Boost Test-Time Performance with Closed-Loop Inference [85.43516360332646]
We propose to predict hard-classified test samples in a looped manner to boost the model performance.
We first devise a filtering criterion to identify those hard-classified test samples that need additional inference loops.
For each hard sample, we construct an additional auxiliary learning task based on its original top-K predictions to calibrate the model.
arXiv Detail & Related papers (2022-03-21T10:20:21Z)
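A filtering criterion in this spirit might flag test samples whose top-K predictive distribution has high entropy; the sketch below is a hypothetical criterion for illustration, not the paper's actual rule, and the threshold is an assumed constant.
```python
import numpy as np

def filter_hard_samples(probs, k=5, entropy_threshold=1.0):
    """Flag test samples whose renormalized top-k class probabilities
    have high entropy, i.e. the model is torn between its candidates."""
    topk = np.sort(probs, axis=1)[:, -k:]          # top-k probabilities
    topk = topk / topk.sum(axis=1, keepdims=True)  # renormalize
    entropy = -(topk * np.log(topk + 1e-12)).sum(axis=1)
    return np.where(entropy > entropy_threshold)[0]  # needs extra loops

probs = np.random.default_rng(4).dirichlet(np.ones(100), size=32)
print(filter_hard_samples(probs))  # indices of hard-classified samples
```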
- Model-agnostic out-of-distribution detection using combined statistical tests [15.27980070479021]
We present simple methods for out-of-distribution detection using a trained generative model.
We combine a classical parametric test (Rao's score test) with the recently introduced typicality test.
Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms.
arXiv Detail & Related papers (2022-03-02T13:32:09Z)
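One classical way to combine two such tests, assuming each yields a p-value per input, is Fisher's method; whether this particular combination rule matches the paper's is an assumption of the sketch.
```python
import numpy as np
from scipy import stats

def fisher_combine(p_values):
    """Fisher's method: under the null (in-distribution input),
    -2 * sum(log p_i) is chi-squared with 2k degrees of freedom."""
    p = np.asarray(p_values, dtype=float)
    statistic = -2.0 * np.log(p).sum()
    return stats.chi2.sf(statistic, df=2 * len(p))

# e.g. p-values from a score test and a typicality test on one input:
print(fisher_combine([0.03, 0.20]))  # combined p-value for the OOD decision
```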
- The Hidden Uncertainty in a Neural Network's Activations [105.4223982696279]
The distribution of a neural network's latent representations has been successfully used to detect out-of-distribution (OOD) data.
This work investigates whether this distribution correlates with a model's epistemic uncertainty, thus indicating its ability to generalise to novel inputs.
arXiv Detail & Related papers (2020-12-05T17:30:35Z)
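A common concrete instantiation of this idea, sketched below under the assumption of a single Gaussian density model (the paper's exact density estimate may differ), fits the latent distribution on in-distribution data and scores new inputs by Mahalanobis distance.
```python
import numpy as np

def fit_gaussian(features):
    """Fit mean and inverse (regularized) covariance of latents."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(x, mu, cov_inv):
    """Larger distance -> more likely out-of-distribution."""
    d = x - mu
    return float(d @ cov_inv @ d)

rng = np.random.default_rng(3)
train_feats = rng.standard_normal((1000, 64))  # in-distribution latents
mu, cov_inv = fit_gaussian(train_feats)
print(mahalanobis_score(rng.standard_normal(64), mu, cov_inv))        # small
print(mahalanobis_score(rng.standard_normal(64) + 5.0, mu, cov_inv))  # large
```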
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
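For flavor only, the sketch below is a textbook conjugate Gamma-Poisson update for a case-count rate; the paper's framework handles partial observability and is far richer, so every modeling choice here is an assumption of the example.
```python
import numpy as np

# Gamma prior on the daily case rate, Poisson likelihood for counts:
# the posterior is again Gamma, updated in closed form.
prior_shape, prior_rate = 2.0, 1.0      # Gamma(shape, rate) prior
cases = np.array([3, 0, 5, 2])          # sparse daily case counts
post_shape = prior_shape + cases.sum()  # conjugate update: shape += sum
post_rate = prior_rate + len(cases)     #                   rate  += n days
print(f"posterior mean rate: {post_shape / post_rate:.2f} cases/day")
```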