Fidelity of Interpretability Methods and Perturbation Artifacts in
Neural Networks
- URL: http://arxiv.org/abs/2203.02928v4
- Date: Tue, 12 Sep 2023 15:00:10 GMT
- Title: Fidelity of Interpretability Methods and Perturbation Artifacts in
Neural Networks
- Authors: Lennart Brocki, Neo Christopher Chung
- Abstract summary: Post-hoc interpretability methods aim to quantify the importance of input features with respect to the class probabilities.
A popular approach to evaluate interpretability methods is to perturb input features deemed important for a given prediction and observe the decrease in accuracy.
However, the perturbation itself may introduce artifacts. We propose a method for estimating the impact of such artifacts on the fidelity estimation by utilizing model accuracy curves from perturbing input features.
- Score: 5.439020425819001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite excellent performance of deep neural networks (DNNs) in image
classification, detection, and prediction, characterizing how DNNs make a given
decision remains an open problem, resulting in a number of interpretability
methods. Post-hoc interpretability methods primarily aim to quantify the
importance of input features with respect to the class probabilities. However,
due to the lack of ground truth and the existence of interpretability methods
with diverse operating characteristics, evaluating these methods is a crucial
challenge. A popular approach to evaluate interpretability methods is to
perturb input features deemed important for a given prediction and observe the
decrease in accuracy. However, perturbation itself may introduce artifacts. We
propose a method for estimating the impact of such artifacts on the fidelity
estimation by utilizing model accuracy curves from perturbing input features
according to the Most Important First (MIF) and Least Important First (LIF) orders. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation of four popular post-hoc interpretability methods.
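The MIF/LIF accuracy curves at the core of this procedure can be illustrated with a short sketch. The following PyTorch code is a minimal, hedged example of computing such curves from one set of attribution maps: the function name, the zero baseline used as the perturbation value, and the fraction grid are illustrative assumptions rather than the authors' exact protocol, and the paper's artifact-corrected fidelity estimate, which is derived by comparing MIF and LIF curves, is not reproduced here.

```python
import torch


def perturbation_curve(model, images, labels, saliency, order="MIF",
                       fractions=(0.0, 0.1, 0.2, 0.4, 0.6, 0.8),
                       baseline=0.0):
    """Accuracy after perturbing a growing fraction of pixels, ranked by a
    per-pixel importance map (order='MIF': most important first,
    order='LIF': least important first)."""
    model.eval()
    n, c, h, w = images.shape
    # Rank pixel positions by the importance the attribution method assigned them.
    ranking = saliency.reshape(n, -1).argsort(dim=1, descending=(order == "MIF"))
    accuracies = []
    with torch.no_grad():
        for frac in fractions:
            k = int(frac * h * w)
            perturbed = images.clone().reshape(n, c, -1)
            if k > 0:
                # Replace the first k ranked pixels (in all channels) with the baseline value.
                idx = ranking[:, :k].unsqueeze(1).expand(-1, c, -1)
                perturbed.scatter_(2, idx, baseline)
            logits = model(perturbed.reshape(n, c, h, w))
            accuracies.append((logits.argmax(dim=1) == labels).float().mean().item())
    return accuracies
```

In this sketch, `saliency` is assumed to be a per-pixel importance map of shape (N, H, W) produced by any post-hoc attribution method; running the function once with order="MIF" and once with order="LIF" yields the two accuracy curves from which the paper's fidelity and artifact estimates are then derived.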
Related papers
- Tractable Function-Space Variational Inference in Bayesian Neural
Networks [72.97620734290139]
A popular approach for estimating the predictive uncertainty of neural networks is to define a prior distribution over the network parameters.
We propose a scalable function-space variational inference method that allows incorporating prior information.
We show that the proposed method leads to state-of-the-art uncertainty estimation and predictive performance on a range of prediction tasks.
arXiv Detail & Related papers (2023-12-28T18:33:26Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators in Neural Networks [5.439020425819001]
Post-hoc interpretability methods attempt to make the inner workings of deep neural networks more interpretable.
One of the most popular evaluation frameworks is to perturb features deemed important by an interpretability method.
We propose feature perturbation augmentation (FPA) which creates and adds perturbed images during the model training.
arXiv Detail & Related papers (2023-03-02T19:05:46Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis [6.15738282053772]
We introduce EVA -- the first explainability method guaranteed to have an exhaustive exploration of a perturbation space.
We leverage the beneficial properties of verified perturbation analysis to efficiently characterize the input variables that are most likely to drive the model decision.
arXiv Detail & Related papers (2022-02-15T21:13:55Z)
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [151.03112356092575]
We show a principled way to measure the uncertainty of predictions for a classifier based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution (a sketch of this estimator appears after this list).
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z)
- Interpretable Social Anchors for Human Trajectory Forecasting in Crowds [84.20437268671733]
We propose a neural network-based system to predict human trajectory in crowds.
We learn interpretable rule-based intents, and then utilise the expressibility of neural networks to model scene-specific residual.
Our architecture is tested on the interaction-centric benchmark TrajNet++.
arXiv Detail & Related papers (2021-05-07T09:22:34Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training [1.0878040851638]
We analyze and enhance robustness properties of both classifiers and regression models.
Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks.
Second, we provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and compute a feature ranking.
arXiv Detail & Related papers (2020-07-28T10:58:06Z)
- A generalizable saliency map-based interpretation of model outcome [1.14219428942199]
We propose a non-intrusive interpretability technique that uses the input and output of the model to generate a saliency map.
Experiments show that our interpretability method can reconstruct the salient part of the input with a classification accuracy of 89%.
arXiv Detail & Related papers (2020-06-16T20:34:42Z)
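As a side note on the NUQ entry above: the Nadaraya-Watson estimate of the conditional label distribution has the form p̂(y = c | x) = Σ_i K(x, x_i) · 1[y_i = c] / Σ_i K(x, x_i). The following is a minimal NumPy sketch under an assumed RBF kernel; it only illustrates the estimator itself, not the NUQ method, which builds further uncertainty measures on top of such estimates.

```python
import numpy as np


def nadaraya_watson_label_dist(x_query, x_train, y_train, n_classes, bandwidth=1.0):
    """Kernel-weighted estimate of p(y | x): an RBF-weighted frequency of the
    training labels in the neighbourhood of the query point."""
    sq_dists = ((x_train - x_query) ** 2).sum(axis=1)    # (N,) squared distances
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # RBF kernel weights
    one_hot = np.eye(n_classes)[y_train]                  # (N, C) one-hot labels
    probs = weights @ one_hot / (weights.sum() + 1e-12)   # weighted label frequencies
    return probs  # a flat (high-entropy) estimate signals an uncertain prediction
```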
This list is automatically generated from the titles and abstracts of the papers on this site.