Fidelity of Interpretability Methods and Perturbation Artifacts in
Neural Networks
- URL: http://arxiv.org/abs/2203.02928v4
- Date: Tue, 12 Sep 2023 15:00:10 GMT
- Title: Fidelity of Interpretability Methods and Perturbation Artifacts in
Neural Networks
- Authors: Lennart Brocki, Neo Christopher Chung
- Abstract summary: Post-hoc interpretability methods aim to quantify the importance of input features with respect to the class probabilities.
A popular approach to evaluate interpretability methods is to perturb input features deemed important for a given prediction and observe the decrease in accuracy.
However, the perturbation itself may introduce artifacts. We propose a method for estimating the impact of such artifacts on the fidelity estimation by utilizing model accuracy curves from perturbing input features.
- Score: 5.439020425819001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite excellent performance of deep neural networks (DNNs) in image
classification, detection, and prediction, characterizing how DNNs make a given
decision remains an open problem, resulting in a number of interpretability
methods. Post-hoc interpretability methods primarily aim to quantify the
importance of input features with respect to the class probabilities. However,
due to the lack of ground truth and the existence of interpretability methods
with diverse operating characteristics, evaluating these methods is a crucial
challenge. A popular approach to evaluate interpretability methods is to
perturb input features deemed important for a given prediction and observe the
decrease in accuracy. However, perturbation itself may introduce artifacts. We
propose a method for estimating the impact of such artifacts on the fidelity
estimation by utilizing model accuracy curves from perturbing input features
according to the Most Important First (MIF) and Least Important First (LIF) orders. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation of four popular post-hoc interpretability methods.
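The MIF/LIF accuracy curves at the core of this procedure can be illustrated with a short sketch. The following PyTorch code is a minimal, hedged example of computing such curves from one set of attribution maps: the function name, the zero baseline used as the perturbation value, and the fraction grid are illustrative assumptions rather than the authors' exact protocol, and the paper's artifact-corrected fidelity estimate, which is derived by comparing MIF and LIF curves, is not reproduced here.

```python
import torch


def perturbation_curve(model, images, labels, saliency, order="MIF",
                       fractions=(0.0, 0.1, 0.2, 0.4, 0.6, 0.8),
                       baseline=0.0):
    """Accuracy after perturbing a growing fraction of pixels, ranked by a
    per-pixel importance map (order='MIF': most important first,
    order='LIF': least important first)."""
    model.eval()
    n, c, h, w = images.shape
    # Rank pixel positions by the importance the attribution method assigned them.
    ranking = saliency.reshape(n, -1).argsort(dim=1, descending=(order == "MIF"))
    accuracies = []
    with torch.no_grad():
        for frac in fractions:
            k = int(frac * h * w)
            perturbed = images.clone().reshape(n, c, -1)
            if k > 0:
                # Replace the first k ranked pixels (in all channels) with the baseline value.
                idx = ranking[:, :k].unsqueeze(1).expand(-1, c, -1)
                perturbed.scatter_(2, idx, baseline)
            logits = model(perturbed.reshape(n, c, h, w))
            accuracies.append((logits.argmax(dim=1) == labels).float().mean().item())
    return accuracies
```

In this sketch, `saliency` is assumed to be a per-pixel importance map of shape (N, H, W) produced by any post-hoc attribution method; running the function once with order="MIF" and once with order="LIF" yields the two accuracy curves from which the paper's fidelity and artifact estimates are then derived.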
Related papers
- Tractable Function-Space Variational Inference in Bayesian Neural
Networks [72.97620734290139]
A popular approach for estimating the predictive uncertainty of neural networks is to define a prior distribution over the network parameters.
We propose a scalable function-space variational inference method that allows incorporating prior information.
We show that the proposed method leads to state-of-the-art uncertainty estimation and predictive performance on a range of prediction tasks.
arXiv Detail & Related papers (2023-12-28T18:33:26Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators in Neural Networks [5.439020425819001]
Post-hoc interpretability methods attempt to make the inner workings of deep neural networks more interpretable.
One of the most popular evaluation frameworks is to perturb features deemed important by an interpretability method.
We propose feature perturbation augmentation (FPA) which creates and adds perturbed images during the model training.
arXiv Detail & Related papers (2023-03-02T19:05:46Z)
- Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z)
- Don't Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis [6.15738282053772]
We introduce EVA -- the first explainability method guaranteed to have an exhaustive exploration of a perturbation space.
We leverage the beneficial properties of verified perturbation analysis to efficiently characterize the input variables that are most likely to drive the model decision.
arXiv Detail & Related papers (2022-02-15T21:13:55Z)
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [151.03112356092575]
We show a principled way to measure the uncertainty of predictions for a classifier based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution (a sketch of this estimator appears after this list).
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z)
- Interpretable Social Anchors for Human Trajectory Forecasting in Crowds [84.20437268671733]
We propose a neural network-based system to predict human trajectory in crowds.
We learn interpretable rule-based intents, and then utilise the expressibility of neural networks to model scene-specific residual.
Our architecture is tested on the interaction-centric benchmark TrajNet++.
arXiv Detail & Related papers (2021-05-07T09:22:34Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training [1.0878040851638]
We analyze and enhance robustness properties of both classifiers and regression models.
Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks.
Second, we provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and compute a feature ranking.
arXiv Detail & Related papers (2020-07-28T10:58:06Z)
- A generalizable saliency map-based interpretation of model outcome [1.14219428942199]
We propose a non-intrusive interpretability technique that uses the input and output of the model to generate a saliency map.
Experiments show that our interpretability method can reconstruct the salient part of the input with a classification accuracy of 89%.
arXiv Detail & Related papers (2020-06-16T20:34:42Z)
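As a side note on the NUQ entry above: the Nadaraya-Watson estimate of the conditional label distribution has the form p̂(y = c | x) = Σ_i K(x, x_i) · 1[y_i = c] / Σ_i K(x, x_i). The following is a minimal NumPy sketch under an assumed RBF kernel; it only illustrates the estimator itself, not the NUQ method, which builds further uncertainty measures on top of such estimates.

```python
import numpy as np


def nadaraya_watson_label_dist(x_query, x_train, y_train, n_classes, bandwidth=1.0):
    """Kernel-weighted estimate of p(y | x): an RBF-weighted frequency of the
    training labels in the neighbourhood of the query point."""
    sq_dists = ((x_train - x_query) ** 2).sum(axis=1)    # (N,) squared distances
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # RBF kernel weights
    one_hot = np.eye(n_classes)[y_train]                  # (N, C) one-hot labels
    probs = weights @ one_hot / (weights.sum() + 1e-12)   # weighted label frequencies
    return probs  # a flat (high-entropy) estimate signals an uncertain prediction
```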
This list is automatically generated from the titles and abstracts of the papers on this site.