Rethinking interpretation: Input-agnostic saliency mapping of deep
visual classifiers
- URL: http://arxiv.org/abs/2303.17836v1
- Date: Fri, 31 Mar 2023 06:58:45 GMT
- Title: Rethinking interpretation: Input-agnostic saliency mapping of deep
visual classifiers
- Authors: Naveed Akhtar, Mohammad A. A. K. Jalwana
- Abstract summary: Saliency methods provide post-hoc model interpretation by attributing input features to the model outputs.
We show that input-specific saliency mapping is intrinsically susceptible to misleading feature attribution.
We introduce a new perspective of input-agnostic saliency mapping that computationally estimates the high-level features attributed by the model to its outputs.
- Score: 28.28834523468462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Saliency methods provide post-hoc model interpretation by attributing input
features to the model outputs. Current methods mainly achieve this using a
single input sample, thereby failing to answer input-independent inquiries
about the model. We also show that input-specific saliency mapping is
intrinsically susceptible to misleading feature attribution. Current attempts
to use 'general' input features for model interpretation assume access to a
dataset containing those features, which biases the interpretation. Addressing
the gap, we introduce a new perspective of input-agnostic saliency mapping that
computationally estimates the high-level features attributed by the model to
its outputs. These features are geometrically correlated, and are computed by
accumulating the model's gradient information with respect to an unrestricted data
distribution. To compute these features, we nudge independent data points over
the model's loss surface towards the local minima associated with a
human-understandable concept, e.g., a class label for classifiers. With a
systematic projection, scaling and refinement process, this information is
transformed into an interpretable visualization without compromising its
model-fidelity. The visualization serves as a stand-alone qualitative
interpretation. With an extensive evaluation, we not only demonstrate
successful visualizations for a variety of concepts for large-scale models, but
also showcase an interesting utility of this new form of saliency mapping by
identifying backdoor signatures in compromised classifiers.
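As a concrete reading of the accumulation step described in the abstract, the sketch below starts from unrestricted random inputs, nudges them down the loss surface of a hypothetical classifier `model` towards the minima of a chosen class label while accumulating input gradients, and normalizes the result into a single visualization. It is only a minimal illustration under these assumptions, not the authors' implementation; the paper's projection, scaling and refinement steps are approximated here by a crude channel collapse and rescaling.

```python
import torch
import torch.nn.functional as F

def input_agnostic_saliency(model, target_class, in_shape=(3, 224, 224),
                            n_points=16, n_steps=50, lr=0.05):
    """Hedged sketch: accumulate gradient information from independent random
    data points nudged towards the loss minima of `target_class`."""
    model.eval()
    # Independent data points from an unrestricted distribution (pure noise).
    x = torch.randn(n_points, *in_shape, requires_grad=True)
    targets = torch.full((n_points,), target_class, dtype=torch.long)
    accumulated = torch.zeros(in_shape)

    for _ in range(n_steps):
        loss = F.cross_entropy(model(x), targets)
        grad, = torch.autograd.grad(loss, x)
        # Accumulate the model's gradient information across points and steps.
        accumulated += grad.detach().abs().mean(dim=0)
        # Nudge the points towards the local minima of the concept (descent step).
        with torch.no_grad():
            x -= lr * grad

    # Crude stand-in for the paper's projection/scaling/refinement: collapse
    # channels and rescale to [0, 1] for a stand-alone visualization.
    sal = accumulated.mean(dim=0)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```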
Related papers
- Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification [5.087579454836169]
State-of-the-art explainability methods generate saliency maps to show where a specific class is identified.
We introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network.
We also show an approach to generate global explanations by aggregating labels across multiple images.
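A minimal sketch of the label-aggregation idea above, assuming each image's explanation has already been reduced to a set of human-readable labels; the function and example labels are hypothetical.

```python
from collections import Counter

def global_explanation(per_image_labels):
    """Hedged sketch: aggregate human-readable labels produced for individual
    images into a global, class-level explanation (label -> frequency)."""
    counts = Counter(label for labels in per_image_labels for label in labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}

# Usage: labels extracted from explanations of three images of one class
# (illustrative values only).
print(global_explanation([["striped fur", "whiskers"],
                          ["whiskers", "pointed ears"],
                          ["striped fur", "whiskers"]]))
```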
arXiv Detail & Related papers (2024-05-06T09:21:35Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
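To make the semiring view concrete, the toy sketch below aggregates edge derivatives over an explicitly enumerated set of input-to-output paths: the (sum, ×) semiring recovers the ordinary chain-rule gradient, while the (max, ×) semiring returns the single most influential path, one example of the interpretable statistics the paper has in mind. The graph and values are invented for illustration; real implementations work on the computation graph rather than enumerated paths.

```python
from math import prod

# Toy computation graph: local derivatives along each edge of every
# input-to-output path (values invented for illustration).
paths = {
    "x -> h1 -> y": [0.5, 2.0],
    "x -> h2 -> y": [1.5, 0.1],
    "x -> h3 -> y": [0.2, 3.0],
}

def aggregate(paths, plus, times):
    """Generalized backprop over explicit paths: combine edge derivatives with
    `times` within a path and `plus` across paths (i.e., a semiring)."""
    per_path = {name: times(edges) for name, edges in paths.items()}
    return plus(per_path.values()), per_path

# (sum, *): the ordinary gradient dy/dx = sum over paths of edge products.
grad, _ = aggregate(paths, plus=sum, times=prod)

# (max, *): the most influential single path, an interpretable statistic.
top, per_path = aggregate(paths, plus=max, times=prod)
best = max(per_path, key=per_path.get)

print(f"gradient (sum-product): {grad:.2f}")          # 1.0 + 0.15 + 0.6 = 1.75
print(f"top path (max-product): {best} = {top:.2f}")  # x -> h1 -> y = 1.00
```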
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model class, namely "Denoising Diffusion Probabilistic Models" (DDPMs), for chirographic data.
Our model, named "ChiroDiff", is non-autoregressive, learns to capture holistic concepts, and therefore remains resilient to higher temporal sampling rates.
arXiv Detail & Related papers (2023-04-07T15:17:48Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
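One way to read "randomly eliminating certain class information in each training iteration" is to drop the supervision of a randomly chosen subset of classes per iteration. The sketch below implements that reading with an ignore mask on the segmentation labels; it is an interpretation for illustration, not the paper's exact scheme.

```python
import torch
import torch.nn.functional as F

def class_dropped_ce(logits, labels, num_classes, drop_ratio=0.2,
                     ignore_index=255):
    """Hedged sketch: per iteration, randomly eliminate the supervision of a
    subset of classes by marking their pixels as ignored, loosening the
    feature dependencies the model can learn between co-occurring classes."""
    # logits: (N, C, H, W); labels: (N, H, W) with class ids or ignore_index.
    dropped = torch.rand(num_classes, device=labels.device) < drop_ratio
    labels = labels.clone()
    valid = labels != ignore_index
    labels[valid & dropped[labels.clamp(max=num_classes - 1)]] = ignore_index
    return F.cross_entropy(logits, labels, ignore_index=ignore_index)
```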
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., a feedforward neural net) as a lower model that takes features as input and outputs predicted labels; 2) a graph neural network as an upper model that learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
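A minimal NumPy sketch of the feature-data graph idea: data points and feature dimensions form a bipartite graph, and a feature's embedding is obtained by mean-aggregating the embeddings of the data points in which it is active, which also works for features unseen during training. The backbone and GNN details are omitted and all names are placeholders.

```python
import numpy as np

def extrapolate_feature_embeddings(X, data_emb):
    """Hedged sketch of one message-passing round on a feature-data bipartite
    graph: each feature node aggregates the embeddings of the data points in
    which that feature is active (non-zero)."""
    # X: (n_data, n_features) observation matrix; data_emb: (n_data, d).
    adj = (X != 0).astype(float)              # bipartite adjacency
    deg = adj.sum(axis=0, keepdims=True)      # data points per feature
    weights = adj / np.maximum(deg, 1.0)      # mean aggregation
    return weights.T @ data_emb               # (n_features, d) feature embeddings

# New, previously unseen feature columns can be embedded the same way at test
# time, as long as the data points they appear in already have embeddings.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(6, 4)).astype(float)
data_emb = rng.normal(size=(6, 8))
print(extrapolate_feature_embeddings(X, data_emb).shape)   # (4, 8)
```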
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Shared Interest: Large-Scale Visual Analysis of Model Behavior by Measuring Human-AI Alignment [15.993648423884466]
Saliency is a technique to identify the importance of input features on a model's output.
We present Shared Interest: a set of metrics for comparing saliency with human annotated ground truths.
We show how Shared Interest can be used to rapidly develop or lose trust in a model's reliability.
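The sketch below computes simple set-overlap scores between a thresholded saliency map and a human-annotated mask, in the spirit of Shared Interest's comparisons; the thresholding rule and metric names here are shorthand rather than the paper's exact definitions.

```python
import numpy as np

def shared_interest_scores(saliency, ground_truth, threshold=0.5):
    """Hedged sketch: compare a saliency map to a human-annotated mask using
    simple set overlap metrics (coverage of each set and their IoU)."""
    s = saliency >= threshold          # salient pixels (binarized)
    g = ground_truth.astype(bool)      # human-annotated pixels
    inter = np.logical_and(s, g).sum()
    union = np.logical_or(s, g).sum()
    return {
        "ground_truth_coverage": inter / max(g.sum(), 1),  # annotation covered by saliency
        "saliency_coverage": inter / max(s.sum(), 1),      # saliency inside the annotation
        "iou": inter / max(union, 1),
    }
```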
arXiv Detail & Related papers (2021-07-20T02:44:39Z)
- Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation [0.0]
We present a novel method for reliably explaining the predictions of neural networks.
Our method builds on the assumption that the loss landscape of the model prediction is locally smooth.
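The local-smoothness assumption is commonly operationalized by averaging gradients over a small neighbourhood of the input. The sketch below performs such a locally smoothed (SmoothGrad-style) gradient estimate and stands in for the general idea, not the paper's specific estimator.

```python
import torch

def locally_smoothed_saliency(model, x, target, sigma=0.1, n_samples=25):
    """Hedged sketch: estimate a locally smoothed explanation by averaging
    input gradients over Gaussian perturbations of the input (assumes the
    loss landscape is locally smooth around x)."""
    grads = torch.zeros_like(x)                  # x: (1, C, H, W)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[0, target]          # logit of the explained class
        grad, = torch.autograd.grad(score, noisy)
        grads += grad
    return (grads / n_samples).abs()
```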
arXiv Detail & Related papers (2021-03-26T08:52:11Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek for a small-size dictionary of high level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
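A minimal PyTorch sketch of a small dictionary of attribute functions reading the output of a selected hidden layer, with a linear readout predicting the label from the few attribute activations; sizes, layer choice and architecture are placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

class AttributeDictionary(nn.Module):
    """Hedged sketch: a small dictionary of attribute functions, each mapping
    the output of a selected hidden layer to a scalar attribute activation;
    the interpretation model predicts the label from these few attributes."""
    def __init__(self, hidden_dim, n_attributes=8, n_classes=10):
        super().__init__()
        self.attributes = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
             for _ in range(n_attributes)]
        )
        self.readout = nn.Linear(n_attributes, n_classes)

    def forward(self, hidden):
        # hidden: (batch, hidden_dim) output of a selected predictor layer.
        acts = torch.cat([f(hidden) for f in self.attributes], dim=1)
        return self.readout(acts), acts          # logits + attribute activations
```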
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
- Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
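The summary gives few details, so the sketch below only illustrates the generic ingredient of penalizing cross-correlation between one group of features and the rest, to be added to the usual classification loss; it is not the PFDL algorithm itself.

```python
import torch

def partial_decorrelation_penalty(features, split):
    """Hedged sketch: penalize the cross-covariance between two feature groups
    (columns [:split] vs. [split:]) so the classifier cannot lean on their
    spurious co-variation."""
    f = features - features.mean(dim=0, keepdim=True)
    a, b = f[:, :split], f[:, split:]
    cross_cov = a.T @ b / max(f.shape[0] - 1, 1)
    return (cross_cov ** 2).mean()

# Hypothetical use: total = task_loss + lam * partial_decorrelation_penalty(feats, split=64)
```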
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
- Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
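The summary does not spell out MEED/AIL, so the sketch below is only a generic instance-wise feature selection (IFS) skeleton: a selector scores the features of each input and the predictor only sees the top-k of them; all names are hypothetical, and a practical system would need a differentiable relaxation to train the selector.

```python
import torch
import torch.nn as nn

class InstanceWiseSelector(nn.Module):
    """Hedged sketch of generic instance-wise feature selection (IFS): a
    selector scores every feature of an input and only the top-k are kept."""
    def __init__(self, n_features, k):
        super().__init__()
        self.k = k
        self.scorer = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                    nn.Linear(64, n_features))

    def forward(self, x):
        scores = self.scorer(x)                         # per-feature importance
        topk = scores.topk(self.k, dim=1).indices
        mask = torch.zeros_like(x).scatter_(1, topk, 1.0)
        # NOTE: the hard top-k is not differentiable w.r.t. the scorer; a real
        # system would use a relaxation (e.g., Gumbel-softmax) during training.
        return x * mask, mask                           # masked input + explanation
```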
arXiv Detail & Related papers (2020-06-09T16:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.