Where do Models go Wrong? Parameter-Space Saliency Maps for
Explainability
- URL: http://arxiv.org/abs/2108.01335v1
- Date: Tue, 3 Aug 2021 07:32:34 GMT
- Title: Where do Models go Wrong? Parameter-Space Saliency Maps for
Explainability
- Authors: Roman Levin, Manli Shu, Eitan Borgnia, Furong Huang, Micah Goldblum,
Tom Goldstein
- Abstract summary: We take a different approach to saliency, in which we identify and analyze the network parameters, rather than inputs.
We find that samples which cause similar parameters to malfunction are semantically similar.
We also show that pruning the most salient parameters for a wrongly classified sample often improves model behavior.
- Score: 47.18202269163001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional saliency maps highlight input features to which neural network
predictions are highly sensitive. We take a different approach to saliency, in
which we identify and analyze the network parameters, rather than inputs, which
are responsible for erroneous decisions. We find that samples which cause
similar parameters to malfunction are semantically similar. We also show that
pruning the most salient parameters for a wrongly classified sample often
improves model behavior. Furthermore, fine-tuning a small number of the most
salient parameters on a single sample results in error correction on other
samples that are misclassified for similar reasons. Based on our parameter
saliency method, we also introduce an input-space saliency technique that
reveals how image features cause specific network components to malfunction.
Further, we rigorously validate the meaningfulness of our saliency maps on both
the dataset and case-study levels.
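The abstract describes ranking parameters by how strongly they drive a single wrong prediction and then pruning (or fine-tuning) the most salient ones. Below is a minimal PyTorch sketch of that idea, assuming an image classifier and a single misclassified sample; the filter-level aggregation and the zeroing used here as "pruning" are simplifying assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def filter_saliency(model, x, y):
    """Per-filter parameter saliency for one misclassified sample (x, y):
    mean absolute gradient of the loss w.r.t. each conv filter's weights."""
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)   # x: (1, C, H, W), y: (1,)
    loss.backward()
    saliency = {}
    for name, p in model.named_parameters():
        if p.grad is None or p.dim() != 4:  # restrict to conv weights for simplicity
            continue
        # one score per output filter: average |dL/dw| over that filter's weights
        saliency[name] = p.grad.abs().mean(dim=(1, 2, 3))
    return saliency

def zero_most_salient(model, saliency, k=10):
    """Zero out the k most salient filters -- a crude stand-in for pruning."""
    all_scores = torch.cat(list(saliency.values()))
    threshold = all_scores.topk(min(k, all_scores.numel())).values.min()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in saliency:
                p[saliency[name] >= threshold] = 0.0
```

After zeroing the top-k filters for a misclassified sample, the prediction on that sample (and on similarly misclassified samples) can be re-checked to see whether behavior improves, mirroring the experiment the abstract describes.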
Related papers
- Quantifying lottery tickets under label noise: accuracy, calibration,
and complexity [6.232071870655069]
Pruning deep neural networks is a widely used strategy to alleviate the computational burden in machine learning.
We use the sparse double descent approach to unambiguously identify and characterize pruned models associated with classification tasks.
arXiv Detail & Related papers (2023-06-21T11:35:59Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
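A minimal sketch of how such a checkpoint dataset might be assembled: flattened parameter vectors paired with the loss each checkpoint achieves, which can later serve as the conditioning "prompt" for a generative model. The helper names and the flattening scheme are our assumptions, not the paper's API.

```python
import torch

def flatten_params(model):
    """Concatenate all parameters of a model into a single 1-D vector."""
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def collect_checkpoints(training_runs):
    """Build a (parameter vector, achieved loss) dataset from saved checkpoints.
    `training_runs` yields (model, loss) pairs recorded during training."""
    params, losses = [], []
    for model, loss in training_runs:
        params.append(flatten_params(model))
        losses.append(torch.tensor(float(loss)))
    return torch.stack(params), torch.stack(losses)
```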
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Classification and Adversarial examples in an Overparameterized Linear Model: A Signal Processing Perspective [10.515544361834241]
State-of-the-art deep learning classifiers are highly susceptible to infinitesimal adversarial perturbations.
We find that the learned model is susceptible to adversaries in an intermediate regime where classification generalizes but regression does not.
Despite the adversarial susceptibility, we find that classification with these features can be easier than the more commonly studied "independent feature" models.
arXiv Detail & Related papers (2021-09-27T17:35:42Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
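One schematic way to write that decomposition (notation ours, assuming a ReLU network f whose activation pattern A(x) fixes an affine piece on each region):

```latex
% Schematic decomposition: f is affine on each activation-pattern region
% R_a = { x : A(x) = a }, so the empirical error over a dataset D splits
% into one term per subfunction.
\begin{align*}
  f(x) &= W_a x + b_a \quad \text{whenever } x \in R_a, \\
  \mathbb{E}_{(x,y)\sim D}\big[\ell(f(x), y)\big]
    &= \sum_a \Pr_D\!\big[x \in R_a\big]\;
       \mathbb{E}_{(x,y)\sim D}\big[\ell(f(x), y) \,\big|\, x \in R_a\big].
\end{align*}
```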
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Investigating sanity checks for saliency maps with image and text classification [1.836681984330549]
Saliency maps have been shown to be both useful and misleading for explaining model predictions, especially in the context of images.
We analyze the effects of the input multiplier in certain saliency maps using similarity scores, max-sensitivity and infidelity evaluation metrics.
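We read the "input multiplier" as the gradient-times-input construction, in which the raw gradient saliency map is multiplied elementwise by the input. A minimal PyTorch sketch contrasting the two, assuming a standard image classifier:

```python
import torch

def gradient_saliency(model, x, target_class):
    """Plain gradient saliency: d(class score)/d(input) for one image x of shape (1, C, H, W)."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()
    return x.grad.detach()

def gradient_x_input(model, x, target_class):
    """The same gradient, multiplied elementwise by the input (the 'input multiplier')."""
    return gradient_saliency(model, x, target_class) * x.detach()
```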
arXiv Detail & Related papers (2021-06-08T23:23:42Z)
- Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that prototype-based networks that go beyond similarity kernels deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.