Rethinking Robustness of Model Attributions
- URL: http://arxiv.org/abs/2312.10534v1
- Date: Sat, 16 Dec 2023 20:20:38 GMT
- Title: Rethinking Robustness of Model Attributions
- Authors: Sandesh Kamath, Sankalp Mittal, Amit Deshpande, Vineeth N
Balasubramanian
- Abstract summary: Recent works have shown that many attribution methods are fragile and have proposed improvements to either these methods or model training.
We observe two main causes of fragile attributions: first, existing robustness metrics over-penalize even reasonable local shifts in attribution, and second, the attribution can be concentrated in a small region even when an image has multiple important parts.
We propose simple ways to strengthen existing metrics and attribution methods by incorporating pixel locality into robustness metrics and diversity of pixel locations into attributions.
- Score: 24.317595434521504
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: For machine learning models to be reliable and trustworthy, their decisions
must be interpretable. As these models find increasing use in safety-critical
applications, it is important that not just the model predictions but also
their explanations (as feature attributions) be robust to small
human-imperceptible input perturbations. Recent works have shown that many
attribution methods are fragile and have proposed improvements in either these
methods or the model training. We observe two main causes for fragile
attributions: first, the existing metrics of robustness (e.g., top-k
intersection) over-penalize even reasonable local shifts in attribution,
thereby making random perturbations appear to be a strong attack, and second,
the attribution can be concentrated in a small region even when there are
multiple important parts in an image. To rectify this, we propose simple ways
to strengthen existing metrics and attribution methods that incorporate
locality of pixels in robustness metrics and diversity of pixel locations in
attributions. Regarding the role of model training in attributional robustness,
we empirically observe that adversarially trained models have more robust
attributions on smaller datasets; however, this advantage disappears on larger
datasets. Code is available at https://github.com/ksandeshk/LENS.
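The paper's first proposal, incorporating pixel locality into robustness metrics, is easy to illustrate. Below is a minimal sketch contrasting the standard top-k intersection with a locality-aware variant that counts a perturbed top-k pixel as matched whenever some original top-k pixel lies within a small window around it; the function names and the `window` parameter are illustrative assumptions, not the paper's LENS implementation.

```python
import numpy as np

def topk_mask(attr, k):
    """Boolean mask of the k highest-attribution pixels (attr is a 2D map)."""
    flat = attr.ravel()
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(attr.shape)

def topk_intersection(attr_orig, attr_pert, k):
    """Standard metric: fraction of exactly coinciding top-k pixels."""
    return (topk_mask(attr_orig, k) & topk_mask(attr_pert, k)).sum() / k

def local_topk_intersection(attr_orig, attr_pert, k, window=2):
    """Locality-aware variant: a perturbed top-k pixel counts as matched if
    some original top-k pixel lies within a (2*window+1)^2 neighborhood,
    so small local shifts in attribution are not over-penalized."""
    m_orig = topk_mask(attr_orig, k)
    m_pert = topk_mask(attr_pert, k)
    H, W = attr_orig.shape
    matched = 0
    for i, j in zip(*np.nonzero(m_pert)):
        i0, i1 = max(0, i - window), min(H, i + window + 1)
        j0, j1 = max(0, j - window), min(W, j + window + 1)
        if m_orig[i0:i1, j0:j1].any():
            matched += 1
    return matched / k
```

Under a one-pixel shift of the attribution map, the locality-aware score stays near 1 while the exact top-k intersection can drop sharply, which is the over-penalization of reasonable local shifts described above.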
Related papers
- Characterizing Data Point Vulnerability via Average-Case Robustness [29.881355412540557]
Adversarial robustness is a standard framework that views the robustness of predictions through a binary lens.
We consider a complementary framework, called average-case robustness, which measures the fraction of points in a local region around an input for which the model's prediction is consistent (a naive Monte Carlo sketch of this quantity appears after the list below).
We show empirically that our estimators are accurate and efficient for standard deep learning models.
arXiv Detail & Related papers (2023-07-26T01:10:29Z)
- Interpretable Computer Vision Models through Adversarial Training:
Unveiling the Robustness-Interpretability Connection [0.0]
Interpretability is as essential as robustness when we deploy the models to the real world.
Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans.
arXiv Detail & Related papers (2023-07-04T13:51:55Z)
- On Evaluating the Adversarial Robustness of Semantic Segmentation Models [0.0]
A number of adversarial training approaches have been proposed as a defense against adversarial perturbation.
We show for the first time that a number of models in previous work that are claimed to be robust are in fact not robust at all.
We then evaluate simple adversarial training algorithms that produce reasonably robust models even under our set of strong attacks.
arXiv Detail & Related papers (2023-06-25T11:45:08Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can instead come from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Fairness Increases Adversarial Vulnerability [50.90773979394264]
This paper shows the existence of a dichotomy between fairness and robustness, and analyzes when achieving fairness decreases the model robustness to adversarial samples.
Experiments on non-linear models and different architectures validate the theoretical findings in multiple vision domains.
The paper proposes a simple, yet effective, solution to construct models achieving good tradeoffs between fairness and robustness.
arXiv Detail & Related papers (2022-11-21T19:55:35Z)
- Defensive Patches for Robust Recognition in the Physical World [111.46724655123813]
Data-end defenses improve robustness through operations on the input data rather than by modifying the model.
Previous data-end defenses show low generalization against diverse noises and weak transferability across multiple models.
We propose a defensive patch generation framework to address these problems by helping models better exploit these features.
arXiv Detail & Related papers (2022-04-13T07:34:51Z)
- Clustering Effect of (Linearized) Adversarial Robust Models [60.25668525218051]
We propose a novel understanding of adversarial robustness and apply it to more tasks, including domain adaptation and robustness boosting.
Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
arXiv Detail & Related papers (2021-11-25T05:51:03Z)
- Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- How to compare adversarial robustness of classifiers from a global
perspective [0.0]
Adversarial attacks undermine the reliability of and trust in machine learning models.
Point-wise measures for specific threat models are currently the most popular tool for comparing the robustness of classifiers.
In this work, we use recently proposed robustness curves to show that point-wise measures fail to capture important global properties.
arXiv Detail & Related papers (2020-04-22T22:07:49Z)
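As noted in the average-case robustness entry above, the measured quantity is, roughly, the fraction of points near an input whose prediction matches the prediction at that input. The sketch below is a naive Monte Carlo estimate for intuition only; the cited paper develops more efficient estimators, and the function name, the uniform L-infinity sampling, and the PyTorch interface are assumptions made for illustration.

```python
import torch

def average_case_robustness(model, x, eps=0.1, n_samples=1000):
    """Fraction of points sampled uniformly in an L-infinity ball of radius
    eps around input x whose predicted class matches the prediction at x.
    Assumes `model` is a classifier returning logits of shape (N, classes)."""
    model.eval()
    with torch.no_grad():
        base_pred = model(x.unsqueeze(0)).argmax(dim=1)
        # Uniform perturbations in [-eps, eps]^d, evaluated as one batch
        # (no clipping to the valid pixel range, for brevity).
        noise = (torch.rand(n_samples, *x.shape) * 2 - 1) * eps
        preds = model(x.unsqueeze(0) + noise).argmax(dim=1)
        return (preds == base_pred).float().mean().item()
```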