Exploiting the Relationship Between Kendall's Rank Correlation and
Cosine Similarity for Attribution Protection
- URL: http://arxiv.org/abs/2205.07279v1
- Date: Sun, 15 May 2022 13:08:50 GMT
- Title: Exploiting the Relationship Between Kendall's Rank Correlation and
Cosine Similarity for Attribution Protection
- Authors: Fan Wang, Adams Wai-Kin Kong
- Abstract summary: We first show that the expected Kendall's rank correlation is positively correlated to cosine similarity and then indicate that the direction of attribution is the key to attribution robustness.
Our analysis further exposes that IGR encourages neurons with the same activation states for natural samples and the corresponding perturbed samples, which is shown to induce robustness to gradient-based attribution methods.
- Score: 21.341303776931532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model attributions are important in deep neural networks as they aid
practitioners in understanding the models, but recent studies reveal that
attributions can be easily perturbed by adding imperceptible noise to the
input. The non-differentiable Kendall's rank correlation is a key performance
index for attribution protection. In this paper, we first show that the
expected Kendall's rank correlation is positively correlated to cosine
similarity and then indicate that the direction of attribution is the key to
attribution robustness. Based on these findings, we explore the vector space of
attribution to explain the shortcomings of attribution defense methods using
$\ell_p$ norm and propose integrated gradient regularizer (IGR), which
maximizes the cosine similarity between natural and perturbed attributions. Our
analysis further exposes that IGR encourages neurons with the same activation
states for natural samples and the corresponding perturbed samples, which is
shown to induce robustness to gradient-based attribution methods. Our
experiments on different models and datasets confirm our analysis on
attribution protection and demonstrate a decent improvement in adversarial
robustness.
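Since this digest carries no code, here is a minimal PyTorch sketch of the cosine-alignment idea behind IGR: penalize the angle between attributions of natural and perturbed samples. Plain input gradients stand in for integrated gradients, the perturbation is random rather than adversarial, and all names (`attribution`, `igr_style_loss`, the weight `lam`) are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch (not the authors' code): an IGR-style regularizer that
# maximizes cosine similarity between natural and perturbed attributions.
# Plain input gradients stand in for integrated gradients for brevity.
import torch
import torch.nn.functional as F

def attribution(model, x, y):
    """Gradient of the true-class logit w.r.t. the input (simple saliency)."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.unsqueeze(1)).sum()
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.flatten(1)                              # (batch, features)

def igr_style_loss(model, x, y, eps=8 / 255, lam=1.0):
    """Cross-entropy plus a term rewarding alignment between the attribution
    directions of natural samples and their perturbed counterparts."""
    x_pert = (x + eps * torch.randn_like(x).sign()).clamp(0, 1)
    cos = F.cosine_similarity(attribution(model, x, y),
                              attribution(model, x_pert, y), dim=1)
    return F.cross_entropy(model(x), y) + lam * (1.0 - cos).mean()

# Toy usage on random data with a small nonlinear model.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 8 * 8, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 10))
x, y = torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,))
igr_style_loss(model, x, y).backward()
```
The cosine term is justified by the paper's main finding: since the expected Kendall's rank correlation rises with cosine similarity, aligning attribution directions also protects the rank-based index against which attribution attacks are scored.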
Related papers
- Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation [26.544938760265136]
Deep neural classifiers can rely on spurious correlations between spurious (non-causal) attributes of inputs and targets to make predictions.
We propose a self-guided spurious correlation mitigation framework.
We show that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori.
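The framework itself is only named here; as a loose sketch of "distinguishing prediction behaviors without knowing the spurious attributes a priori", one known recipe in this spirit is to split the training data by an initial model's errors and upweight the error group on a second pass. The two-stage loop and the upweight factor below are assumptions for illustration, not the paper's method.
```python
# Loose sketch, NOT the paper's framework: distinguish prediction behaviors
# via an initial model's errors, then upweight the error group, in the spirit
# of error-set upweighting recipes for spurious-correlation mitigation.
import torch
import torch.nn.functional as F

def fit(model, x, y, weights, steps=200, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = (weights * F.cross_entropy(model(x), y, reduction="none")).mean()
        opt.zero_grad(); loss.backward(); opt.step()

torch.manual_seed(0)
x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

# Stage 1: plain ERM exposes two prediction behaviors: easy vs. hard samples.
model = torch.nn.Linear(10, 2)
fit(model, x, y, torch.ones(len(y)))
errors = (model(x).argmax(1) != y).float()

# Stage 2: retrain from scratch, upweighting the hard group (factor assumed).
debiased = torch.nn.Linear(10, 2)
fit(debiased, x, y, 1.0 + 4.0 * errors)
```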
arXiv Detail & Related papers (2024-05-06T17:12:21Z)
- Causal Discovery by Kernel Deviance Measures with Heterogeneous Transforms [17.368146833023893]
We propose a novel score measure based on heterogeneous transformation of RKHS embeddings to extract relevant higher-order moments of the conditional densities for causal discovery.
Inference is made via comparing the score of each hypothetical cause-effect direction.
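The score itself is built from higher-order moments of RKHS embeddings, which is more machinery than fits in a digest; the sketch below keeps only the score-and-compare skeleton, using a crude stand-in deviance (how unevenly the residual spread of a kernel regression varies across the input range). The proxy and all names are assumptions, not the paper's measure.
```python
# Loose illustration of the score-and-compare recipe for causal discovery.
# The deviance proxy (variability of residual spread across input bins) is a
# crude stand-in for the paper's RKHS-embedding score.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def deviance_score(cause, effect, bins=10):
    """Fit effect = f(cause) + noise, then score how unevenly the residual
    spread varies along the cause (lower = more homogeneous = more plausible)."""
    model = KernelRidge(kernel="rbf", alpha=1.0, gamma=1.0)
    model.fit(cause.reshape(-1, 1), effect)
    resid = effect - model.predict(cause.reshape(-1, 1))
    order = np.argsort(cause)
    spreads = [chunk.std() for chunk in np.array_split(resid[order], bins)]
    return float(np.std(spreads))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.tanh(2 * x) + 0.1 * rng.normal(size=500)       # ground truth: x -> y

s_xy, s_yx = deviance_score(x, y), deviance_score(y, x)
print("inferred direction:", "x -> y" if s_xy < s_yx else "y -> x")
```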
arXiv Detail & Related papers (2024-01-31T17:28:05Z)
- Latent Feature Relation Consistency for Adversarial Robustness [80.24334635105829]
Misclassification occurs when deep neural networks are presented with adversarial examples, which add human-imperceptible adversarial noise to natural examples.
We propose Latent Feature Relation Consistency (LFRC).
LFRC constrains the relation of adversarial examples in latent space to be consistent with the natural examples.
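The one-line description suggests matching relational structure between natural and adversarial latents; a minimal sketch of one plausible instantiation is to match the pairwise cosine-similarity (Gram) matrices of the two batches. The Gram formulation and MSE penalty are assumptions, not necessarily the paper's exact loss.
```python
# Minimal sketch of a latent-relation consistency term: push the pairwise
# cosine-similarity structure of adversarial latents toward the natural ones.
# The Gram-matrix/MSE formulation is a plausible reading, not the exact loss.
import torch
import torch.nn.functional as F

def relation_matrix(z):
    z = F.normalize(z.flatten(1), dim=1)
    return z @ z.t()                       # (batch, batch) cosine similarities

def lfrc_style_loss(z_nat, z_adv):
    return F.mse_loss(relation_matrix(z_adv), relation_matrix(z_nat).detach())

# Toy usage: stand-in latent features for a clean batch and its adversarial twin.
z_nat = torch.randn(8, 128)
z_adv = torch.randn(8, 128, requires_grad=True)
lfrc_style_loss(z_nat, z_adv).backward()
```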
arXiv Detail & Related papers (2023-03-29T13:50:01Z)
- A Practical Upper Bound for the Worst-Case Attribution Deviations [21.341303776931532]
Model attribution is a critical component of deep neural networks (DNNs), as it provides interpretability for complex models.
Recent studies draw attention to the security of attribution methods, which are vulnerable to attribution attacks that generate similar images with dramatically different attributions.
Existing works investigate empirical ways of improving the robustness of DNNs against those attacks; however, none of them explicitly quantifies the actual deviations of attributions.
In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound that measures the largest dissimilarity of attributions after the samples are perturbed by any noise within a certain region.
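The paper derives a certified upper bound in closed form, which a digest cannot reproduce; as a hedged complement, the sketch below probes the same quantity empirically, running projected gradient ascent inside an l_inf ball to find a perturbation that maximizes attribution dissimilarity. The result lower-bounds the true worst case; the saliency stand-in, step sizes, and names are illustrative.
```python
# Empirical probe (NOT the paper's certified bound): search an l_inf ball with
# gradient ascent for the perturbation that most rotates the attribution.
# The resulting dissimilarity lower-bounds the true worst-case deviation.
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.unsqueeze(1)).sum()
    return torch.autograd.grad(score, x, create_graph=True)[0].flatten(1)

def attribution_deviation_probe(model, x, y, eps=8 / 255, steps=20, lr=2 / 255):
    a_nat = saliency(model, x, y).detach()
    # Random start inside the ball avoids the zero-gradient point at delta = 0.
    delta = (eps * (2 * torch.rand_like(x) - 1)).requires_grad_()
    for _ in range(steps):
        a_adv = saliency(model, (x + delta).clamp(0, 1), y)
        dissim = (1 - F.cosine_similarity(a_nat, a_adv, dim=1)).mean()
        grad, = torch.autograd.grad(dissim, delta)
        with torch.no_grad():
            delta += lr * grad.sign()                  # ascend dissimilarity
            delta.clamp_(-eps, eps)                    # stay in the l_inf ball
    a_adv = saliency(model, (x + delta).clamp(0, 1), y)
    return 1 - F.cosine_similarity(a_nat, a_adv, dim=1)

model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 8 * 8, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 10))
x, y = torch.rand(2, 3, 8, 8), torch.randint(0, 10, (2,))
print(attribution_deviation_probe(model, x, y))
```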
arXiv Detail & Related papers (2023-03-01T09:07:27Z)
- Fairness via Adversarial Attribute Neighbourhood Robust Learning [49.93775302674591]
We propose a principled Robust Adversarial Attribute Neighbourhood (RAAN) loss to debias the classification head.
arXiv Detail & Related papers (2022-10-12T23:39:28Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need both types of robustness to be reliable.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
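The correlation the summary describes is measured, in the paper, as a linear fit after a probit transform of accuracies; here is a minimal sketch of that bookkeeping, with made-up accuracy numbers standing in for real model evaluations.
```python
# Minimal sketch of the "accuracy on the line" bookkeeping: correlate
# in-distribution vs. out-of-distribution accuracy across models after a
# probit transform. The accuracy values are made up for illustration.
import numpy as np
from scipy.stats import norm, pearsonr

id_acc  = np.array([0.85, 0.90, 0.93, 0.95, 0.97])   # e.g. CIFAR-10 test acc
ood_acc = np.array([0.70, 0.78, 0.83, 0.87, 0.91])   # e.g. a shifted variant

id_p, ood_p = norm.ppf(id_acc), norm.ppf(ood_acc)    # probit scaling
r, _ = pearsonr(id_p, ood_p)
slope, intercept = np.polyfit(id_p, ood_p, 1)
print(f"r = {r:.3f}; probit(ood) ~ {slope:.2f} * probit(id) + {intercept:.2f}")
```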
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
- Adversarial Robustness through the Lens of Causality [105.51753064807014]
The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z)
- Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations [37.11665902583138]
We propose a new attribution method, Relative Sectional Propagation (RSP), for decomposing the output predictions of deep neural networks (DNNs).
We define a hostile factor as an element that interferes with finding the attributions of the target, and propagate it in a distinguishable way to overcome the non-suppressed nature of activated neurons.
Our method decomposes the predictions of DNNs with clearer class-discriminativeness and more detailed elucidation of activated neurons than conventional attribution methods.
arXiv Detail & Related papers (2020-12-07T03:11:07Z)
- Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can pick up spurious correlations during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z)