Certified $\ell_2$ Attribution Robustness via Uniformly Smoothed Attributions
- URL: http://arxiv.org/abs/2405.06361v1
- Date: Fri, 10 May 2024 09:56:02 GMT
- Title: Certified $\ell_2$ Attribution Robustness via Uniformly Smoothed Attributions
- Authors: Fan Wang, Adams Wai-Kin Kong,
- Abstract summary: We propose a uniform smoothing technique that augments the vanilla attributions by noises uniformly sampled from a certain space.
It is proved that, for all perturbations within the attack region, the cosine similarity between uniformly smoothed attribution of perturbed sample and the unperturbed sample is guaranteed to be lower bounded.
- Score: 20.487079380753876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model attribution is a popular tool to explain the rationales behind model predictions. However, recent work suggests that the attributions are vulnerable to minute perturbations, which can be added to input samples to fool the attributions while maintaining the prediction outputs. Although empirical studies have shown positive performance via adversarial training, an effective certified defense method is eminently needed to understand the robustness of attributions. In this work, we propose to use uniform smoothing technique that augments the vanilla attributions by noises uniformly sampled from a certain space. It is proved that, for all perturbations within the attack region, the cosine similarity between uniformly smoothed attribution of perturbed sample and the unperturbed sample is guaranteed to be lower bounded. We also derive alternative formulations of the certification that is equivalent to the original one and provides the maximum size of perturbation or the minimum smoothing radius such that the attribution can not be perturbed. We evaluate the proposed method on three datasets and show that the proposed method can effectively protect the attributions from attacks, regardless of the architecture of networks, training schemes and the size of the datasets.
Related papers
- Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.
diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.
We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Carefully Blending Adversarial Training, Purification, and Aggregation Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised for defences.
Our method improves by a significant margin the state-of-the-art for Cifar-10, Cifar-100, and TinyImageNet-200.
arXiv Detail & Related papers (2023-05-25T09:04:31Z) - Confidence-aware Training of Smoothed Classifiers for Certified
Robustness [75.95332266383417]
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently exhibits improved certified robustness upon state-of-the-art training methods.
arXiv Detail & Related papers (2022-12-18T03:57:12Z) - Pixel is All You Need: Adversarial Trajectory-Ensemble Active Learning
for Salient Object Detection [40.97103355628434]
It is unclear whether a saliency model trained with weakly-supervised data can achieve the equivalent performance of its fully-supervised version.
We propose a novel yet effective adversarial trajectory-ensemble active learning (ATAL)
Experimental results show that our ATAL can find such a point-labeled dataset, where a saliency model trained on it obtained $97%$ -- $99%$ performance of its fully-supervised version with only ten annotated points per image.
arXiv Detail & Related papers (2022-12-13T11:18:08Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Fair mapping [0.0]
We propose a novel pre-processing method based on the transformation of the distribution of protected groups onto a chosen target one.
We leverage on the recent works of the Wasserstein GAN and AttGAN frameworks to achieve the optimal transport of data points.
Our proposed approach, preserves the interpretability of data and can be used without defining exactly the sensitive groups.
arXiv Detail & Related papers (2022-09-01T17:31:27Z) - Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Adversarial Robustness of Supervised Sparse Coding [34.94566482399662]
We consider a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate.
We focus on the hypothesis class obtained by combining a sparsity-promoting encoder coupled with a linear encoder.
We provide a robustness certificate for end-to-end classification.
arXiv Detail & Related papers (2020-10-22T22:05:21Z) - Regularized Training and Tight Certification for Randomized Smoothed
Classifier with Provable Robustness [15.38718018477333]
We derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart.
We also design a new certification algorithm, which can leverage the regularization effect to provide tighter robustness lower bound that holds with high probability.
arXiv Detail & Related papers (2020-02-17T20:54:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.