Certifiably Robust Interpretation via Renyi Differential Privacy
- URL: http://arxiv.org/abs/2107.01561v1
- Date: Sun, 4 Jul 2021 06:58:01 GMT
- Title: Certifiably Robust Interpretation via Renyi Differential Privacy
- Authors: Ao Liu, Xiaoyu Chen, Sijia Liu, Lirong Xia, Chuang Gan
- Abstract summary: We study the problem of interpretation robustness from a new perspective of Renyi differential privacy (RDP).
First, it can offer provable and certifiable top-$k$ robustness.
Second, our proposed method offers $\sim 10\%$ better experimental robustness than existing approaches.
Third, our method can provide a smooth tradeoff between robustness and computational efficiency.
- Score: 77.04377192920741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the recent discovery that the interpretation maps of CNNs could
easily be manipulated by adversarial attacks against network interpretability,
we study the problem of interpretation robustness from a new perspective of
Renyi differential privacy (RDP). The advantages of our Renyi-Robust-Smooth
(RDP-based interpretation method) are threefold. First, it can offer provable
and certifiable top-$k$ robustness. That is, the top-$k$ important attributions
of the interpretation map are provably robust under any input perturbation with
bounded $\ell_d$-norm (for any $d\geq 1$, including $d = \infty$). Second, our
proposed method offers $\sim10\%$ better experimental robustness than existing
approaches in terms of the top-$k$ attributions. Remarkably, the accuracy of
Renyi-Robust-Smooth also outperforms existing approaches. Third, our method can
provide a smooth tradeoff between robustness and computational efficiency.
Experimentally, its top-$k$ attributions are twice as robust as those of
existing approaches when computational resources are highly constrained.
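The certification mechanism lends itself to a randomized-smoothing-style implementation. Below is a minimal sketch, assuming a generic attribution function: interpretation maps are averaged over noisy copies of the input, and the top-$k$ indices are read off the smoothed map. The names (`smoothed_interpretation_top_k`, `interpret_fn`, `sigma`, `n_samples`) are illustrative, not the paper's API, and the plain Gaussian noise and Monte-Carlo average are simplifying assumptions that omit the paper's RDP accounting.

```python
# Minimal sketch of noise-smoothed interpretation in the spirit of
# Renyi-Robust-Smooth. Assumption: averaging attribution maps over
# Gaussian-perturbed inputs approximates the paper's smoothed map.
import numpy as np

def smoothed_interpretation_top_k(x, interpret_fn, k, sigma=0.1,
                                  n_samples=100, seed=0):
    """Monte-Carlo estimate of a smoothed attribution map and its top-k set.

    x            -- 1-D input array (e.g. a flattened image)
    interpret_fn -- maps an input to a per-feature attribution map
    sigma        -- std. dev. of the Gaussian smoothing noise
    n_samples    -- number of noisy samples in the smoothing average
    """
    rng = np.random.default_rng(seed)
    acc = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        acc += interpret_fn(noisy)              # accumulate attributions
    smooth_map = acc / n_samples                # smoothed interpretation map
    top_k = np.argsort(smooth_map)[::-1][:k]    # indices of the k largest
    return smooth_map, top_k

# Toy usage with a stand-in "interpretation": squaring the input.
x = np.linspace(-1.0, 1.0, 16)
_, top3 = smoothed_interpretation_top_k(x, lambda z: z ** 2, k=3)
print(top3)
```

Plausibly, the robustness/efficiency tradeoff mentioned in the abstract corresponds to varying `n_samples` and the noise level: more samples yield a more stable top-$k$ set at higher compute cost.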
Related papers
- SPLITZ: Certifiable Robustness via Split Lipschitz Randomized Smoothing [8.471466670802817]
There are two approaches to providing certifiable robustness against adversarial examples.
We propose SPLITZ, a practical and novel approach.
We show that SPLITZ consistently improves upon existing state-of-the-art approaches.
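As a back-of-the-envelope illustration of the split idea (a sketch under our own reading, not SPLITZ's actual procedure): if the first half of a network is $L$-Lipschitz and randomized smoothing certifies the second half up to radius $r$ in representation space, composing the two yields an input-space certified radius of $r/L$.

```python
# Hedged sketch: compose a Lipschitz bound on the first half of a network
# with a smoothing certificate on the second half. Any input perturbation
# of norm <= r / L moves the intermediate representation by at most r,
# which the smoothed second half tolerates by assumption.
def composed_certified_radius(lipschitz_first_half, smoothing_radius):
    return smoothing_radius / lipschitz_first_half

print(composed_certified_radius(lipschitz_first_half=2.0, smoothing_radius=0.5))
```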
arXiv Detail & Related papers (2024-07-03T05:13:28Z)
- Adaptive Smoothness-weighted Adversarial Training for Multiple Perturbations with Its Stability Analysis [39.90487314421744]
Adversarial Training (AT) has been demonstrated as one of the most effective methods against adversarial examples.
Adversarial training for multiple perturbations (ATMP) is proposed to generalize adversarial robustness over different perturbation types.
We develop stability-based excess risk bounds and propose adaptive-weighted adversarial training for multiple perturbations.
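A minimal sketch of the control flow of adversarial training over several perturbation types with adaptive per-type weights. The softmax-over-loss weighting below is a simplification standing in for the paper's smoothness-weighted scheme; the one-step attacks, logistic model, and all names are illustrative assumptions.

```python
# Hedged sketch of adversarial training with multiple perturbation types.
import numpy as np

def logistic_loss(w, X, y):
    z = X @ w
    return np.mean(np.log1p(np.exp(-y * z)))

def grad_w(w, X, y):
    z = X @ w
    s = -y / (1.0 + np.exp(y * z))            # dloss/dz for each sample
    return X.T @ s / len(y)

def attack(w, X, y, norm, eps):
    """One-step gradient attack on the inputs under an l_inf or l_2 budget."""
    z = X @ w
    s = -y / (1.0 + np.exp(y * z))
    gx = s[:, None] * w[None, :]              # dloss/dX, one row per sample
    if norm == "inf":
        return X + eps * np.sign(gx)
    row_norms = np.linalg.norm(gx, axis=1, keepdims=True) + 1e-12
    return X + eps * gx / row_norms           # l_2-normalized ascent step

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))
w = np.zeros(5)
perturbations = [("inf", 0.05), ("2", 0.3)]   # (norm type, budget)
for _ in range(100):
    adv = [attack(w, X, y, n, e) for n, e in perturbations]
    losses = np.array([logistic_loss(w, Xa, y) for Xa in adv])
    weights = np.exp(losses) / np.exp(losses).sum()   # adaptive type weights
    g = sum(wt * grad_w(w, Xa, y) for wt, Xa in zip(weights, adv))
    w -= 0.5 * g                              # gradient step on weighted loss
print("per-type adversarial losses:", np.round(losses, 3))
```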
arXiv Detail & Related papers (2022-10-02T15:42:34Z)
- There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning for Mazes [64.05903267230467]
Interpretability is an essential building block for trustworthiness in reinforcement learning systems.
We show that in certain cases, one can achieve policy interpretability while maintaining its optimality.
arXiv Detail & Related papers (2022-06-09T04:23:26Z)
- Adversarial Robustness Guarantees for Gaussian Processes [22.403365399119107]
Gaussian processes (GPs) enable principled computation of model uncertainty, making them attractive for safety-critical applications.
We present a framework to analyse adversarial robustness of GPs, defined as invariance of the model's decision to bounded perturbations.
We develop a branch-and-bound scheme to refine the bounds and show, for any $\epsilon > 0$, that our algorithm is guaranteed to converge to values $\epsilon$-close to the actual values in finitely many iterations.
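To illustrate the branch-and-bound convergence pattern (a generic sketch, not the paper's GP-specific bounds), here is a routine that bounds the maximum of a Lipschitz function over an interval to within $\epsilon$ by repeatedly splitting the box with the loosest upper bound.

```python
# Hedged sketch of best-first branch-and-bound with eps-close termination.
import heapq

def bnb_max(f, lo, hi, lipschitz, eps=1e-3):
    """Max of f on [lo, hi] to within eps, via best-first branch-and-bound."""
    mid = 0.5 * (lo + hi)
    best = f(mid)                              # best exact value seen so far
    # Heap of boxes ordered by upper bound: (-ub, left, right, f(center)).
    heap = [(-(best + lipschitz * (hi - lo) / 2), lo, hi, best)]
    while heap:
        neg_ub, a, b, val = heapq.heappop(heap)
        best = max(best, val)
        if -neg_ub - best <= eps:              # bounds are eps-close: done
            return best
        m = 0.5 * (a + b)
        for left, right in ((a, m), (m, b)):   # split box, re-bound halves
            c = 0.5 * (left + right)
            fc = f(c)
            heapq.heappush(
                heap, (-(fc + lipschitz * (right - left) / 2), left, right, fc))
    return best

# f has maximum 0 at x = 0.3; its derivative is bounded by 2 on [0, 1].
print(bnb_max(lambda x: -(x - 0.3) ** 2, 0.0, 1.0, lipschitz=2.0))
```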
arXiv Detail & Related papers (2021-04-07T15:14:56Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
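For intuition about the setting, here is a hedged sketch of robust gradient aggregation. The coordinate-wise median below is a generic Byzantine-resilient aggregator, not the paper's concentration-based procedure.

```python
# Hedged sketch: a generic robust aggregator for the Byzantine setting,
# where some machines report arbitrarily corrupted gradients.
import numpy as np

def resilient_aggregate(grads):
    """Coordinate-wise median of per-machine gradients (one row each)."""
    return np.median(grads, axis=0)

rng = np.random.default_rng(1)
true_grad = np.array([1.0, -2.0, 0.5])
honest = true_grad + 0.1 * rng.normal(size=(7, 3))   # 7 honest machines
byzantine = 100.0 * rng.normal(size=(3, 3))          # 3 adversarial machines
print(resilient_aggregate(np.vstack([honest, byzantine])))  # near true_grad
```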
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Almost Tight L0-norm Certified Robustness of Top-k Predictions against Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches.
Our work is based on randomized smoothing, which builds a provably robust classifier via randomizing an input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
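A minimal sketch of top-$k$ prediction under randomized smoothing: the smoothed classifier returns the $k$ labels most often predicted under random input noise. The Gaussian noise here is an assumption for illustration; the paper certifies $\ell_0$ robustness with a different randomization scheme.

```python
# Hedged sketch of top-k prediction from a smoothed classifier.
import numpy as np

def smoothed_top_k_labels(x, base_classifier, k, n_samples=500,
                          sigma=0.25, seed=0):
    """Return the k labels most often ranked first under input noise."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(base_classifier(x)))
    for _ in range(n_samples):
        probs = base_classifier(x + rng.normal(0.0, sigma, size=x.shape))
        counts[np.argmax(probs)] += 1          # vote for the noisy top-1
    return np.argsort(counts)[::-1][:k]        # the k most-voted labels

# Toy base classifier: softmax over a fixed random linear map.
W = np.random.default_rng(2).normal(size=(10, 8))
def clf(z):
    logits = W @ z
    e = np.exp(logits - logits.max())
    return e / e.sum()

print(smoothed_top_k_labels(np.ones(8), clf, k=3))
```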
arXiv Detail & Related papers (2020-11-15T21:34:44Z)
- Differentiable Linear Bandit Algorithm [6.849358422233866]
Upper Confidence Bound is arguably the most commonly used method for linear multi-arm bandit problems.
We introduce a gradient estimator, which allows the confidence bound to be learned via gradient ascent.
We show that the proposed algorithm achieves a $\tilde{\mathcal{O}}(\hat{\beta}\sqrt{dT})$ upper bound on the $T$-round regret, where $d$ is the dimension of arm features and $\hat{\beta}$ is the learned size of the confidence bound.
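For orientation, a sketch of LinUCB with the confidence width $\hat{\beta}$ exposed as an explicit parameter, showing where a learned width would enter the arm-selection rule. The gradient estimator that actually learns $\hat{\beta}$ is the paper's contribution and is not reproduced here; all names and constants are illustrative.

```python
# Hedged sketch: LinUCB with a fixed width `beta` standing in for hat-beta.
import numpy as np

def linucb(arms, reward_fn, T=200, beta=0.5, lam=1.0, seed=3):
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    A = lam * np.eye(d)                       # regularized design matrix
    b = np.zeros(d)
    theta = np.zeros(d)
    for _ in range(T):
        theta = np.linalg.solve(A, b)         # ridge estimate of the rewards
        A_inv = np.linalg.inv(A)
        width = np.sqrt(np.sum((arms @ A_inv) * arms, axis=1))
        x = arms[np.argmax(arms @ theta + beta * width)]   # optimistic pick
        r = reward_fn(x) + 0.1 * rng.normal()              # noisy reward
        A += np.outer(x, x)
        b += r * x
    return theta

arms = np.eye(4)                              # four orthogonal arms
theta_star = np.array([0.1, 0.9, 0.3, 0.5])
print(np.round(linucb(arms, lambda x: x @ theta_star), 2))
```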
arXiv Detail & Related papers (2020-06-04T16:43:55Z)
- Toward Adversarial Robustness via Semi-supervised Robust Training [93.36310070269643]
Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs).
We propose a novel defense method, robust training (RT), which jointly minimizes two separate risks ($R_{stand}$ and $R_{rob}$).
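A toy sketch of the two-risk objective: a standard cross-entropy risk plus a robustness risk penalizing prediction changes under perturbation. The perturbation, the squared-distance stability term, and names like `lambda_rob` are assumptions for illustration; the paper's exact risks and semi-supervised setup differ.

```python
# Hedged sketch of jointly evaluating a standard risk and a robustness risk.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def joint_risk(W, X, y, lambda_rob=1.0, eps=0.1, seed=4):
    rng = np.random.default_rng(seed)
    p_nat = softmax(X @ W)
    r_stand = -np.mean(np.log(p_nat[np.arange(len(y)), y] + 1e-12))
    X_pert = X + eps * np.sign(rng.normal(size=X.shape))    # crude perturbation
    p_pert = softmax(X_pert @ W)
    r_rob = np.mean(np.sum((p_pert - p_nat) ** 2, axis=1))  # output stability
    return r_stand + lambda_rob * r_rob

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 6))
y = rng.integers(0, 3, size=50)
W = 0.1 * rng.normal(size=(6, 3))
print(joint_risk(W, X, y))
```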
arXiv Detail & Related papers (2020-03-16T02:14:08Z)
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)