Jacobian Regularization for Mitigating Universal Adversarial Perturbations
- URL: http://arxiv.org/abs/2104.10459v1
- Date: Wed, 21 Apr 2021 11:00:21 GMT
- Title: Jacobian Regularization for Mitigating Universal Adversarial Perturbations
- Authors: Kenneth T. Co, David Martinez Rego, Emil C. Lupu
- Abstract summary: Universal Adversarial Perturbations (UAPs) are input perturbations that can fool a neural network on large sets of data.
We derive upper bounds for the effectiveness of UAPs based on norms of data-dependent Jacobians.
- Score: 2.9465623430708905
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Universal Adversarial Perturbations (UAPs) are input perturbations that can
fool a neural network on large sets of data. They are a class of attacks that
represents a significant threat as they facilitate realistic, practical, and
low-cost attacks on neural networks. In this work, we derive upper bounds for
the effectiveness of UAPs based on norms of data-dependent Jacobians. We
empirically verify that Jacobian regularization greatly increases model
robustness to UAPs by up to four times whilst maintaining clean performance.
Our theoretical analysis also allows us to formulate a metric for the strength
of shared adversarial perturbations between pairs of inputs. We apply this
metric to benchmark datasets and show that it is highly correlated with the
actual observed robustness. This suggests that realistic and practical
universal attacks can be reliably mitigated without sacrificing clean accuracy,
which shows promise for the robustness of machine learning systems.
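The defense studied here penalizes the norm of the network's data-dependent input-output Jacobian during training. Below is a minimal PyTorch-style sketch of that general idea, not the authors' implementation; the exact-Jacobian computation, the `jac_weight` coefficient, and the `training_step` helper are illustrative assumptions.
```python
import torch
import torch.nn.functional as F

def jacobian_frobenius_sq(model, x):
    """Mean squared Frobenius norm of the input-output Jacobian over a batch.

    Computed exactly by back-propagating each logit separately; practical
    implementations often use a random-projection estimate instead to keep
    the cost to a single extra backward pass.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)                       # shape: (batch, num_classes)
    jac_sq = 0.0
    for k in range(logits.shape[1]):
        grad_k, = torch.autograd.grad(
            logits[:, k].sum(), x, create_graph=True, retain_graph=True)
        jac_sq = jac_sq + grad_k.pow(2).flatten(1).sum(dim=1)
    return jac_sq.mean()

def training_step(model, x, y, optimizer, jac_weight=0.01):
    """One optimization step: cross-entropy plus a Jacobian penalty (illustrative)."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + jac_weight * jacobian_frobenius_sq(model, x)
    loss.backward()
    optimizer.step()
    return loss.item()
```
A larger `jac_weight` shrinks the Jacobian norms that appear in the paper's upper bounds on UAP effectiveness, which is the mechanism behind the reported robustness gains.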
Related papers
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacks on various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack against models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Bridging Optimal Transport and Jacobian Regularization by Optimal Trajectory for Enhanced Adversarial Defense [27.923344040692744]
We analyze the intricacies of adversarial training and Jacobian regularization, two pivotal defenses.
We propose a novel Optimal Transport with Jacobian regularization method, dubbed OTJR.
Our empirical evaluations set a new standard in the domain, with our method achieving commendable accuracies of 52.57% on CIFAR-10 and 28.3% on CIFAR-100 datasets.
arXiv Detail & Related papers (2023-03-21T12:22:59Z) - Out-of-Distribution Detection with Hilbert-Schmidt Independence
Optimization [114.43504951058796]
Outlier detection tasks have been playing a critical role in AI safety.
Deep neural network classifiers tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for the OOD detection tasks.
arXiv Detail & Related papers (2022-09-26T15:59:55Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model, with the added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR).
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the network's predictions are easy to interpret.
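For context, a plain (non-selective) input-gradient regularizer of the kind J-SIGR builds on can be sketched as follows; this is a simplified reference, not the authors' J-SIGR, and `penalty_weight` is an illustrative assumption.
```python
import torch
import torch.nn.functional as F

def loss_with_input_gradient_penalty(model, x, y, penalty_weight=0.1):
    """Cross-entropy plus a penalty on the gradient of the loss w.r.t. the input.

    A generic input-gradient regularizer; J-SIGR additionally selects which
    input gradients to regularize, which is not reproduced here.
    """
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    input_grad, = torch.autograd.grad(ce, x, create_graph=True)
    penalty = input_grad.pow(2).flatten(1).sum(dim=1).mean()
    return ce + penalty_weight * penalty
```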
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
- Adversarial Vulnerability of Randomized Ensembles [12.082239973914326]
We show that randomized ensembles are more vulnerable to imperceptible adversarial perturbations than even standard AT models.
We propose a theoretically sound and efficient adversarial attack algorithm (ARC) capable of compromising randomized ensembles even in cases where adaptive PGD fails to do so.
arXiv Detail & Related papers (2022-06-14T10:37:58Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Jacobian Ensembles Improve Robustness Trade-offs to Adversarial Attacks [5.70772577110828]
We propose a novel approach, Jacobian Ensembles, to increase the robustness against UAPs.
Our results show that Jacobian Ensembles achieves previously unseen levels of accuracy and robustness.
arXiv Detail & Related papers (2022-04-19T08:04:38Z)
- CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
- Universal Adversarial Attack on Deep Learning Based Prognostics [0.0]
We present the concept of a universal adversarial perturbation, a special imperceptible noise that fools regression-based remaining useful life (RUL) prediction models.
We show that adding the universal adversarial perturbation to any instance of the input data increases the error in the model's predicted output.
We further demonstrate the effect of varying the perturbation strength on RUL prediction models and find that model accuracy decreases as the perturbation strength increases.
arXiv Detail & Related papers (2021-09-15T08:05:16Z)
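The universal perturbation described in the entry above is a single additive noise vector that degrades predictions across all inputs. A hedged sketch of the general recipe for a regression (RUL) model follows; the `model`, `data_loader`, budget `epsilon`, and step schedule are illustrative assumptions, not the paper's exact attack.
```python
import torch
import torch.nn.functional as F

def craft_universal_perturbation(model, data_loader, epsilon=0.1,
                                 step_size=0.01, epochs=5):
    """Gradient-based universal perturbation for a regression model (illustrative).

    A single perturbation `delta` is updated to increase the model's MSE on
    every batch, then projected back onto an L-infinity ball of radius epsilon.
    """
    delta = None
    model.eval()
    for _ in range(epochs):
        for x, y in data_loader:
            if delta is None:
                delta = torch.zeros_like(x[0])   # one perturbation shared by all inputs
            d = delta.clone().requires_grad_(True)
            loss = F.mse_loss(model(x + d), y)   # error the attacker wants to increase
            grad_d, = torch.autograd.grad(loss, d)
            delta = (delta + step_size * grad_d.sign()).clamp(-epsilon, epsilon)
    return delta
```
Adding the returned `delta` to unseen test inputs reproduces the qualitative finding above: prediction error grows with the perturbation strength `epsilon`.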
This list is automatically generated from the titles and abstracts of the papers on this site.