Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
- URL: http://arxiv.org/abs/2006.12655v4
- Date: Sun, 4 Jul 2021 19:34:05 GMT
- Title: Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
- Authors: Cassidy Laidlaw and Sahil Singla and Soheil Feizi
- Abstract summary: A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception.
Under the neural perceptual threat model, we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks.
- Score: 58.47179090632039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in adversarial robustness is the lack of a precise
mathematical characterization of human perception, used in the very definition
of adversarial attacks that are imperceptible to human eyes. Most current
attacks and defenses try to avoid this issue by considering restrictive
adversarial threat models such as those bounded by $L_2$ or $L_\infty$
distance, spatial perturbations, etc. However, models that are robust against
any of these restrictive threat models are still fragile against other threat
models. To resolve this issue, we propose adversarial training against the set
of all imperceptible adversarial examples, approximated using deep neural
networks. We call this threat model the neural perceptual threat model (NPTM);
it includes adversarial examples with a bounded neural perceptual distance (a
neural network-based approximation of the true perceptual distance) to natural
images. Through an extensive perceptual study, we show that the neural
perceptual distance correlates well with human judgements of perceptibility of
adversarial examples, validating our threat model.
Under the NPTM, we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training
(PAT) against a perceptual attack gives robustness against many other types of
adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five
diverse adversarial attacks. We find that PAT achieves state-of-the-art
robustness against the union of these five attacks, more than doubling the
accuracy over the next best model, without training against any of them. That
is, PAT generalizes well to unforeseen perturbation types. This is vital in
sensitive applications where a particular threat model cannot be assumed, and
to the best of our knowledge, PAT is the first adversarial training defense
with this property.
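The abstract describes the approach only in prose, so the following is a minimal sketch of the idea: an off-the-shelf LPIPS distance stands in for the neural perceptual distance, a Lagrangian-penalty attack searches inside the perceptual ball, and a training step fits the model on the resulting perceptual adversarial examples. The `lpips` package, the bound `eps`, the penalty weight `lam`, the sign-gradient update, and all hyperparameters are illustrative assumptions, not the authors' exact attack or training recipe.

```python
# Illustrative sketch only: a PAT-style training step built around a neural
# perceptual (LPIPS) distance. Hyperparameters, the sign-gradient update, and
# the ReLU penalty are assumptions for clarity, not the paper's exact method.
import torch
import torch.nn.functional as F
import lpips  # pip install lpips; AlexNet-based perceptual distance

perceptual_dist = lpips.LPIPS(net='alex')  # neural approximation of perceptual distance


def lagrangian_perceptual_attack(model, x, y, eps=0.5, lam=10.0,
                                 steps=10, step_size=0.02):
    """Search for x_adv that raises the classification loss while keeping
    the neural perceptual distance d(x, x_adv) roughly below eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        cls_loss = F.cross_entropy(model(x_adv), y)
        dist = perceptual_dist(x, x_adv, normalize=True).mean()  # inputs in [0, 1]
        # Lagrangian relaxation of the perceptual-ball constraint.
        objective = cls_loss - lam * F.relu(dist - eps)
        grad, = torch.autograd.grad(objective, x_adv)
        x_adv = (x_adv + step_size * grad.sign()).clamp(0, 1)
        x_adv = x_adv.detach().requires_grad_(True)
    return x_adv.detach()


def pat_training_step(model, optimizer, x, y):
    """One Perceptual Adversarial Training step: fit the model on
    perceptual adversarial examples instead of clean inputs."""
    model.eval()
    x_adv = lagrangian_perceptual_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The paper evaluates such training on CIFAR-10 and ImageNet-100 against several unseen attacks; the sketch above is only meant to make the bounded-perceptual-distance constraint concrete.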
Related papers
- Towards Unified Robustness Against Both Backdoor and Adversarial Attacks [31.846262387360767]
Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks.
This paper reveals that there is an intriguing connection between backdoor and adversarial attacks.
A novel Progressive Unified Defense algorithm is proposed to defend against backdoor and adversarial attacks simultaneously.
arXiv Detail & Related papers (2024-05-28T07:50:00Z) - F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of
Natural and Perturbed Patterns [74.03108122774098]
Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations.
This could lead to disastrous results on critical applications such as self-driving cars, surveillance security, and medical diagnosis.
We propose Feature-Focusing Adversarial Training (F$^2$AT), which forces the model to focus on the core features from natural patterns.
arXiv Detail & Related papers (2023-10-23T04:31:42Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework for defending against adversarial samples crafted by minimally perturbing natural samples.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves defense against both invariance and sensitivity attacks.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - TnT Attacks! Universal Naturalistic Adversarial Patches Against Deep
Neural Network Systems [15.982408142401072]
Deep neural networks are vulnerable to attacks from adversarial inputs and, more recently, Trojans to misguide or hijack the decision of the model.
A TnT is universal because any input image captured with a TnT in the scene will either: i) misguide a network (untargeted attack); or ii) force the network to make a malicious decision (targeted attack).
We show a generalization of the attack to create patches achieving higher attack success rates than existing state-of-the-art methods.
arXiv Detail & Related papers (2021-11-19T01:35:10Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
The method is trained to automatically align features across attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Towards Adversarial Patch Analysis and Certified Defense against Crowd
Counting [61.99564267735242]
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
Recent studies have demonstrated that deep neural network (DNN) methods are vulnerable to adversarial attacks.
We propose a robust attack strategy called Adversarial Patch Attack with Momentum to evaluate the robustness of crowd counting models.
arXiv Detail & Related papers (2021-04-22T05:10:55Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z) - Adversarial Feature Desensitization [12.401175943131268]
We propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field.
Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs.
arXiv Detail & Related papers (2020-06-08T14:20:02Z)
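The Adversarial Feature Desensitization entry above frames robustness as learning features that adversarial perturbations cannot shift, borrowing from domain adaptation. As a rough sketch of that idea (not the cited authors' exact recipe), one can train a small discriminator to separate clean from adversarial features while the encoder learns to fool it; the FGSM attack, the discriminator head, and the weight `beta` below are assumptions for illustration.

```python
# Illustrative sketch of a domain-adaptation-style feature desensitization
# step: a discriminator separates clean vs. adversarial features, and the
# encoder is trained to fool it. The FGSM attack and loss weight `beta`
# are assumptions, not the cited paper's exact training procedure.
import torch
import torch.nn.functional as F


def fgsm(model_fn, x, y, eps=8 / 255):
    """One-step attack used here only to produce adversarial training inputs."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model_fn(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()


def desensitization_step(encoder, classifier, discriminator,
                         opt_main, opt_disc, x, y, beta=1.0):
    """opt_main is assumed to hold encoder+classifier parameters,
    opt_disc the discriminator's."""
    device = x.device
    x_adv = fgsm(lambda z: classifier(encoder(z)), x, y)
    f_clean, f_adv = encoder(x), encoder(x_adv)

    # 1) Discriminator learns to tell clean (label 0) from adversarial (label 1) features.
    d_logits = discriminator(torch.cat([f_clean.detach(), f_adv.detach()]))
    d_labels = torch.cat([torch.zeros(len(x)), torch.ones(len(x))]).long().to(device)
    d_loss = F.cross_entropy(d_logits, d_labels)
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # 2) Encoder + classifier: classify adversarial inputs correctly while
    #    making adversarial features look "clean" to the discriminator.
    task_loss = F.cross_entropy(classifier(f_adv), y)
    fool_loss = F.cross_entropy(discriminator(f_adv),
                                torch.zeros(len(x), dtype=torch.long, device=device))
    opt_main.zero_grad()
    (task_loss + beta * fool_loss).backward()
    opt_main.step()
    return task_loss.item(), d_loss.item()
```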