Improving adversarial robustness of deep neural networks by using
semantic information
- URL: http://arxiv.org/abs/2008.07838v2
- Date: Thu, 17 Jun 2021 02:24:45 GMT
- Title: Improving adversarial robustness of deep neural networks by using
semantic information
- Authors: Lina Wang, Rui Tang, Yawei Yue, Xingshu Chen, Wei Wang, Yi Zhu, and
Xuemei Zeng
- Abstract summary: Adversarial training is the main method for improving adversarial robustness and the first line of defense against adversarial attacks.
This paper provides a new perspective on the issue of adversarial robustness, one that shifts the focus from the network as a whole to the critical part of the region close to the decision boundary corresponding to a given class.
Experimental results on the MNIST and CIFAR-10 datasets show that this approach greatly improves adversarial robustness even when only a small subset of the training data is used.
- Score: 17.887586209038968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are vulnerable to adversarial attacks,
which deliberately perturb original inputs to mislead state-of-the-art
classifiers into confident misclassifications; this raises concerns about the
robustness of DNNs. Adversarial training, the main heuristic method for
improving adversarial robustness and the first line of defense against
adversarial attacks, requires many sample-by-sample computations to enlarge the
training set and is usually insufficiently strong for the entire network. This paper provides
a new perspective on the issue of adversarial robustness, one that shifts the
focus from the network as a whole to the critical part of the region close to
the decision boundary corresponding to a given class. From this perspective, we
propose a method to generate a single, image-agnostic adversarial perturbation
that carries semantic information indicating the directions toward the fragile
parts of the decision boundary and causes inputs to be misclassified as a
specified target class. We call adversarial training based on
such perturbations "region adversarial training" (RAT), which resembles
classical adversarial training but is distinguished in that it reinforces the
semantic information missing in the relevant regions. Experimental results on
the MNIST and CIFAR-10 datasets show that this approach greatly improves
adversarial robustness even when only a small subset of the training data is
used; moreover, the retrained model can defend against FGSM adversarial attacks
whose patterns are completely different from those seen during retraining.
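To make the two stages the abstract describes concrete, here is a minimal PyTorch sketch: first accumulate one image-agnostic perturbation that steers inputs toward a chosen target class, then retrain on inputs shifted by it. Function names, the MNIST input shape, and all hyperparameters are illustrative assumptions, and the simple clipped-gradient accumulation is a stand-in for the paper's exact region-based procedure.

```python
import torch
import torch.nn.functional as F

def universal_targeted_perturbation(model, loader, target, eps=0.3, lr=0.01, epochs=5):
    # Accumulate a single image-agnostic perturbation that pushes every input
    # toward `target` (a simplified stand-in for the paper's region-based step).
    model.eval()
    delta = torch.zeros(1, 1, 28, 28, requires_grad=True)  # assumes MNIST shapes
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            y_t = torch.full((x.size(0),), target, dtype=torch.long)
            loss = F.cross_entropy(model(x + delta), y_t)
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)  # keep the perturbation small
    return delta.detach()

def region_adversarial_retrain(model, loader, delta, lr=1e-3, epochs=1):
    # Retrain on clean and perturbed inputs so the model relearns the semantic
    # information in the fragile region near the decision boundary.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x + delta), y)
            opt.zero_grad(); loss.backward(); opt.step()
```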
Related papers
- Adversarial Training Should Be Cast as a Non-Zero-Sum Game [121.95628660889628]
The two-player zero-sum paradigm of adversarial training has not engendered sufficient levels of robustness.
We show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on robustness.
A novel non-zero-sum bilevel formulation of adversarial training yields a framework that matches, and in some cases outperforms, state-of-the-art attacks.
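As a schematic of the contrast this summary draws, the classical zero-sum objective and a bilevel non-zero-sum variant can be written as follows; the attacker objective M is a placeholder for whatever surrogate-free criterion (e.g., a margin-based one) the paper actually uses.

```latex
% Zero-sum adversarial training: both players share one loss.
\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|_\infty \le \epsilon}
  \ell\big(f_\theta(x+\delta),\,y\big)\Big]

% Non-zero-sum bilevel variant: the attacker maximizes its own objective M,
% and the defender minimizes the classification loss at the attacker's solution.
\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\ell\big(f_\theta(x+\delta^{*}),\,y\big)\Big]
\quad\text{s.t.}\quad
\delta^{*} \in \arg\max_{\|\delta\|_\infty \le \epsilon}
  M\big(f_\theta(x+\delta),\,y\big)
```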
arXiv Detail & Related papers (2023-06-19T16:00:48Z) - Improved Adversarial Training Through Adaptive Instance-wise Loss
Smoothing [5.1024659285813785]
Adversarial training has been the most successful defense against adversarial attacks.
We propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training.
Our method achieves state-of-the-art robustness against $\ell_\infty$-norm constrained attacks.
arXiv Detail & Related papers (2023-03-24T15:41:40Z) - Improving Hyperspectral Adversarial Robustness Under Multiple Attacks [2.741266294612776]
We propose an Adversarial Discriminator Ensemble Network (ADE-Net) to combat vulnerability under multiple attack types.
In the proposed method, a discriminator network is used to separate data by attack type into their specific attack-expert ensemble network.
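A minimal sketch of the discriminator-plus-experts routing described above, assuming flattened inputs; layer sizes and the hard per-sample routing are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttackExpertEnsemble(nn.Module):
    # A discriminator guesses which attack type produced each input, and the
    # matching attack-expert network performs the actual classification.
    def __init__(self, in_dim, n_attacks, n_classes):
        super().__init__()
        self.discriminator = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_attacks))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                          nn.Linear(256, n_classes))
            for _ in range(n_attacks))

    def forward(self, x):
        x = x.flatten(1)
        attack_id = self.discriminator(x).argmax(dim=1)  # per-sample attack type
        # Hard routing, one sample at a time; fine for a sketch, slow at scale.
        return torch.stack([self.experts[int(a)](xi)
                            for a, xi in zip(attack_id, x)])
```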
arXiv Detail & Related papers (2022-10-28T18:21:45Z) - Strength-Adaptive Adversarial Training [103.28849734224235]
Adversarial training (AT) is proven to reliably improve a network's robustness against adversarial data.
Current AT with a pre-specified perturbation budget has limitations in learning a robust network.
We propose Strength-Adaptive Adversarial Training (SAAT) to overcome these limitations.
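The summary does not spell out the adaptation rule, so the sketch below shows one plausible reading: raise the l_inf budget for a batch until the attack reaches a target error rate, so attack strength tracks the network's current robustness. The `fgsm` inner adversary, the budget grid, and the 50% target are all assumptions, not the paper's criterion.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Single-step l_inf attack used here as the inner adversary.
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def strength_adaptive_examples(model, x, y,
                               eps_grid=(0.01, 0.02, 0.05, 0.1, 0.2),
                               target_error=0.5):
    # Illustrative adaptation rule: increase the perturbation budget until
    # roughly `target_error` of the batch is fooled, then train on the result.
    for eps in eps_grid:
        x_adv = fgsm(model, x, y, eps)
        if (model(x_adv).argmax(1) != y).float().mean() >= target_error:
            break
    return x_adv
```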
arXiv Detail & Related papers (2022-10-04T00:22:37Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
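A compact sketch of how such a framework might look with PyTorch's DistributedDataParallel: each worker crafts adversarial examples for its own shard, and DDP averages gradients, so the effective adversarial batch spans all machines. The single-step attack and all hyperparameters are stand-ins; the paper's actual framework involves more than this.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def distributed_adversarial_training(model, loader, epochs=1, eps=8/255, lr=0.1):
    # Launch with torchrun; the launcher supplies the process-group environment.
    dist.init_process_group("nccl")
    device = torch.device(f"cuda:{dist.get_rank() % torch.cuda.device_count()}")
    model = DDP(model.to(device))
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # Craft the attack with the local replica so the extra forward pass
            # does not interfere with DDP's gradient-synchronization hooks.
            x.requires_grad_(True)
            grad, = torch.autograd.grad(
                F.cross_entropy(model.module(x), y), x)
            x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()
            loss = F.cross_entropy(model(x_adv), y)  # gradients sync here
            opt.zero_grad(); loss.backward(); opt.step()
    dist.destroy_process_group()
```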
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Robustness through Cognitive Dissociation Mitigation in Contrastive
Adversarial Training [2.538209532048867]
We introduce a novel neural network training framework that increases a model's robustness to adversarial attacks.
We propose to improve model robustness to adversarial attacks by learning feature representations consistent under both data augmentations and adversarial perturbations.
We validate our method on the CIFAR-10 dataset, on which it outperforms alternative supervised and self-supervised adversarial learning methods in both robust accuracy and clean accuracy.
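One way to encode the stated idea, representations consistent under both augmentations and adversarial perturbations, is a classification loss on the adversarial view plus a cosine-alignment term between the two views' features. This is an illustrative combined objective, not the paper's exact loss; `encoder` and `classifier` are assumed module names.

```python
import torch.nn.functional as F

def consistency_adversarial_loss(encoder, classifier, x_aug, x_adv, y, lam=1.0):
    # Classify the adversarial view correctly while pulling its features toward
    # those of a standard augmented view (1 - cosine similarity as the penalty).
    z_aug = F.normalize(encoder(x_aug), dim=1)
    z_adv = F.normalize(encoder(x_adv), dim=1)
    align = 1.0 - (z_aug * z_adv).sum(dim=1).mean()
    ce = F.cross_entropy(classifier(encoder(x_adv)), y)
    return ce + lam * align
```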
arXiv Detail & Related papers (2022-03-16T21:41:27Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
Convolutional neural networks (CNNs) are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features for inputs of arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Combating Adversaries with Anti-Adversaries [118.70141983415445]
The proposed anti-adversary layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
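The mechanism, perturbing the input opposite to an attack before classifying, can be sketched as a small inference-time routine; the step count and step size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def anti_adversary_forward(model, x, steps=2, alpha=0.1):
    # Nudge the input a few signed-gradient steps that *increase* confidence in
    # the model's own initial prediction, i.e., opposite to an attack direction.
    with torch.no_grad():
        y_hat = model(x).argmax(dim=1)          # the model's initial guess
    x_aa = x.clone().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(F.cross_entropy(model(x_aa), y_hat), x_aa)
        x_aa = (x_aa - alpha * grad.sign()).detach().requires_grad_(True)
    return model(x_aa.detach())                 # classify the corrected input
```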
arXiv Detail & Related papers (2021-03-26T09:36:59Z) - Semantics-Preserving Adversarial Training [12.242659601882147]
Adversarial training is a technique that improves adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data.
We propose semantics-preserving adversarial training (SPAT) which encourages perturbation on the pixels that are shared among all classes.
Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2020-09-23T07:42:14Z) - Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
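A hedged sketch of the general idea of using feature-space information to strengthen an adversary: a PGD-style attack that, besides raising the classification loss, pushes intermediate features away from their clean values. Here `features` (intermediate representation) and `model` (logits) are assumed callables, and the combined objective is illustrative rather than the paper's.

```python
import torch
import torch.nn.functional as F

def feature_space_attack(model, features, x, y,
                         eps=8/255, alpha=2/255, steps=10, beta=1.0):
    # PGD variant whose loss also rewards divergence from the clean features.
    with torch.no_grad():
        f_clean = features(x)
    x_adv = x.clone().requires_grad_(True)
    for _ in range(steps):
        loss = (F.cross_entropy(model(x_adv), y)
                + beta * F.mse_loss(features(x_adv), f_clean))
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to the l_inf ball
            x_adv = x_adv.clamp(0, 1)
        x_adv.requires_grad_(True)
    return x_adv.detach()
```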
arXiv Detail & Related papers (2020-07-29T08:38:10Z)