On the Effect of Adversarial Training Against Invariance-based
Adversarial Examples
- URL: http://arxiv.org/abs/2302.08257v1
- Date: Thu, 16 Feb 2023 12:35:37 GMT
- Title: On the Effect of Adversarial Training Against Invariance-based
Adversarial Examples
- Authors: Roland Rauter, Martin Nocker, Florian Merkle, Pascal Schöttle
- Abstract summary: This work addresses the impact of adversarial training with invariance-based adversarial examples on a convolutional neural network (CNN).
We show that adversarial training with invariance-based and perturbation-based adversarial examples should be conducted simultaneously, not consecutively.
- Score: 0.23624125155742057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples are carefully crafted inputs intended to fool
machine learning classifiers. In recent years, the field of adversarial
machine learning, especially the study of perturbation-based adversarial
examples, in which a perturbation imperceptible to humans is added to an
image, has been studied extensively. Adversarial training can be used to
achieve robustness against such inputs. Another type of adversarial example is
the invariance-based adversarial example, in which an image is semantically
modified such that the class predicted by the model does not change, but the
class determined by humans does. How to ensure robustness against this type of
adversarial example has not been explored yet. This work addresses the impact
of adversarial training with invariance-based adversarial examples on a
convolutional neural network (CNN).
We show that adversarial training with invariance-based and perturbation-based
adversarial examples should be conducted simultaneously rather than
consecutively. This procedure can achieve relatively high robustness against
both types of adversarial examples. Additionally, we find that the algorithm
used to generate invariance-based adversarial examples in prior work does not
correctly determine the labels, and we therefore use human-determined labels.
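The following is a minimal sketch, not the authors' implementation, of the simultaneous training procedure the abstract describes: every batch mixes clean images, perturbation-based adversarial examples generated on the fly (here with a standard PGD attack), and pre-generated invariance-based adversarial examples paired with human-determined labels. The function names, the `inv_loader` serving the invariance-based set, the PGD hyperparameters, and the assumed [0, 1] input range are illustrative assumptions, not details from the paper.

```python
# Sketch of "simultaneous" adversarial training with both perturbation-based
# and invariance-based adversarial examples (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """L-infinity PGD; returns perturbation-based adversarial examples.
    Hyperparameters are assumptions; inputs are assumed to lie in [0, 1]."""
    x_adv = (x.clone().detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_epoch_simultaneous(model, clean_loader, inv_loader, optimizer):
    """One epoch of mixed-batch training: each optimization step sees clean,
    perturbation-based, and invariance-based examples at the same time."""
    model.train()
    for (x, y), (x_inv, y_inv_human) in zip(clean_loader, inv_loader):
        x_pgd = pgd_attack(model, x, y)           # perturbation-based examples
        inputs = torch.cat([x, x_pgd, x_inv])     # mix all three sources per batch
        targets = torch.cat([y, y, y_inv_human])  # human labels for the invariance set
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
```

Mixing all sources into a single loss per step is what "simultaneously" means here, as opposed to the consecutive setting in which the model is first trained on one type of adversarial example and then fine-tuned on the other.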
Related papers
- The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for
Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z) - Balanced Adversarial Training: Balancing Tradeoffs between Fickleness
and Obstinacy in NLP Models [21.06607915149245]
We show that standard adversarial training methods may make a model more vulnerable to fickle adversarial examples.
We introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both fickle and obstinate adversarial examples.
arXiv Detail & Related papers (2022-10-20T18:02:07Z) - Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters
Substitution [1.8782750537161608]
We introduce "unrestricted" perturbations that create adversarial samples by using spurious relations learned by model training.
Specifically, we find feature clusters in non-semantic features that are strongly correlated with model judgment results.
We create adversarial samples by using them to replace the corresponding feature clusters in the target image.
arXiv Detail & Related papers (2022-08-31T07:42:36Z) - Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z) - Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$ ball or functional attacks which perturb inputs indiscriminately, our targeted changes can be less perceptible.
arXiv Detail & Related papers (2021-06-17T03:07:22Z) - Towards Defending against Adversarial Examples via Attack-Invariant
Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by recently introduced non-robust features.
We consider non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage this distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - On the Transferability of Adversarial Attacks against Neural Text
Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z) - Semantics-Preserving Adversarial Training [12.242659601882147]
Adversarial training is a technique that improves adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data.
We propose semantics-preserving adversarial training (SPAT) which encourages perturbation on the pixels that are shared among all classes.
Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2020-09-23T07:42:14Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial
Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.