Balanced Adversarial Training: Balancing Tradeoffs between Fickleness
and Obstinacy in NLP Models
- URL: http://arxiv.org/abs/2210.11498v1
- Date: Thu, 20 Oct 2022 18:02:07 GMT
- Title: Balanced Adversarial Training: Balancing Tradeoffs between Fickleness
and Obstinacy in NLP Models
- Authors: Hannah Chen, Yangfeng Ji, David Evans
- Abstract summary: We show that standard adversarial training methods may make a model more vulnerable to fickle adversarial examples.
We introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both fickle and obstinate adversarial examples.
- Score: 21.06607915149245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional (fickle) adversarial examples involve finding a small
perturbation that does not change an input's true label but confuses the
classifier into outputting a different prediction. Conversely, obstinate
adversarial examples occur when an adversary finds a small perturbation that
preserves the classifier's prediction but changes the true label of an input.
Adversarial training and certified robust training have shown some
effectiveness in improving the robustness of machine learnt models to fickle
adversarial examples. We show that standard adversarial training methods
focused on reducing vulnerability to fickle adversarial examples may make a
model more vulnerable to obstinate adversarial examples, with experiments for
both natural language inference and paraphrase identification tasks. To counter
this phenomenon, we introduce Balanced Adversarial Training, which incorporates
contrastive learning to increase robustness against both fickle and obstinate
adversarial examples.
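To make the training objective concrete, the sketch below (PyTorch, not the authors' released code) shows one way a contrastive term can be added to standard adversarial training: a label-preserving (fickle-style) perturbation is treated as a positive that should stay close to the original input in representation space, while a label-changing (obstinate-style) perturbation is treated as a negative that should be pushed away. All names (`balanced_contrastive_loss`, the batch keys, the `alpha` weight) are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of a contrastive term for balanced adversarial training.
# Assumes the perturbed inputs (fickle-style and obstinate-style) are produced
# elsewhere, e.g. by synonym vs. antonym substitutions; names are illustrative.

import torch
import torch.nn.functional as F


def balanced_contrastive_loss(anchor, positive, negative, temperature=0.1):
    """InfoNCE-style loss over sentence embeddings.

    anchor:   embeddings of the original inputs,            shape (B, D)
    positive: embeddings of label-preserving perturbations, shape (B, D)
    negative: embeddings of label-changing perturbations,   shape (B, D)
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negative = F.normalize(negative, dim=-1)

    pos_sim = (anchor * positive).sum(dim=-1) / temperature   # (B,)
    neg_sim = (anchor * negative).sum(dim=-1) / temperature   # (B,)

    # Each anchor should score its positive higher than its negative.
    logits = torch.stack([pos_sim, neg_sim], dim=1)            # (B, 2)
    targets = torch.zeros(anchor.size(0), dtype=torch.long,
                          device=anchor.device)                # positive = index 0
    return F.cross_entropy(logits, targets)


def training_step(model, batch, clf_loss_fn, alpha=1.0):
    """Combine the usual classification loss with the contrastive term.

    `model` is assumed to return (logits, sentence_embedding); `batch` is
    assumed to already contain the precomputed perturbed inputs.
    """
    logits, emb_orig = model(batch["original"])
    _, emb_fickle = model(batch["fickle_perturbed"])
    _, emb_obstinate = model(batch["obstinate_perturbed"])

    loss_clf = clf_loss_fn(logits, batch["labels"])
    loss_con = balanced_contrastive_loss(emb_orig, emb_fickle, emb_obstinate)
    return loss_clf + alpha * loss_con
```

Under this reading, the classification term plays the usual role against fickle examples, while the contrastive term penalizes representations that leave obstinate (label-changing) perturbations indistinguishable from the original input.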
Related papers
- On the Effect of Adversarial Training Against Invariance-based
Adversarial Examples [0.23624125155742057]
This work addresses the impact of adversarial training with invariance-based adversarial examples on a convolutional neural network (CNN).
We show that adversarial training with invariance-based and perturbation-based adversarial examples should be conducted simultaneously rather than consecutively.
arXiv Detail & Related papers (2023-02-16T12:35:37Z)
- The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for
Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Collaborative Adversarial Training [82.25340762659991]
We show that some collaborative examples, nearly perceptually indistinguishable from both adversarial and benign examples, can be utilized to enhance adversarial training.
A novel method called collaborative adversarial training (CoAT) is thus proposed to achieve a new state of the art.
arXiv Detail & Related papers (2022-05-23T09:41:41Z)
- On the Impact of Hard Adversarial Instances on Overfitting in
Adversarial Training [72.95029777394186]
Adversarial training is a popular method to robustify models against adversarial attacks.
We investigate this phenomenon from the perspective of training instances.
We show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
- Calibrated Adversarial Training [8.608288231153304]
We present Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training.
The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error.
arXiv Detail & Related papers (2021-10-01T19:17:28Z)
- CLINE: Contrastive Learning with Semantic Negative Examples for Natural
Language Understanding [35.003401250150034]
We propose Contrastive Learning with semantIc Negative Examples (CLINE) to improve robustness of pre-trained language models.
CLINE constructs semantic negative examples in an unsupervised manner to improve robustness under semantic adversarial attacks.
Empirical results show that our approach yields substantial improvements on a range of sentiment analysis, reasoning, and reading comprehension tasks.
arXiv Detail & Related papers (2021-07-01T13:34:12Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Semantics-Preserving Adversarial Training [12.242659601882147]
Adversarial training is a technique that improves adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data.
We propose semantics-preserving adversarial training (SPAT) which encourages perturbation on the pixels that are shared among all classes.
Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2020-09-23T07:42:14Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial
Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
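The fickle/obstinate distinction that runs through this list can also be stated operationally. The helper below is purely illustrative (its name and signature come from no particular paper) and simply buckets a perturbed input given the model's predictions and the gold labels before and after perturbation.

```python
# Illustrative only: classify a perturbation as "fickle" (prediction flips,
# true label does not) or "obstinate" (prediction is preserved, true label
# changes), following the definitions in the abstract above.

def categorize_perturbation(pred_orig, pred_pert, label_orig, label_pert):
    """Return 'fickle', 'obstinate', or 'neither' for one perturbed input."""
    if label_pert == label_orig and pred_pert != pred_orig:
        return "fickle"      # model changed its mind; meaning did not change
    if label_pert != label_orig and pred_pert == pred_orig:
        return "obstinate"   # meaning changed; model failed to notice
    return "neither"
```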
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.