Calibrated Adversarial Training
- URL: http://arxiv.org/abs/2110.00623v1
- Date: Fri, 1 Oct 2021 19:17:28 GMT
- Title: Calibrated Adversarial Training
- Authors: Tianjin Huang, Vlado Menkovski, Yulong Pei and Mykola Pechenizkiy
- Abstract summary: We present the Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training.
The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error.
- Score: 8.608288231153304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training increases the robustness of models to adversarial attacks by including adversarial examples in the training set. One major challenge of producing adversarial examples is to include enough perturbation in the example to flip the model's output while not severely changing the example's semantic content. Excessive changes to the semantic content can also change the true label of the example, and adding such examples to the training set has adverse effects. In this paper, we present Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training. The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error. We provide a theoretical analysis of the calibrated robust error and derive an upper bound for it. Our empirical results show superior performance of Calibrated Adversarial Training on a number of public datasets.
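The abstract builds on standard adversarial training; a minimal PyTorch sketch of that baseline is below (PGD-style inner maximization; the model, data loader, and attack hyperparameters are illustrative assumptions, not the paper's setup):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: find a perturbation inside an L-inf ball of
    radius eps that increases the classification loss."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the ball
            x_adv = x_adv.clamp(0, 1)                 # stay in valid pixel range
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer):
    """Outer minimization: train on the adversarial examples."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```

The paper's calibrated variant replaces the fixed-label robust loss here with a calibrated robust error that adapts the perturbation pixel-wise; the abstract does not give enough detail to sketch that part faithfully.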
Related papers
- Vulnerability-Aware Instance Reweighting For Adversarial Training [4.874780144224057]
Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks.
However, AT exerts an uneven influence on different classes in a training set, unfairly hurting examples from classes that are inherently harder to classify.
Various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set.
In this work, we propose a novel instance-wise reweighting scheme that considers the vulnerability of each natural example and the information loss that adversarial attacks cause on its adversarial counterpart.
arXiv Detail & Related papers (2023-07-14T05:31:32Z)
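A minimal sketch of the instance-wise reweighting idea above, assuming the vulnerability of a natural example is proxied by its clean logit margin (an illustrative choice; the paper's vulnerability and information-loss measures may differ):

```python
import torch
import torch.nn.functional as F

def vulnerability_weights(model, x, y):
    """Smaller clean margin -> more vulnerable example -> larger weight
    (illustrative proxy for the paper's vulnerability measure)."""
    with torch.no_grad():
        logits = model(x)
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        best_other = logits.scatter(1, y.unsqueeze(1), float('-inf')).amax(1)
        margin = true_logit - best_other
        return torch.softmax(-margin, dim=0) * len(y)   # mean weight ~= 1

def reweighted_robust_loss(model, x_adv, y, weights):
    """Per-example robust losses scaled by the instance weights."""
    losses = F.cross_entropy(model(x_adv), y, reduction='none')
    return (weights * losses).mean()
```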
- The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z)
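A hedged sketch of the consistency idea above: a standard adversary ascends the loss, an "inverse adversary" descends it, and a KL term pulls their outputs together. The KL form and the shared step schedule are assumptions; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def perturb(model, x, y, eps=8/255, alpha=2/255, steps=10, inverse=False):
    """PGD that ascends the loss for a normal adversary and descends it
    (direction flipped) for an inverse adversary."""
    direction = -1.0 if inverse else 1.0
    x_p = x.clone()
    for _ in range(steps):
        x_p.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_p), y), x_p)[0]
        with torch.no_grad():
            x_p = x_p + direction * alpha * grad.sign()
            x_p = (x + (x_p - x).clamp(-eps, eps)).clamp(0, 1)
    return x_p.detach()

def inverse_adversary_loss(model, x, y, beta=1.0):
    x_adv = perturb(model, x, y)                # raises the loss
    x_inv = perturb(model, x, y, inverse=True)  # lowers the loss
    consistency = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                           F.softmax(model(x_inv), dim=1),
                           reduction='batchmean')
    return F.cross_entropy(model(x_adv), y) + beta * consistency
```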
- Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models [21.06607915149245]
We show that standard adversarial training methods may make a model more vulnerable to fickle adversarial examples.
We introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both fickle and obstinate adversarial examples.
arXiv Detail & Related papers (2022-10-20T18:02:07Z)
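A rough sketch of how a contrastive term could balance the two failure modes above: keep an example close to a variant whose true label is unchanged (guarding against fickleness) and far from a variant whose true label has changed (guarding against obstinacy). The encoder and the construction of the two variants are assumptions; the paper works with NLP models and its loss may differ.

```python
import torch
import torch.nn.functional as F

def balanced_contrastive_loss(encoder, x, x_same_label, x_diff_label, tau=0.5):
    """Pull together pairs with the same true label, push apart pairs
    whose true label differs (one positive and one negative per anchor)."""
    z = F.normalize(encoder(x), dim=1)
    z_pos = F.normalize(encoder(x_same_label), dim=1)
    z_neg = F.normalize(encoder(x_diff_label), dim=1)
    pos = torch.exp((z * z_pos).sum(dim=1) / tau)
    neg = torch.exp((z * z_neg).sum(dim=1) / tau)
    return -torch.log(pos / (pos + neg)).mean()
```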
- Adversarial Examples for Good: Adversarial Examples Guided Imbalanced Learning [15.370413523189749]
We provide a new perspective on how to deal with imbalanced data: adjust the biased decision boundary by training with Guiding Adversarial Examples (GAEs).
Our method can effectively increase the accuracy of minority classes while sacrificing little accuracy on majority classes.
arXiv Detail & Related papers (2022-01-28T09:13:07Z)
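A loose sketch of one plausible reading of "guiding" adversarial examples above: a targeted gradient step nudges inputs toward a chosen (e.g., minority) class, and the perturbed examples are added to training to shift the biased boundary. Both the single-step attack and the targeting rule are assumptions for illustration; the paper's GAE construction is not specified in this summary.

```python
import torch
import torch.nn.functional as F

def guiding_example(model, x, target_class, eps=8/255):
    """Targeted one-step perturbation toward target_class (illustrative)."""
    x = x.clone().requires_grad_(True)
    target = torch.full((x.size(0),), target_class, dtype=torch.long)
    loss = F.cross_entropy(model(x), target)
    grad = torch.autograd.grad(loss, x)[0]
    # Descend the targeted loss, i.e. step toward the target class.
    return (x - eps * grad.sign()).clamp(0, 1).detach()
```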
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z)
- On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [72.95029777394186]
Adversarial training is a popular method to robustify models against adversarial attacks.
We investigate the overfitting phenomenon in adversarial training from the perspective of individual training instances.
We show that the decay in generalization performance of adversarial training is a result of the model's attempt to fit hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
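A small diagnostic sketch in the spirit of the finding above: score every training instance by the loss on its adversarial example, so the hardest instances can be inspected or handled separately. Using the adversarial loss as the hardness measure is an illustrative proxy; `attack` can be any inner-maximization routine such as the PGD sketch earlier.

```python
import torch
import torch.nn.functional as F

def rank_hard_instances(model, loader, attack):
    """Return dataset indices sorted hardest-first, where hardness is the
    loss on each instance's adversarial example (batch_size=1 assumed)."""
    model.eval()
    scored = []
    for idx, (x, y) in enumerate(loader):
        x_adv = attack(model, x, y)
        with torch.no_grad():
            scored.append((F.cross_entropy(model(x_adv), y).item(), idx))
    return [idx for _, idx in sorted(scored, reverse=True)]
```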
- Efficient Estimation of Influence of a Training Instance [56.29080605123304]
We propose an efficient method for estimating the influence of a training instance on a neural network model.
Our method is inspired by dropout: zero-masking a sub-network prevents that sub-network from learning each training instance.
We demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.
arXiv Detail & Related papers (2020-12-08T04:31:38Z)
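A condensed sketch of the dropout-inspired idea above: each training instance gets a fixed mask, so one sub-network learns it while the complementary sub-network never does, and influence is read off the gap between the two. The seeded mask and a single masked feature layer are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def instance_mask(instance_id, width, keep=0.5):
    """Deterministic per-instance dropout mask: the same units are zeroed
    for this instance in every epoch, so the masked-out sub-network
    never learns it."""
    gen = torch.Generator().manual_seed(instance_id)
    return (torch.rand(width, generator=gen) < keep).float()

def influence_on(trunk, head, x, y, instance_id, width, keep=0.5):
    """Influence estimate: loss of the sub-network that never saw the
    instance minus loss of the sub-network that learned it."""
    mask = instance_mask(instance_id, width, keep)
    with torch.no_grad():
        h = trunk(x)  # features at the (single) masked layer
        loss_learned = F.cross_entropy(head(h * mask / keep), y)
        loss_unlearned = F.cross_entropy(head(h * (1 - mask) / (1 - keep)), y)
    return (loss_unlearned - loss_learned).item()
```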
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
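A compact sketch of such a pre-training objective: an InfoNCE loss that makes the representation of an adversarial view agree with an augmented view of the same image, using the rest of the batch as negatives (SimCLR-style; the pairing and temperature are illustrative assumptions).

```python
import torch
import torch.nn.functional as F

def adversarial_contrastive_loss(encoder, x_aug, x_adv, tau=0.5):
    """InfoNCE between an augmented view and an adversarial view of the
    same image; positives sit on the diagonal of the similarity matrix."""
    z_aug = F.normalize(encoder(x_aug), dim=1)
    z_adv = F.normalize(encoder(x_adv), dim=1)
    logits = z_aug @ z_adv.t() / tau            # [B, B] cosine similarities
    labels = torch.arange(z_aug.size(0))        # matching pairs are positives
    return F.cross_entropy(logits, labels)
```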
- Semantics-Preserving Adversarial Training [12.242659601882147]
Adversarial training is a technique that improves the adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data.
We propose semantics-preserving adversarial training (SPAT) which encourages perturbation on the pixels that are shared among all classes.
Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2020-09-23T07:42:14Z)
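A brief sketch of the constraint described above: confine the attack's perturbation to a mask of pixels shared among all classes, so class-specific content is left untouched. The `shared_mask` input is assumed precomputed (e.g., from overlapping class-wise saliency), and the one-step attack is illustrative.

```python
import torch
import torch.nn.functional as F

def spat_perturb(model, x, y, shared_mask, eps=8/255):
    """One-step attack whose perturbation is zero outside the pixels
    marked as shared among all classes."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    delta = eps * grad.sign() * shared_mask   # masked perturbation
    return (x + delta).clamp(0, 1).detach()
```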
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.