On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training
- URL: http://arxiv.org/abs/2112.07324v2
- Date: Tue, 17 Dec 2024 08:17:26 GMT
- Title: On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training
- Authors: Chen Liu, Zhichao Huang, Mathieu Salzmann, Tong Zhang, Sabine Süsstrunk
- Abstract summary: Adversarial training is a popular method to robustify models against adversarial attacks.
In this work, we investigate this phenomenon from the perspective of training instances.
We show that the decay in generalization performance of adversarial training is a result of fitting hard adversarial instances.
- Score: 70.82725772926949
- Abstract: Adversarial training is a popular method to robustify models against adversarial attacks. However, it exhibits much more severe overfitting than training on clean inputs. In this work, we investigate this phenomenon from the perspective of training instances, i.e., training input-target pairs. Based on a quantitative metric measuring the relative difficulty of an instance in the training set, we analyze the model's behavior on training instances of different difficulty levels. This lets us demonstrate that the decay in generalization performance of adversarial training is a result of fitting hard adversarial instances. We theoretically verify our observations for both linear and general nonlinear models, proving that models trained on hard instances have worse generalization performance than ones trained on easy instances, and that this generalization gap increases with the size of the adversarial budget. Finally, we investigate solutions to mitigate adversarial overfitting in several scenarios, including fast adversarial training and fine-tuning a pretrained model with additional data. Our results demonstrate that using training data adaptively improves the model's robustness.
Related papers
- Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z) - A3T: Accuracy Aware Adversarial Training [22.42867682734154]
We identify one cause of overfitting related to current practices of generating adversarial samples from misclassified samples.
We show that our approach achieves better generalization while having comparable robustness to state-of-the-art adversarial training methods.
arXiv Detail & Related papers (2022-11-29T15:56:43Z) - Impact of Adversarial Training on Robustness and Generalizability of Language Models [33.790145748360686]
This work provides an in-depth comparison of different approaches to adversarial training in language models.
Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input space perturbation.
A linguistic correlation analysis of neurons of the learned models reveals that the improved generalization is due to 'more specialized' neurons.
arXiv Detail & Related papers (2022-11-10T12:36:50Z) - The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its 'inverse adversarial' counterpart.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z) - Calibrated Adversarial Training [8.608288231153304]
We present Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training.
The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error.
arXiv Detail & Related papers (2021-10-01T19:17:28Z) - Imbalanced Adversarial Training with Reweighting [33.51820466479575]
We show that adversarially trained models can suffer much worse performance on under-represented classes when the training dataset is imbalanced.
Traditional reweighting strategies may lose their efficacy in dealing with the imbalance issue in adversarial training.
We propose Separable Reweighted Adversarial Training (SRAT) to facilitate adversarial training under imbalanced scenarios.
arXiv Detail & Related papers (2021-07-28T20:51:36Z) - Multi-stage Optimization based Adversarial Training [16.295921205749934]
We propose a Multi-stage Optimization based Adversarial Training (MOAT) method that periodically trains the model on a mixture of benign and adversarial examples.
Under a similar amount of training overhead, the proposed MOAT exhibits better robustness than either single-step or multi-step adversarial training methods.
arXiv Detail & Related papers (2021-06-26T07:59:52Z) - Single-step Adversarial training with Dropout Scheduling [59.50324605982158]
We show that models trained using the single-step adversarial training method learn to prevent the generation of single-step adversaries.
Models trained using the proposed single-step adversarial training method are robust against both single-step and multi-step adversarial attacks.
arXiv Detail & Related papers (2020-04-18T14:14:00Z) - Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z) - Regularizers for Single-step Adversarial Training [49.65499307547198]
We propose three types of regularizers that help to learn robust models using single-step adversarial training methods.
The regularizers mitigate the effect of gradient masking by harnessing properties that differentiate a robust model from a pseudo-robust one (a minimal single-step sketch follows this list).
arXiv Detail & Related papers (2020-02-03T09:21:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.