Understanding Catastrophic Overfitting in Single-step Adversarial
Training
- URL: http://arxiv.org/abs/2010.01799v2
- Date: Tue, 15 Dec 2020 08:41:08 GMT
- Title: Understanding Catastrophic Overfitting in Single-step Adversarial
Training
- Authors: Hoki Kim, Woojin Lee, Jaewook Lee
- Abstract summary: "Catastrophic overfitting" is a phenomenon in which the robust accuracy against projected gradient descent (PGD) suddenly drops to 0% after a few epochs of single-step adversarial training.
We propose a simple method that not only prevents catastrophic overfitting, but also overrides the belief that it is difficult to prevent multi-step adversarial attacks with single-step adversarial training.
- Score: 9.560980936110234
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although fast adversarial training has demonstrated both robustness and
efficiency, the problem of "catastrophic overfitting" has been observed. This
is a phenomenon in which, during single-step adversarial training, the robust
accuracy against projected gradient descent (PGD) suddenly decreases to 0%
after a few epochs, whereas the robust accuracy against fast gradient sign
method (FGSM) increases to 100%. In this paper, we demonstrate that
catastrophic overfitting is very closely related to the characteristic of
single-step adversarial training which uses only adversarial examples with the
maximum perturbation, and not all adversarial examples in the adversarial
direction, which leads to decision boundary distortion and a highly curved loss
surface. Based on this observation, we propose a simple method that not only
prevents catastrophic overfitting, but also overrides the belief that it is
difficult to prevent multi-step adversarial attacks with single-step
adversarial training.
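To make the failure mode concrete, here is a minimal PyTorch sketch (model, tensor shapes, and step sizes are assumptions, not the paper's code) of the two attacks the abstract contrasts, plus a variant that samples a per-example magnitude along the adversarial direction instead of always using the maximum perturbation:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step attack: always applies the *maximum* perturbation eps,
    the characteristic the paper links to catastrophic overfitting."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha, steps=10):
    """Multi-step attack; a sudden drop of accuracy against this attack
    (while FGSM accuracy climbs) signals catastrophic overfitting."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

def fgsm_sampled_magnitude(model, x, y, eps):
    """Illustration of the paper's observation: also cover interior points
    *along* the adversarial direction (uniform per-example scaling is an
    assumption; the paper's actual remedy differs in detail)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    k = torch.rand(x.size(0), 1, 1, 1, device=x.device)  # assumes NCHW images
    return (x.detach() + k * eps * grad.sign()).clamp(0, 1)
```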
Related papers
- Understanding and Combating Robust Overfitting via Input Loss Landscape
Analysis and Regularization [5.1024659285813785]
Adversarial training is prone to overfitting, and the cause is far from clear.
We find that robust overfitting results from standard training, specifically the minimization of the clean loss.
We propose a new regularizer to smooth the loss landscape by penalizing the weighted logits variation along the adversarial direction.
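A hedged sketch of such a regularizer (the weighting scheme and names are assumptions, not the paper's exact formulation):

```python
import torch

def logit_variation_penalty(model, x, x_adv, weights=None):
    """Penalize the (optionally weighted) change in logits between a clean
    input and its adversarial counterpart, smoothing the loss landscape
    along the adversarial direction."""
    delta = model(x_adv) - model(x)   # per-example logit movement
    if weights is not None:           # hypothetical per-logit weighting
        delta = delta * weights
    return delta.pow(2).sum(dim=1).mean()
```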
arXiv Detail & Related papers (2022-12-09T16:55:30Z)
- Boundary Adversarial Examples Against Adversarial Overfitting [4.391102490444538]
Adversarial training approaches suffer from robust overfitting, where robust accuracy decreases when models are adversarially trained for too long.
Several mitigation approaches, including early stopping, temporal ensembling, and weight memorization, have been proposed to reduce its effect.
In this paper, we investigate whether these mitigation approaches are complementary to each other in improving adversarial training performance.
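As a point of reference, early stopping on robust validation accuracy, the first of the listed mitigations, can be sketched as follows (the helper callables and the patience value are assumptions):

```python
def train_with_early_stopping(model, train_one_epoch, eval_robust_acc,
                              num_epochs=200, patience=10):
    """Stop when robust validation accuracy (e.g. under PGD) stalls, keeping
    the best checkpoint seen so far -- a common guard against robust
    overfitting; train_one_epoch and eval_robust_acc are hypothetical helpers."""
    best_acc, best_state, bad = 0.0, None, 0
    for _ in range(num_epochs):
        train_one_epoch(model)
        acc = eval_robust_acc(model)
        if acc > best_acc:
            best_acc, bad = acc, 0
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            bad += 1
            if bad >= patience:   # robust accuracy stopped improving
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_acc
```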
arXiv Detail & Related papers (2022-11-25T13:16:53Z)
- Fast Adversarial Training with Adaptive Step Size [62.37203478589929]
We study catastrophic overfitting from the perspective of individual training instances.
We propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS).
ATAS learns an instance-wise adaptive step size that is inversely proportional to the instance's gradient norm.
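A minimal sketch of that idea (the proportionality constant and the clipping are assumptions; ATAS's exact schedule may differ):

```python
import torch
import torch.nn.functional as F

def adaptive_step_fgsm(model, x, y, eps, c=1.0):
    """Single-step attack whose step size shrinks for instances with a large
    input-gradient norm, following the instance-wise adaptive idea above."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    gnorm = grad.flatten(1).norm(dim=1).clamp_min(1e-12)   # per-example norm
    alpha = (c / gnorm).clamp(max=eps).view(-1, 1, 1, 1)   # assumes NCHW images
    return (x.detach() + alpha * grad.sign()).clamp(0, 1)
```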
arXiv Detail & Related papers (2022-06-06T08:20:07Z)
- Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce the concept of an adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
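One plausible reading of such a separability objective, sketched below (the ATG construction is omitted, and this is not the paper's exact loss):

```python
import torch

def separability_loss(features, labels, lam=1.0):
    """Pull same-class features toward their class mean (intra-class
    similarity) while spreading the class means apart (inter-class
    variance); minimizing this encourages separable features."""
    classes = labels.unique()
    means = torch.stack([features[labels == c].mean(0) for c in classes])
    intra = torch.stack([
        (features[labels == c] - means[i]).pow(2).sum(1).mean()
        for i, c in enumerate(classes)
    ]).mean()
    inter = (means - means.mean(0)).pow(2).sum(1).mean()  # larger is better
    return intra - lam * inter
```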
arXiv Detail & Related papers (2022-05-02T04:04:23Z)
- Enhancing Adversarial Robustness for Deep Metric Learning [77.75152218980605]
The adversarial robustness of deep metric learning models needs to be improved.
To avoid model collapse due to excessively hard examples, existing defenses forgo min-max adversarial training.
We propose Hardness Manipulation, which efficiently perturbs a training triplet until it reaches a specified level of hardness for adversarial training.
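A hedged sketch of perturbing a triplet toward a target hardness (perturbing only the negative, and defining hardness as d(a,p) - d(a,n), are assumptions about the method):

```python
import torch
import torch.nn.functional as F

def manipulate_hardness(embed, anchor, pos, neg, target,
                        eps=8 / 255, alpha=2 / 255, steps=5):
    """Gradient-ascend the triplet hardness H = d(a,p) - d(a,n) by moving
    the negative inside an eps-ball, stopping once the target is reached."""
    neg_adv = neg.clone().detach()
    for _ in range(steps):
        neg_adv.requires_grad_(True)
        d_ap = F.pairwise_distance(embed(anchor), embed(pos))
        d_an = F.pairwise_distance(embed(anchor), embed(neg_adv))
        hardness = (d_ap - d_an).mean()
        if hardness.item() >= target:                     # hard enough; stop
            break
        grad = torch.autograd.grad(hardness, neg_adv)[0]
        neg_adv = neg_adv.detach() + alpha * grad.sign()  # raise hardness
        neg_adv = torch.min(torch.max(neg_adv, neg - eps), neg + eps).clamp(0, 1).detach()
    return neg_adv.detach()
```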
arXiv Detail & Related papers (2022-03-02T22:27:44Z)
- Multi-stage Optimization based Adversarial Training [16.295921205749934]
We propose a Multi-stage Optimization based Adversarial Training (MOAT) method that periodically trains the model on mixed benign examples.
Under a similar amount of training overhead, the proposed MOAT exhibits better robustness than either single-step or multi-step adversarial training methods.
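The summary leaves the mixing schedule unspecified; one plausible reading is a periodic cycle over benign, single-step, and multi-step examples, sketched below (the three-stage cycle and its ordering are assumptions):

```python
def moat_batch(model, x, y, epoch, fgsm_fn, pgd_fn, period=3):
    """Pick the training view of a batch by cycling stages across epochs;
    fgsm_fn and pgd_fn are hypothetical attack callables."""
    stage = epoch % period
    if stage == 0:
        return x                     # benign stage
    if stage == 1:
        return fgsm_fn(model, x, y)  # single-step adversarial stage
    return pgd_fn(model, x, y)       # multi-step adversarial stage
```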
arXiv Detail & Related papers (2021-06-26T07:59:52Z)
- Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that the adversarial training examples are static and independent of the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
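A minimal sketch of the joint-training idea (the names and the placement of the attack are assumptions, not the JATP internals): regenerate adversarial examples against the current pre-processing + classifier pipeline at every step instead of attacking a fixed target.

```python
import torch.nn.functional as F

def jatp_step(preproc, classifier, attack_fn, x, y, optimizer):
    """One update where the attack sees the *current* pipeline, so the
    pre-processing model never trains on stale, static adversarial examples;
    attack_fn is a hypothetical callable (e.g. a PGD wrapper)."""
    x_adv = attack_fn(lambda z: classifier(preproc(z)), x, y)
    loss = F.cross_entropy(classifier(preproc(x_adv)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```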
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
- Robust Single-step Adversarial Training with Regularizer [11.35007968593652]
We propose a novel Fast Gradient Sign Method with PGD Regularization (FGSMPR) to boost the efficiency of adversarial training without catastrophic overfitting.
Experiments demonstrate that our proposed method can train a robust deep network for $L_\infty$-perturbations with FGSM adversarial training.
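A hedged sketch of the loss shape this suggests: FGSM cross-entropy plus a term pulling FGSM and PGD logits together (the L2 form and the weight lambda are assumptions):

```python
import torch.nn.functional as F

def fgsmpr_loss(model, x_fgsm, x_pgd, y, lam=10.0):
    """FGSM training loss regularized so that the logits on FGSM and PGD
    examples stay close, discouraging FGSM-only overfitting."""
    logits_fgsm = model(x_fgsm)
    logits_pgd = model(x_pgd)
    ce = F.cross_entropy(logits_fgsm, y)
    reg = (logits_fgsm - logits_pgd).pow(2).sum(dim=1).mean()
    return ce + lam * reg
```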
arXiv Detail & Related papers (2021-02-05T19:07:10Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
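A minimal NT-Xent-style sketch where an augmented view and its adversarially perturbed view form the positive pair (the exact objective in the paper may differ):

```python
import torch
import torch.nn.functional as F

def adversarial_contrastive_loss(z_aug, z_adv, temperature=0.5):
    """Treat (augmented, adversarial) embeddings of the same image as
    positives and all other embeddings in the batch as negatives."""
    z1 = F.normalize(z_aug, dim=1)
    z2 = F.normalize(z_adv, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)     # 2n x d
    sim = z @ z.t() / temperature      # cosine similarity logits
    sim.fill_diagonal_(float('-inf'))  # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n, device=z.device),
                         torch.arange(0, n, device=z.device)])
    return F.cross_entropy(sim, targets)
```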
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Towards Understanding Fast Adversarial Training [91.8060431517248]
We conduct experiments to understand the behavior of fast adversarial training.
We show that the key to its success is the ability to recover from overfitting to weak attacks.
arXiv Detail & Related papers (2020-06-04T18:19:43Z)