Understanding Catastrophic Overfitting in Single-step Adversarial
Training
- URL: http://arxiv.org/abs/2010.01799v2
- Date: Tue, 15 Dec 2020 08:41:08 GMT
- Title: Understanding Catastrophic Overfitting in Single-step Adversarial
Training
- Authors: Hoki Kim, Woojin Lee, Jaewook Lee
- Abstract summary: "Catastrophic overfitting" is a phenomenon in which the robust accuracy against projected gradient descent (PGD) suddenly drops to 0% after a few epochs of single-step adversarial training.
We propose a simple method that not only prevents catastrophic overfitting, but also overrides the belief that it is difficult to prevent multi-step adversarial attacks with single-step adversarial training.
- Score: 9.560980936110234
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although fast adversarial training has demonstrated both robustness and
efficiency, the problem of "catastrophic overfitting" has been observed. This
is a phenomenon in which, during single-step adversarial training, the robust
accuracy against projected gradient descent (PGD) suddenly decreases to 0%
after a few epochs, whereas the robust accuracy against fast gradient sign
method (FGSM) increases to 100%. In this paper, we demonstrate that
catastrophic overfitting is very closely related to the characteristic of
single-step adversarial training which uses only adversarial examples with the
maximum perturbation, and not all adversarial examples in the adversarial
direction, which leads to decision boundary distortion and a highly curved loss
surface. Based on this observation, we propose a simple method that not only
prevents catastrophic overfitting, but also overrides the belief that it is
difficult to prevent multi-step adversarial attacks with single-step
adversarial training.
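To make the failure mode concrete, here is a minimal PyTorch sketch (model, tensor shapes, and step sizes are assumptions, not the paper's code) of the two attacks the abstract contrasts, plus a variant that samples a per-example magnitude along the adversarial direction instead of always using the maximum perturbation:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step attack: always applies the *maximum* perturbation eps,
    the characteristic the paper links to catastrophic overfitting."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha, steps=10):
    """Multi-step attack; a sudden drop of accuracy against this attack
    (while FGSM accuracy climbs) signals catastrophic overfitting."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

def fgsm_sampled_magnitude(model, x, y, eps):
    """Illustration of the paper's observation: also cover interior points
    *along* the adversarial direction (uniform per-example scaling is an
    assumption; the paper's actual remedy differs in detail)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    k = torch.rand(x.size(0), 1, 1, 1, device=x.device)  # assumes NCHW images
    return (x.detach() + k * eps * grad.sign()).clamp(0, 1)
```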
Related papers
- Understanding and Combating Robust Overfitting via Input Loss Landscape
Analysis and Regularization [5.1024659285813785]
Adversarial training is prone to overfitting, and the cause is far from clear.
We find that robust overfitting results from standard training, specifically the minimization of the clean loss.
We propose a new regularizer to smooth the loss landscape by penalizing the weighted logits variation along the adversarial direction.
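A hedged sketch of such a regularizer (the weighting scheme and names are assumptions, not the paper's exact formulation):

```python
import torch

def logit_variation_penalty(model, x, x_adv, weights=None):
    """Penalize the (optionally weighted) change in logits between a clean
    input and its adversarial counterpart, smoothing the loss landscape
    along the adversarial direction."""
    delta = model(x_adv) - model(x)   # per-example logit movement
    if weights is not None:           # hypothetical per-logit weighting
        delta = delta * weights
    return delta.pow(2).sum(dim=1).mean()
```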
arXiv Detail & Related papers (2022-12-09T16:55:30Z)
- Boundary Adversarial Examples Against Adversarial Overfitting [4.391102490444538]
Adversarial training approaches suffer from robust overfitting, where robust accuracy decreases when models are adversarially trained for too long.
Several mitigation approaches, including early stopping, temporal ensembling, and weight memorization, have been proposed to reduce its effect.
In this paper, we investigate whether these mitigation approaches are complementary to each other in improving adversarial training performance.
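As a point of reference, early stopping on robust validation accuracy, the first of the listed mitigations, can be sketched as follows (the helper callables and the patience value are assumptions):

```python
def train_with_early_stopping(model, train_one_epoch, eval_robust_acc,
                              num_epochs=200, patience=10):
    """Stop when robust validation accuracy (e.g. under PGD) stalls, keeping
    the best checkpoint seen so far -- a common guard against robust
    overfitting; train_one_epoch and eval_robust_acc are hypothetical helpers."""
    best_acc, best_state, bad = 0.0, None, 0
    for _ in range(num_epochs):
        train_one_epoch(model)
        acc = eval_robust_acc(model)
        if acc > best_acc:
            best_acc, bad = acc, 0
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            bad += 1
            if bad >= patience:   # robust accuracy stopped improving
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_acc
```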
arXiv Detail & Related papers (2022-11-25T13:16:53Z)
- Fast Adversarial Training with Adaptive Step Size [62.37203478589929]
We study catastrophic overfitting from the perspective of individual training instances.
We propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS).
ATAS learns an instance-wise adaptive step size that is inversely proportional to the instance's gradient norm.
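A minimal sketch of that idea (the proportionality constant and the clipping are assumptions; ATAS's exact schedule may differ):

```python
import torch
import torch.nn.functional as F

def adaptive_step_fgsm(model, x, y, eps, c=1.0):
    """Single-step attack whose step size shrinks for instances with a large
    input-gradient norm, following the instance-wise adaptive idea above."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    gnorm = grad.flatten(1).norm(dim=1).clamp_min(1e-12)   # per-example norm
    alpha = (c / gnorm).clamp(max=eps).view(-1, 1, 1, 1)   # assumes NCHW images
    return (x.detach() + alpha * grad.sign()).clamp(0, 1)
```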
arXiv Detail & Related papers (2022-06-06T08:20:07Z)
- Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce the concept of an adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
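One plausible reading of such a separability objective, sketched below (the ATG construction is omitted, and this is not the paper's exact loss):

```python
import torch

def separability_loss(features, labels, lam=1.0):
    """Pull same-class features toward their class mean (intra-class
    similarity) while spreading the class means apart (inter-class
    variance); minimizing this encourages separable features."""
    classes = labels.unique()
    means = torch.stack([features[labels == c].mean(0) for c in classes])
    intra = torch.stack([
        (features[labels == c] - means[i]).pow(2).sum(1).mean()
        for i, c in enumerate(classes)
    ]).mean()
    inter = (means - means.mean(0)).pow(2).sum(1).mean()  # larger is better
    return intra - lam * inter
```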
arXiv Detail & Related papers (2022-05-02T04:04:23Z)
- Enhancing Adversarial Robustness for Deep Metric Learning [77.75152218980605]
The adversarial robustness of deep metric learning models needs to be improved.
To avoid model collapse due to excessively hard examples, existing defenses forgo min-max adversarial training.
We propose Hardness Manipulation, which efficiently perturbs a training triplet until it reaches a specified level of hardness for adversarial training.
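A hedged sketch of perturbing a triplet toward a target hardness (perturbing only the negative, and defining hardness as d(a,p) - d(a,n), are assumptions about the method):

```python
import torch
import torch.nn.functional as F

def manipulate_hardness(embed, anchor, pos, neg, target,
                        eps=8 / 255, alpha=2 / 255, steps=5):
    """Gradient-ascend the triplet hardness H = d(a,p) - d(a,n) by moving
    the negative inside an eps-ball, stopping once the target is reached."""
    neg_adv = neg.clone().detach()
    for _ in range(steps):
        neg_adv.requires_grad_(True)
        d_ap = F.pairwise_distance(embed(anchor), embed(pos))
        d_an = F.pairwise_distance(embed(anchor), embed(neg_adv))
        hardness = (d_ap - d_an).mean()
        if hardness.item() >= target:                     # hard enough; stop
            break
        grad = torch.autograd.grad(hardness, neg_adv)[0]
        neg_adv = neg_adv.detach() + alpha * grad.sign()  # raise hardness
        neg_adv = torch.min(torch.max(neg_adv, neg - eps), neg + eps).clamp(0, 1).detach()
    return neg_adv.detach()
```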
arXiv Detail & Related papers (2022-03-02T22:27:44Z)
- Multi-stage Optimization based Adversarial Training [16.295921205749934]
We propose a Multi-stage Optimization based Adversarial Training (MOAT) method that periodically trains the model on mixed benign examples.
Under a similar amount of training overhead, the proposed MOAT exhibits better robustness than either single-step or multi-step adversarial training methods.
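The summary leaves the mixing schedule unspecified; one plausible reading is a periodic cycle over benign, single-step, and multi-step examples, sketched below (the three-stage cycle and its ordering are assumptions):

```python
def moat_batch(model, x, y, epoch, fgsm_fn, pgd_fn, period=3):
    """Pick the training view of a batch by cycling stages across epochs;
    fgsm_fn and pgd_fn are hypothetical attack callables."""
    stage = epoch % period
    if stage == 0:
        return x                     # benign stage
    if stage == 1:
        return fgsm_fn(model, x, y)  # single-step adversarial stage
    return pgd_fn(model, x, y)       # multi-step adversarial stage
```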
arXiv Detail & Related papers (2021-06-26T07:59:52Z)
- Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise.
Pre-processing methods may suffer from the robustness degradation effect.
A potential cause of this negative effect is that the adversarial training examples are static and independent of the pre-processing model.
We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
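A minimal sketch of the joint-training idea (the names and the placement of the attack are assumptions, not the JATP internals): regenerate adversarial examples against the current pre-processing + classifier pipeline at every step instead of attacking a fixed target.

```python
import torch.nn.functional as F

def jatp_step(preproc, classifier, attack_fn, x, y, optimizer):
    """One update where the attack sees the *current* pipeline, so the
    pre-processing model never trains on stale, static adversarial examples;
    attack_fn is a hypothetical callable (e.g. a PGD wrapper)."""
    x_adv = attack_fn(lambda z: classifier(preproc(z)), x, y)
    loss = F.cross_entropy(classifier(preproc(x_adv)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```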
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
- Robust Single-step Adversarial Training with Regularizer [11.35007968593652]
We propose a novel Fast Gradient Sign Method with PGD Regularization (FGSMPR) to boost the efficiency of adversarial training without catastrophic overfitting.
Experiments demonstrate that our proposed method can train a robust deep network for $L_\infty$-perturbations with FGSM adversarial training.
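A hedged sketch of the loss shape this suggests: FGSM cross-entropy plus a term pulling FGSM and PGD logits together (the L2 form and the weight lambda are assumptions):

```python
import torch.nn.functional as F

def fgsmpr_loss(model, x_fgsm, x_pgd, y, lam=10.0):
    """FGSM training loss regularized so that the logits on FGSM and PGD
    examples stay close, discouraging FGSM-only overfitting."""
    logits_fgsm = model(x_fgsm)
    logits_pgd = model(x_pgd)
    ce = F.cross_entropy(logits_fgsm, y)
    reg = (logits_fgsm - logits_pgd).pow(2).sum(dim=1).mean()
    return ce + lam * reg
```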
arXiv Detail & Related papers (2021-02-05T19:07:10Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
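A minimal NT-Xent-style sketch where an augmented view and its adversarially perturbed view form the positive pair (the exact objective in the paper may differ):

```python
import torch
import torch.nn.functional as F

def adversarial_contrastive_loss(z_aug, z_adv, temperature=0.5):
    """Treat (augmented, adversarial) embeddings of the same image as
    positives and all other embeddings in the batch as negatives."""
    z1 = F.normalize(z_aug, dim=1)
    z2 = F.normalize(z_adv, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)     # 2n x d
    sim = z @ z.t() / temperature      # cosine similarity logits
    sim.fill_diagonal_(float('-inf'))  # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n, device=z.device),
                         torch.arange(0, n, device=z.device)])
    return F.cross_entropy(sim, targets)
```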
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Towards Understanding Fast Adversarial Training [91.8060431517248]
We conduct experiments to understand the behavior of fast adversarial training.
We show that the key to its success is the ability to recover from overfitting to weak attacks.
arXiv Detail & Related papers (2020-06-04T18:19:43Z)