Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization
- URL: http://arxiv.org/abs/2404.08154v2
- Date: Sat, 14 Sep 2024 00:26:37 GMT
- Title: Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization
- Authors: Runqi Lin, Chaojian Yu, Tongliang Liu
- Abstract summary: Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness.
SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier.
In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
- Score: 50.43319961935526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness. However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier, making it vulnerable to multi-step adversarial attacks. In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour: although these training samples are generated by the inner maximization process, their associated loss decreases instead, and we name them abnormal adversarial examples (AAEs). Upon further analysis, we discover a close relationship between AAEs and classifier distortion, as both the number and outputs of AAEs undergo significant variation with the onset of CO. Given this observation, we re-examine the SSAT process and uncover that before the occurrence of CO, the classifier already displays a slight distortion, indicated by the presence of a few AAEs. Furthermore, directly optimizing on these AAEs accelerates the classifier's distortion, and correspondingly, the variation of AAEs sharply increases as a result. In such a vicious circle, the classifier rapidly becomes highly distorted, which manifests as CO within a few iterations. These observations motivate us to eliminate CO by hindering the generation of AAEs. Specifically, we design a novel method, termed Abnormal Adversarial Examples Regularization (AAER), which explicitly regularizes the variation of AAEs to hinder the classifier from becoming distorted. Extensive experiments demonstrate that our method can effectively eliminate CO and further boost adversarial robustness with negligible additional computational overhead.
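To make the mechanism concrete, below is a minimal PyTorch sketch of how AAEs could be identified during single-step (FGSM) adversarial training, based only on the definition in the abstract: an adversarial example whose loss ends up lower than its clean counterpart's is flagged as abnormal, and a penalty on the output variation of those samples is added to the training objective. The FGSM step, the squared-logit-gap penalty, and the weight `lam` are illustrative assumptions, not the exact AAER regularizer defined in the paper.

```python
# Minimal sketch (PyTorch), assuming a standard image classifier `model` and
# inputs in [0, 1]. It flags abnormal adversarial examples (AAEs) exactly as
# the abstract defines them: samples whose loss *decreases* after the
# single-step inner maximization. The penalty form and weight `lam` are
# illustrative, not the paper's exact AAER term.
import torch
import torch.nn.functional as F

def fgsm_step(model, x, y, eps):
    """Single-step (FGSM) inner maximization."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def aae_regularized_loss(model, x, y, eps, lam=1.0):
    x_adv = fgsm_step(model, x, y, eps)
    logits_clean = model(x)
    logits_adv = model(x_adv)

    loss_clean = F.cross_entropy(logits_clean, y, reduction="none")
    loss_adv = F.cross_entropy(logits_adv, y, reduction="none")

    # AAEs: adversarial examples whose loss dropped below the clean loss.
    aae_mask = (loss_adv < loss_clean).float()

    # Illustrative penalty on the output variation of AAEs, which the abstract
    # links to classifier distortion and the onset of CO.
    output_gap = (logits_adv - logits_clean).pow(2).mean(dim=1)
    penalty = (aae_mask * output_gap).sum() / aae_mask.sum().clamp(min=1.0)

    return loss_adv.mean() + lam * penalty
```

A training loop would call aae_regularized_loss in place of the plain FGSM adversarial loss; the intent, per the abstract, is to suppress the variation of AAEs and thereby break the vicious circle that leads to CO.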
Related papers
- Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency [61.394997313144394]
Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT)
We show that during CO, the earlier layers are more susceptible, experiencing earlier and greater distortion, while the later layers show relative insensitivity.
Our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
arXiv Detail & Related papers (2024-05-25T14:56:30Z) - Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z) - Catastrophic Overfitting: A Potential Blessing in Disguise [51.996943482875366]
Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness.
Although existing FAT approaches have made strides in mitigating CO, the ascent of adversarial robustness occurs with a non-negligible decline in classification accuracy on clean samples.
We employ the feature activation differences between clean and adversarial examples to analyze the underlying causes of CO.
We harness CO to achieve 'attack obfuscation', aiming to bolster model performance.
arXiv Detail & Related papers (2024-02-28T10:01:44Z) - Efficient local linearity regularization to overcome catastrophic overfitting [59.463867084204566]
Catastrophic overfitting (CO) in single-step adversarial training results in abrupt drops in the adversarial test accuracy (even down to 0%)
We introduce a regularization term, called ELLE, to mitigate CO effectively and efficiently in classical AT evaluations.
arXiv Detail & Related papers (2024-01-21T22:55:26Z) - SSTA: Salient Spatially Transformed Attack [18.998300969035885]
Deep neural networks (DNNs) are vulnerable to adversarial attacks.
In this paper, we propose the Salient Spatially Transformed Attack (SSTA) to craft imperceptible adversarial examples (AEs)
Compared to state-of-the-art baselines, experiments indicated that SSTA could effectively improve the imperceptibility of the AEs while maintaining a 100% attack success rate.
arXiv Detail & Related papers (2023-12-12T13:38:00Z) - Hard Adversarial Example Mining for Improving Robust Fairness [18.02943802341582]
Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE)
Recent studies have revealed that adversarially trained models are prone to unfairness problems, restricting their applicability.
To alleviate this problem, we propose HAM, a straightforward yet effective framework based on adaptive Hard Adversarial example Mining.
arXiv Detail & Related papers (2023-08-03T15:33:24Z) - Provable Unrestricted Adversarial Training without Compromise with Generalizability [44.02361569894942]
Adversarial training (AT) is widely considered the most promising strategy to defend against adversarial attacks.
The existing AT methods often achieve adversarial robustness at the expense of standard generalizability.
We propose a novel AT approach called Provable Unrestricted Adversarial Training (PUAT)
arXiv Detail & Related papers (2023-01-22T07:45:51Z) - Catastrophic overfitting can be induced with discriminative non-robust features [95.07189577345059]
We study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images.
We show that CO can be induced at much smaller $\epsilon$ values than previously observed, simply by injecting images with seemingly innocuous features.
arXiv Detail & Related papers (2022-06-16T15:22:39Z) - Detecting Adversarial Examples from Sensitivity Inconsistency of Spatial-Transform Domain [17.191679125809035]
Adversarial examples (AEs) are maliciously designed to cause dramatic model output errors.
In this work, we reveal that normal examples (NEs) are insensitive to the fluctuations occurring at the highly-curved region of the decision boundary.
AEs, which are typically designed over a single domain (mostly the spatial domain), exhibit exorbitant sensitivity to such fluctuations. A generic illustrative sketch of such a sensitivity check appears after this list.
arXiv Detail & Related papers (2021-03-07T08:43:22Z)
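For the sensitivity-inconsistency finding in the last entry above, the following is a generic, hedged sketch of how such a check might look: normal examples should react mildly to a small, fixed perturbation of the input, while AEs crafted in a single (spatial) domain react sharply. The transform callable, the KL-divergence score, and the threshold `tau` are illustrative assumptions, not the detector proposed in that paper.

```python
# Generic sketch, assuming a classifier `model` and a mild, fixed input
# transform `transform` (e.g. a slight spatial-frequency filter). The KL-based
# score and threshold `tau` are illustrative assumptions, not the detector
# proposed in that paper.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sensitivity_inconsistency(model, x, transform, tau=0.1):
    """Flag inputs whose predictions change sharply under a mild transform."""
    p = F.softmax(model(x), dim=1)
    q = F.softmax(model(transform(x)), dim=1)
    # Per-sample KL divergence between original and transformed predictions.
    kl = (p * (p.clamp_min(1e-12).log() - q.clamp_min(1e-12).log())).sum(dim=1)
    return kl > tau  # True -> likely adversarial
```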
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.