Catastrophic Overfitting: A Potential Blessing in Disguise
- URL: http://arxiv.org/abs/2402.18211v1
- Date: Wed, 28 Feb 2024 10:01:44 GMT
- Title: Catastrophic Overfitting: A Potential Blessing in Disguise
- Authors: Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin
- Abstract summary: Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness.
Although existing FAT approaches have made strides in mitigating CO, their gains in adversarial robustness come with a non-negligible decline in classification accuracy on clean samples.
We employ the feature activation differences between clean and adversarial examples to analyze the underlying causes of CO.
We harness CO to achieve `attack obfuscation', aiming to bolster model performance.
- Score: 51.996943482875366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fast Adversarial Training (FAT) has gained increasing attention within the
research community owing to its efficacy in improving adversarial robustness.
Particularly noteworthy is the challenge posed by catastrophic overfitting (CO)
in this field. Although existing FAT approaches have made strides in mitigating
CO, their gains in adversarial robustness come with a non-negligible decline
in classification accuracy on clean samples. To tackle this issue, we first
employ the feature activation differences between clean and adversarial
examples to analyze the underlying causes of CO. Intriguingly, our findings
reveal that CO can be attributed to the feature coverage induced by a few
specific pathways. By intentionally manipulating feature activation differences
in these pathways with well-designed regularization terms, we can effectively
mitigate and induce CO, providing further evidence for this observation.
Notably, models trained stably with these terms exhibit superior performance
compared to prior FAT work. On this basis, we harness CO to achieve `attack
obfuscation', aiming to bolster model performance. Consequently, the models
suffering from CO can attain optimal classification accuracy on both clean and
adversarial data when random noise is added to the inputs during evaluation. We also
validate their robustness against transferred adversarial examples and the
necessity of inducing CO to improve robustness. Hence, CO may not be a problem
that has to be solved.
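The two measurable ideas in the abstract, comparing a layer's activations on clean versus adversarial inputs and adding random noise to inputs at evaluation time, can be sketched in a few lines. The sketch below assumes a PyTorch classifier; the choice of layer, the Gaussian noise, and the sigma value are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

def feature_activation_gap(model: nn.Module, layer: nn.Module,
                           x_clean: torch.Tensor, x_adv: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference between one layer's activations on clean and
    adversarial inputs -- a diagnostic in the spirit of the paper's analysis."""
    acts = []
    handle = layer.register_forward_hook(lambda m, inp, out: acts.append(out.detach()))
    with torch.no_grad():
        model(x_clean)   # activations recorded by the hook
        model(x_adv)
    handle.remove()
    a_clean, a_adv = acts
    return (a_clean - a_adv).abs().mean()

def noisy_predict(model: nn.Module, x: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Evaluation-time prediction with random noise added to the inputs,
    mirroring the 'attack obfuscation' evaluation described in the abstract.
    Gaussian noise and sigma=0.05 are assumptions for illustration."""
    with torch.no_grad():
        logits = model(x + sigma * torch.randn_like(x))
    return logits.argmax(dim=1)
```

A model undergoing CO would be expected to show large activation gaps on a few specific pathways and, per the abstract, to recover accuracy on both clean and adversarial data under the noisy evaluation; the actual regularization terms and pathway selection are detailed in the paper itself.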
Related papers
- Adversarial Robustness Overestimation and Instability in TRADES [4.063518154926961]
TRADES sometimes yields disproportionately high PGD validation accuracy compared to the AutoAttack testing accuracy in the multiclass classification task.
This discrepancy highlights a significant overestimation of robustness for these instances, potentially linked to gradient masking.
arXiv Detail & Related papers (2024-10-10T07:32:40Z) - Improving Fast Adversarial Training Paradigm: An Example Taxonomy Perspective [61.38753850236804]
Fast adversarial training (FAT) was introduced for efficient training and has become a hot research topic.
FAT suffers from catastrophic overfitting, which leads to a performance drop compared with multi-step adversarial training.
We present an example taxonomy in FAT, which identifies that catastrophic overfitting is caused by the imbalance between the inner and outer optimization in FAT.
arXiv Detail & Related papers (2024-07-22T03:56:27Z) - Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency [61.394997313144394]
Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT).
We show that during CO, earlier layers are more susceptible, experiencing earlier and greater distortion, while later layers remain relatively insensitive.
Our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
arXiv Detail & Related papers (2024-05-25T14:56:30Z) - Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization [50.43319961935526]
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness.
SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier.
In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
arXiv Detail & Related papers (2024-04-11T22:43:44Z) - Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations (a minimal sketch of this step appears after this list).
arXiv Detail & Related papers (2024-02-27T13:49:12Z) - Catastrophic overfitting can be induced with discriminative non-robust
features [95.07189577345059]
We study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images.
We show that CO can be induced at much smaller $\epsilon$ values than previously observed, simply by injecting images with seemingly innocuous features.
arXiv Detail & Related papers (2022-06-16T15:22:39Z) - Exploiting the Relationship Between Kendall's Rank Correlation and
Cosine Similarity for Attribution Protection [21.341303776931532]
We first show that the expected Kendall's rank correlation is positively correlated with cosine similarity, and then indicate that the direction of attribution is the key to attribution robustness.
Our analysis further exposes that IGR encourages neurons with the same activation states for natural samples and the corresponding perturbed samples, which is shown to induce robustness to gradient-based attribution methods.
arXiv Detail & Related papers (2022-05-15T13:08:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.