Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency
- URL: http://arxiv.org/abs/2405.16262v2
- Date: Sat, 14 Sep 2024 00:25:07 GMT
- Title: Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency
- Authors: Runqi Lin, Chaojian Yu, Bo Han, Hang Su, Tongliang Liu
- Abstract summary: Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT).
We show that during CO, the earlier layers are more susceptible, experiencing distortion sooner and to a greater degree, while the later layers remain relatively insensitive.
Our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
- Score: 61.394997313144394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT), manifesting as highly distorted deep neural networks (DNNs) that are vulnerable to multi-step adversarial attacks. However, the underlying factors that lead to the distortion of decision boundaries remain unclear. In this work, we delve into the specific changes within different DNN layers and discover that during CO, the earlier layers are more susceptible, experiencing distortion sooner and to a greater degree, while the later layers remain relatively insensitive. Our analysis further reveals that this increased sensitivity in the earlier layers stems from the formation of pseudo-robust shortcuts, which alone can impeccably defend against single-step adversarial attacks but bypass genuine-robust learning, resulting in distorted decision boundaries. Eliminating these shortcuts can partially restore robustness in DNNs from the CO state, thereby verifying that dependence on them triggers the occurrence of CO. This understanding motivates us to implement adaptive weight perturbations across different layers to hinder the generation of pseudo-robust shortcuts, consequently mitigating CO. Extensive experiments demonstrate that our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
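The abstract describes the mechanism only at a high level; the sketch below illustrates one plausible reading of it: single-step (FGSM) adversarial training combined with an adversarial weight perturbation whose magnitude is scaled per layer. The scaling schedule `layer_scales`, the step size `gamma`, and the perturb/restore bookkeeping are illustrative assumptions, not the authors' released LAP implementation.
```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8/255):
    """Single-step (FGSM) adversarial example."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def perturb_weights_layer_aware(model, x_adv, y, gamma, layer_scales):
    """Apply an adversarial weight perturbation, scaled per parameter tensor.

    `layer_scales` must contain one value per trainable parameter tensor
    (e.g. larger values for earlier layers); this schedule is an assumption.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(x_adv), y)
    grads = torch.autograd.grad(loss, params)
    applied = []
    for p, g, scale in zip(params, grads, layer_scales):
        # Step each tensor in the direction that increases the adversarial loss,
        # normalized by the gradient norm and scaled by the tensor's own norm.
        delta = gamma * scale * p.detach().norm() * g / (g.norm() + 1e-12)
        p.data.add_(delta)
        applied.append((p, delta))
    return applied

def restore_weights(applied):
    for p, delta in applied:
        p.data.sub_(delta)

def train_step(model, optimizer, x, y, eps, gamma, layer_scales):
    x_adv = fgsm_example(model, x, y, eps)
    applied = perturb_weights_layer_aware(model, x_adv, y, gamma, layer_scales)
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()   # gradients taken at perturbed weights
    restore_weights(applied)                       # undo the perturbation before stepping
    optimizer.step()
```
Here the weight perturbation is applied before the gradient computation and removed before the optimizer step, so the update is computed at the perturbed weights, mirroring standard adversarial weight perturbation schemes.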
Related papers
- Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization [50.43319961935526]
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness.
SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier.
In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
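As a rough illustration of what such anomalous behaviour could look like operationally, the snippet below flags single-step adversarial examples whose loss ends up lower than that of their clean counterparts; this criterion is an assumption made for illustration and is not the paper's exact definition or regularizer.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def flag_abnormal(model, x, x_adv, y):
    """Boolean mask over the batch: True where the adversarial example is *easier*
    for the model than the clean input (assumed notion of 'anomalous')."""
    clean_loss = F.cross_entropy(model(x), y, reduction="none")
    adv_loss = F.cross_entropy(model(x_adv), y, reduction="none")
    return adv_loss < clean_loss
```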
arXiv Detail & Related papers (2024-04-11T22:43:44Z)
- Catastrophic Overfitting: A Potential Blessing in Disguise [51.996943482875366]
Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness.
Although existing FAT approaches have made strides in mitigating CO, gains in adversarial robustness come with a non-negligible decline in classification accuracy on clean samples.
We employ the feature activation differences between clean and adversarial examples to analyze the underlying causes of CO.
We harness CO to achieve 'attack obfuscation', aiming to bolster model performance.
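A minimal sketch of measuring feature-activation differences between clean and adversarial inputs is given below; the choice of layers (convolutional and linear modules) and the distance measure (mean absolute difference) are illustrative assumptions.
```python
import torch

def activation_differences(model, x_clean, x_adv):
    """Per-layer mean absolute difference between clean and adversarial activations."""
    acts, hooks = {}, []

    def make_hook(name):
        def hook(_module, _inputs, output):
            acts.setdefault(name, []).append(output.detach())
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            hooks.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        model(x_clean)   # first recorded activation per layer
        model(x_adv)     # second recorded activation per layer

    for h in hooks:
        h.remove()
    return {name: (a[1] - a[0]).abs().mean().item() for name, a in acts.items()}
```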
arXiv Detail & Related papers (2024-02-28T10:01:44Z)
- Fixed Inter-Neuron Covariability Induces Adversarial Robustness [26.878913741674058]
The vulnerability to adversarial perturbations is a major flaw of Deep Neural Networks (DNNs).
We have developed the Self-Consistent Activation layer, which consists of neurons whose activations are consistent with each other, as they conform to a fixed, but learned, covariability pattern.
The models with an SCA layer achieved high accuracy and exhibited significantly greater robustness than multi-layer perceptron models to state-of-the-art Auto-PGD adversarial attacks without being trained on adversarially perturbed data.
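The summary does not specify how the SCA layer is constructed; the sketch below shows one generic way to tie a layer's activations to a fixed but learned covariability pattern, by routing them through a small set of shared latent factors. It should be read as an illustration of the idea, not as the paper's layer.
```python
import torch
import torch.nn as nn

class CovariabilityConstrainedLayer(nn.Module):
    """Activations are linear combinations of a few shared factors, so every
    output neuron co-varies through the learned `basis` (illustrative construction)."""
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.encode = nn.Linear(in_features, rank)               # shared latent factors
        self.basis = nn.Parameter(0.01 * torch.randn(rank, out_features))

    def forward(self, x):
        z = torch.relu(self.encode(x))
        return z @ self.basis
```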
arXiv Detail & Related papers (2023-08-07T23:46:14Z)
- Catastrophic overfitting can be induced with discriminative non-robust features [95.07189577345059]
We study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images.
We show that CO can be induced at much smaller $\epsilon$ values than previously observed, simply by injecting images with seemingly innocuous features.
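A hedged sketch of the kind of intervention described, injecting a low-magnitude, class-correlated pattern into training images, is shown below; the pattern construction (one fixed random direction per class, scaled by `beta`) is an assumption for illustration.
```python
import torch

def inject_features(images, labels, num_classes, beta=0.05, seed=0):
    """Add a small, class-specific pattern to each image (illustrative)."""
    g = torch.Generator().manual_seed(seed)
    patterns = torch.randn(num_classes, *images.shape[1:], generator=g)
    patterns = patterns / patterns.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    return (images + beta * patterns[labels]).clamp(0, 1)
```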
arXiv Detail & Related papers (2022-06-16T15:22:39Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
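The sketch below shows a Gaussian-smoothed policy wrapper in the spirit of this approach; drawing several noisy copies of the observation and taking a majority vote over a discrete action head is an illustrative choice, and the reward certificate itself (derived analytically in the paper) is not reproduced.
```python
import torch

def smoothed_action(policy, obs, sigma=0.1, n_samples=32):
    """Pick the majority-vote action over Gaussian-perturbed copies of the observation.
    Assumes `policy` maps a batch of observations to discrete-action logits."""
    noisy = obs.unsqueeze(0) + sigma * torch.randn(n_samples, *obs.shape)
    actions = policy(noisy).argmax(dim=-1)
    return torch.mode(actions, dim=0).values.item()
```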
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness at no cost to clean accuracy.
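A minimal sketch of such an anti-adversary step is given below: the input is nudged in the direction that decreases the loss for the model's own predicted label, i.e. the opposite of an adversarial ascent step. The step size and number of steps are illustrative choices.
```python
import torch
import torch.nn.functional as F

def anti_adversary_forward(model, x, alpha=2/255, steps=2):
    """Classify after moving the input *against* the adversarial direction."""
    with torch.no_grad():
        pseudo = model(x).argmax(dim=1)   # model's own prediction as pseudo-label
    x_aa = x.clone()
    for _ in range(steps):
        x_aa = x_aa.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_aa), pseudo)
        grad = torch.autograd.grad(loss, x_aa)[0]
        x_aa = (x_aa - alpha * grad.sign()).clamp(0, 1)   # descend the loss, not ascend
    return model(x_aa.detach())
```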
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
- Towards Understanding the Dynamics of the First-Order Adversaries [40.54670072901657]
An acknowledged weakness of neural networks is their vulnerability to adversarial perturbations to the inputs.
One of the most popular defense mechanisms is to maximize the loss over the constrained perturbations on the inputs using projected gradient ascent and minimize over the weights.
We investigate the non-concave landscape of the adversaries for a two-layer neural network with a quadratic loss.
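The inner maximization referenced here is projected gradient ascent on the input within a norm ball; a minimal L-infinity sketch follows (radius, step size, and step count are illustrative).
```python
import torch
import torch.nn.functional as F

def pgd_inner_max(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent on the input within an L-infinity ball of radius eps."""
    delta = torch.zeros_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta = delta.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)   # project to the ball
        delta = (x + delta).clamp(0, 1) - x                      # keep the image valid
    return (x + delta).detach()
```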
arXiv Detail & Related papers (2020-10-20T22:20:53Z)