Catastrophic overfitting can be induced with discriminative non-robust
features
- URL: http://arxiv.org/abs/2206.08242v2
- Date: Tue, 15 Aug 2023 07:43:44 GMT
- Title: Catastrophic overfitting can be induced with discriminative non-robust
features
- Authors: Guillermo Ortiz-Jiménez, Pau de Jorge, Amartya Sanyal, Adel Bibi,
  Puneet K. Dokania, Pascal Frossard, Gregory Rogéz, Philip H.S. Torr
- Abstract summary: We study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images.
We show that CO can be induced at much smaller $\epsilon$ values than previously observed, simply by injecting images with seemingly innocuous features.
- Score: 95.07189577345059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training (AT) is the de facto method for building robust neural
networks, but it can be computationally expensive. To mitigate this, fast
single-step attacks can be used, but this may lead to catastrophic overfitting
(CO). This phenomenon appears when networks gain non-trivial robustness during
the first stages of AT, but then reach a breaking point where they become
vulnerable in just a few iterations. The mechanisms that lead to this failure
mode are still poorly understood. In this work, we study the onset of CO in
single-step AT methods through controlled modifications of typical datasets of
natural images. In particular, we show that CO can be induced at much smaller
$\epsilon$ values than previously observed, simply by injecting images with
seemingly innocuous features. These features aid non-robust classification but
are not enough to achieve robustness on their own. Through extensive
experiments we analyze this novel phenomenon and discover that the presence of
these easy features induces a learning shortcut that leads to CO. Our findings
provide new insights into the mechanisms of CO and improve our understanding of
the dynamics of AT. The code to reproduce our experiments can be found at
https://github.com/gortizji/co_features.
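For context on the setup the abstract describes, the sketch below illustrates single-step (FGSM-style) adversarial training together with the kind of multi-step PGD check commonly used to detect catastrophic overfitting, i.e. robust accuracy collapsing within a few iterations while single-step accuracy stays high. It is a minimal sketch under assumed placeholders (the model, data loader, epsilon and step sizes are illustrative), not the authors' exact training recipe or their feature-injection procedure.

```python
# Minimal sketch of single-step (FGSM-style) adversarial training and a
# PGD-based robustness check. NOT the authors' recipe or their
# feature-injection procedure; model, loader, eps and step sizes are
# illustrative placeholders.
import torch
import torch.nn.functional as F


def fgsm_perturbation(model, x, y, eps):
    """Single-step attack: one signed-gradient step of size eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    F.cross_entropy(model(x + delta), y).backward()
    return (eps * delta.grad.sign()).detach()


def pgd_perturbation(model, x, y, eps, steps=10):
    """Multi-step PGD, used here only to monitor robustness."""
    alpha = 2.5 * eps / steps
    delta = (torch.rand_like(x) * 2 - 1) * eps
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return delta


def train_epoch_fgsm_at(model, loader, opt, eps, device="cuda"):
    """One epoch of fast (single-step) adversarial training."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = fgsm_perturbation(model, x, y, eps)
        opt.zero_grad()  # discard gradients accumulated by the attack
        F.cross_entropy(model(x + delta), y).backward()
        opt.step()


def pgd_accuracy(model, loader, eps, device="cuda"):
    """Accuracy under PGD. A sudden collapse of this number while FGSM
    accuracy stays high is the usual signature of catastrophic overfitting."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = pgd_perturbation(model, x, y, eps)
        with torch.no_grad():
            preds = model(x + delta).argmax(1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```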
Related papers
- Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency [61.394997313144394]
Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT).
We show that during CO, the earlier layers are more susceptible, experiencing earlier and greater distortion, while the later layers show relative insensitivity.
Our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
arXiv Detail & Related papers (2024-05-25T14:56:30Z)
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- Catastrophic Overfitting: A Potential Blessing in Disguise [51.996943482875366]
Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness.
Although existing FAT approaches have made strides in mitigating CO, the ascent of adversarial robustness occurs with a non-negligible decline in classification accuracy on clean samples.
We employ the feature activation differences between clean and adversarial examples to analyze the underlying causes of CO.
We harness CO to achieve "attack obfuscation", aiming to bolster model performance.
arXiv Detail & Related papers (2024-02-28T10:01:44Z)
- Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective [17.59014650714359]
We decouple single-step adversarial examples into data-information and self-information, which reveals an interesting phenomenon called "self-fitting".
When self-fitting occurs, the network exhibits an obvious "channel differentiation" phenomenon: the convolutional channels responsible for recognizing self-information become dominant, while those responsible for data-information are suppressed.
Our findings reveal a self-learning mechanism in adversarial training and open up new perspectives for suppressing different kinds of information to mitigate CO.
arXiv Detail & Related papers (2023-02-23T12:23:35Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties (a minimal sketch of how these singular values can be computed is given after this list).
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
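As background for the singular-values entry referenced above, the following is a minimal sketch of one standard way to compute all singular values of a convolutional layer: the FFT-based method of Sedghi, Gupta & Long (2019) for circularly padded convolutions. It is illustrative only and is not the approach proposed in the listed paper; the kernel shape and input size are assumptions.

```python
# Background sketch: all singular values of a circularly padded conv layer via
# the FFT-based method of Sedghi, Gupta & Long (2019). Illustrative only; not
# the method of the paper listed above. Kernel/input sizes are assumptions.
import numpy as np


def conv_singular_values(kernel, input_size):
    """kernel: (k, k, c_in, c_out) array; input_size: spatial size n of the input.
    Returns every singular value of the linear map defined by the convolution."""
    # 2-D FFT of the kernel, zero-padded to the input resolution.
    transforms = np.fft.fft2(kernel, s=(input_size, input_size), axes=(0, 1))
    # Each frequency gives a (c_in x c_out) matrix; the union of their singular
    # values is exactly the spectrum of the circular-convolution operator.
    return np.linalg.svd(transforms, compute_uv=False).ravel()


# Example: random 3x3 kernel, 16 input / 32 output channels, 32x32 inputs.
kernel = np.random.randn(3, 3, 16, 32)
svals = conv_singular_values(kernel, 32)
print(svals.max())  # spectral norm of the layer (largest singular value)
```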