Catastrophic overfitting can be induced with discriminative non-robust
features
- URL: http://arxiv.org/abs/2206.08242v2
- Date: Tue, 15 Aug 2023 07:43:44 GMT
- Title: Catastrophic overfitting can be induced with discriminative non-robust
features
- Authors: Guillermo Ortiz-Jiménez, Pau de Jorge, Amartya Sanyal, Adel Bibi,
  Puneet K. Dokania, Pascal Frossard, Gregory Rogéz, Philip H.S. Torr
- Abstract summary: We study the onset of CO in single-step AT methods through controlled modifications of typical datasets of natural images.
We show that CO can be induced at much smaller $\epsilon$ values than previously observed, simply by injecting images with seemingly innocuous features.
- Score: 95.07189577345059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training (AT) is the de facto method for building robust neural
networks, but it can be computationally expensive. To mitigate this, fast
single-step attacks can be used, but this may lead to catastrophic overfitting
(CO). This phenomenon appears when networks gain non-trivial robustness during
the first stages of AT, but then reach a breaking point where they become
vulnerable in just a few iterations. The mechanisms that lead to this failure
mode are still poorly understood. In this work, we study the onset of CO in
single-step AT methods through controlled modifications of typical datasets of
natural images. In particular, we show that CO can be induced at much smaller
$\epsilon$ values than previously observed, simply by injecting images with
seemingly innocuous features. These features aid non-robust classification but
are not enough to achieve robustness on their own. Through extensive
experiments we analyze this novel phenomenon and discover that the presence of
these easy features induces a learning shortcut that leads to CO. Our findings
provide new insights into the mechanisms of CO and improve our understanding of
the dynamics of AT. The code to reproduce our experiments can be found at
https://github.com/gortizji/co_features.
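For context on the setup the abstract describes, the sketch below illustrates single-step (FGSM-style) adversarial training together with the kind of multi-step PGD check commonly used to detect catastrophic overfitting, i.e. robust accuracy collapsing within a few iterations while single-step accuracy stays high. It is a minimal sketch under assumed placeholders (the model, data loader, epsilon and step sizes are illustrative), not the authors' exact training recipe or their feature-injection procedure.

```python
# Minimal sketch of single-step (FGSM-style) adversarial training and a
# PGD-based robustness check. NOT the authors' recipe or their
# feature-injection procedure; model, loader, eps and step sizes are
# illustrative placeholders.
import torch
import torch.nn.functional as F


def fgsm_perturbation(model, x, y, eps):
    """Single-step attack: one signed-gradient step of size eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    F.cross_entropy(model(x + delta), y).backward()
    return (eps * delta.grad.sign()).detach()


def pgd_perturbation(model, x, y, eps, steps=10):
    """Multi-step PGD, used here only to monitor robustness."""
    alpha = 2.5 * eps / steps
    delta = (torch.rand_like(x) * 2 - 1) * eps
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return delta


def train_epoch_fgsm_at(model, loader, opt, eps, device="cuda"):
    """One epoch of fast (single-step) adversarial training."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = fgsm_perturbation(model, x, y, eps)
        opt.zero_grad()  # discard gradients accumulated by the attack
        F.cross_entropy(model(x + delta), y).backward()
        opt.step()


def pgd_accuracy(model, loader, eps, device="cuda"):
    """Accuracy under PGD. A sudden collapse of this number while FGSM
    accuracy stays high is the usual signature of catastrophic overfitting."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = pgd_perturbation(model, x, y, eps)
        with torch.no_grad():
            preds = model(x + delta).argmax(1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```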
Related papers
- Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency [61.394997313144394]
Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT).
We show that during CO, the earlier layers are more susceptible, experiencing earlier and greater distortion, while the later layers show relative insensitivity.
Our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
arXiv Detail & Related papers (2024-05-25T14:56:30Z)
- Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
- Catastrophic Overfitting: A Potential Blessing in Disguise [51.996943482875366]
Fast Adversarial Training (FAT) has gained increasing attention within the research community owing to its efficacy in improving adversarial robustness.
Although existing FAT approaches have made strides in mitigating CO, the ascent of adversarial robustness occurs with a non-negligible decline in classification accuracy on clean samples.
We employ the feature activation differences between clean and adversarial examples to analyze the underlying causes of CO.
We harness CO to achieve "attack obfuscation", aiming to bolster model performance.
arXiv Detail & Related papers (2024-02-28T10:01:44Z)
- Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective [17.59014650714359]
We decouple single-step adversarial examples into data-information and self-information, which reveals an interesting phenomenon called "self-fitting".
When self-fitting occurs, the network exhibits an obvious "channel differentiation" phenomenon: the convolutional channels responsible for recognizing self-information become dominant, while those responsible for data-information are suppressed.
Our findings reveal a self-learning mechanism in adversarial training and open up new perspectives for suppressing different kinds of information to mitigate CO.
arXiv Detail & Related papers (2023-02-23T12:23:35Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties (a minimal sketch of how these singular values can be computed is given after this list).
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
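As background for the singular-values entry referenced above, the following is a minimal sketch of one standard way to compute all singular values of a convolutional layer: the FFT-based method of Sedghi, Gupta & Long (2019) for circularly padded convolutions. It is illustrative only and is not the approach proposed in the listed paper; the kernel shape and input size are assumptions.

```python
# Background sketch: all singular values of a circularly padded conv layer via
# the FFT-based method of Sedghi, Gupta & Long (2019). Illustrative only; not
# the method of the paper listed above. Kernel/input sizes are assumptions.
import numpy as np


def conv_singular_values(kernel, input_size):
    """kernel: (k, k, c_in, c_out) array; input_size: spatial size n of the input.
    Returns every singular value of the linear map defined by the convolution."""
    # 2-D FFT of the kernel, zero-padded to the input resolution.
    transforms = np.fft.fft2(kernel, s=(input_size, input_size), axes=(0, 1))
    # Each frequency gives a (c_in x c_out) matrix; the union of their singular
    # values is exactly the spectrum of the circular-convolution operator.
    return np.linalg.svd(transforms, compute_uv=False).ravel()


# Example: random 3x3 kernel, 16 input / 32 output channels, 32x32 inputs.
kernel = np.random.randn(3, 3, 16, 32)
svals = conv_singular_values(kernel, 32)
print(svals.max())  # spectral norm of the layer (largest singular value)
```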