Related papers: Wide Two-Layer Networks can Learn from Adversarial Perturbations

Wide Two-Layer Networks can Learn from Adversarial Perturbations

URL: http://arxiv.org/abs/2410.23677v1
Date: Thu, 31 Oct 2024 06:55:57 GMT
Title: Wide Two-Layer Networks can Learn from Adversarial Perturbations
Authors: Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki,
Abstract summary: We theoretically explain the counterintuitive success of perturbation learning. We prove that adversarial perturbations contain sufficient class-specific features for networks to generalize from them.
Score: 27.368408524000778
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Adversarial examples have raised several open questions, such as why they can deceive classifiers and transfer between different models. A prevailing hypothesis to explain these phenomena suggests that adversarial perturbations appear as random noise but contain class-specific features. This hypothesis is supported by the success of perturbation learning, where classifiers trained solely on adversarial examples and the corresponding incorrect labels generalize well to correctly labeled test data. Although this hypothesis and perturbation learning are effective in explaining intriguing properties of adversarial examples, their solid theoretical foundation is limited. In this study, we theoretically explain the counterintuitive success of perturbation learning. We assume wide two-layer networks and the results hold for any data distribution. We prove that adversarial perturbations contain sufficient class-specific features for networks to generalize from them. Moreover, the predictions of classifiers trained on mislabeled adversarial examples coincide with those of classifiers trained on correctly labeled clean samples. The code is available at https://github.com/s-kumano/perturbation-learning.

Related papers

Theoretical Understanding of Learning from Adversarial Perturbations [30.759348459463467]
It is not fully understood why adversarial examples can deceive neural networks and transfer between different networks. We provide a theoretical framework for understanding learning from perturbations using a one-hidden-layer network. Our results highlight that various adversarial perturbations, even perturbations of a few pixels, contain sufficient class features for generalization.
arXiv Detail & Related papers (2024-02-16T06:22:44Z)
Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings. Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z)
Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community. We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z)
Classification and Adversarial examples in an Overparameterized Linear Model: A Signal Processing Perspective [10.515544361834241]
State-of-the-art deep learning classifiers are highly susceptible to infinitesmal adversarial perturbations. We find that the learned model is susceptible to adversaries in an intermediate regime where classification generalizes but regression does not. Despite the adversarial susceptibility, we find that classification with these features can be easier than the more commonly studied "independent feature" models.
arXiv Detail & Related papers (2021-09-27T17:35:42Z)
On the Transferability of Adversarial Attacksagainst Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models. We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models. We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle. In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize. Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
How benign is benign overfitting? [96.07549886487526]
We investigate two causes for adversarial vulnerability in deep neural networks: bad data and (poorly) trained models. Deep neural networks essentially achieve zero training error, even in the presence of label noise. We identify label noise as one of the causes for adversarial vulnerability.
arXiv Detail & Related papers (2020-07-08T11:07:10Z)
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
A Bayes-Optimal View on Adversarial Examples [9.51828574518325]
We argue for examining adversarial examples from the perspective of Bayes-optimal classification. Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier.
arXiv Detail & Related papers (2020-02-20T16:43:47Z)
Generating Natural Adversarial Hyperspectral examples with a modified Wasserstein GAN [0.0]
We present a new method which is able to generate natural adversarial examples from the true data following the second paradigm. We provide a proof of concept of our method by generating adversarial hyperspectral signatures on a remote sensing dataset.
arXiv Detail & Related papers (2020-01-27T07:32:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.