Adversarial Robustness is at Odds with Lazy Training
- URL: http://arxiv.org/abs/2207.00411v1
- Date: Sat, 18 Jun 2022 00:51:30 GMT
- Title: Adversarial Robustness is at Odds with Lazy Training
- Authors: Yunjuan Wang, Enayat Ullah, Poorya Mianjy, Raman Arora
- Abstract summary: We show that a single gradient step can find adversarial examples for networks trained in the so-called lazy regime.
This is the first work to prove that such well-generalizing neural networks are still vulnerable to adversarial attacks.
- Score: 39.18321880557702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works show that random neural networks are vulnerable to adversarial attacks [Daniely and Schacham, 2020] and that such attacks can be found easily using a single step of gradient descent [Bubeck et al., 2021].
In this work, we take this one step further and show that a single gradient step can find adversarial examples for networks trained in the so-called lazy regime.
This regime is interesting because, even though the network weights remain close to their initialization, it contains networks with small generalization error that can be found efficiently using first-order methods.
Our work challenges the model of the lazy regime, the dominant regime in which neural networks are provably efficiently learnable: networks trained in this regime, even though they enjoy good theoretical computational guarantees, remain vulnerable to adversarial examples.
To the best of our knowledge, this is the first work to prove that such well-generalizing neural networks are still vulnerable to adversarial attacks.
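To make the single-step attack concrete, below is a minimal sketch of a one-step gradient (FGSM-style) perturbation in the spirit of the attacks referenced above. It assumes a differentiable PyTorch classifier `model` with inputs in [0, 1]; the function name, loss choice, and l_inf budget `eps` are illustrative assumptions, not the exact construction analyzed in the paper.

```python
import torch
import torch.nn.functional as F


def single_step_attack(model, x, y, eps):
    """One-step (FGSM-style) l_inf attack: a single gradient step on the input.

    model: differentiable classifier returning logits
    x:     input batch with values in [0, 1]
    y:     integer class labels
    eps:   l_inf perturbation budget (illustrative)
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    (grad,) = torch.autograd.grad(loss, x_adv)
    # Move every coordinate by eps in the direction that increases the loss,
    # then project back to the valid input range.
    x_adv = (x_adv.detach() + eps * grad.sign()).clamp(0.0, 1.0)
    return x_adv
```

The paper's point is that even when training barely moves the weights away from initialization (the lazy regime), a perturbation of this single-step form with a small budget can already flip the network's prediction.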
Related papers
- On Neural Network approximation of ideal adversarial attack and convergence of adversarial training [3.553493344868414]
Adversarial attacks are usually expressed in terms of a gradient-based operation on the input data and model.
In this work, we solidify the idea of representing adversarial attacks as a trainable function, without requiring further gradient computation.
arXiv Detail & Related papers (2023-07-30T01:04:36Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
Only later during training do they exploit higher-order statistics.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- An anomaly detection approach for backdoored neural networks: face recognition as a case study [77.92020418343022]
We propose a novel backdoored network detection method based on the principle of anomaly detection.
We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
arXiv Detail & Related papers (2022-08-22T12:14:13Z)
- Thundernna: a white box adversarial attack [0.0]
We develop a first-order method to attack neural networks.
Compared with other first-order attacks, our method has a much higher success rate.
arXiv Detail & Related papers (2021-11-24T07:06:21Z)
- FooBaR: Fault Fooling Backdoor Attack on Neural Network Training [5.639451539396458]
We explore a novel attack paradigm: faults are injected during the training phase of a neural network so that the resulting network can be attacked at deployment time without the need for further fault injection.
We call such attacks fooling backdoors, since the fault attacks at training time inject backdoors into the network that allow an attacker to produce fooling inputs.
arXiv Detail & Related papers (2021-09-23T09:43:19Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Truly Sparse Neural Networks at Scale [2.2860412844991655]
We train the largest neural network ever trained in terms of representational power -- reaching the size of a bat brain.
Our approach has state-of-the-art performance while opening the path for an environmentally friendly artificial intelligence era.
arXiv Detail & Related papers (2021-02-02T20:06:47Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
- Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes [51.31334977346847]
We train networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction.
We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly.
arXiv Detail & Related papers (2020-04-01T09:31:10Z)
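As a rough illustration of the bit-plane idea in the last entry above, the sketch below splits an 8-bit grayscale image into its bit planes and builds a coarse view from the most significant ones using NumPy; the helper names and the choice of keeping four planes are illustrative assumptions, not the training procedure of that paper.

```python
import numpy as np


def bit_planes(img):
    """Split an 8-bit grayscale image into 8 binary planes.

    img: uint8 array of shape (H, W); index 7 is the most significant bit.
    Returns a uint8 array of shape (8, H, W) with values in {0, 1}.
    """
    assert img.dtype == np.uint8
    return np.stack([(img >> k) & 1 for k in range(8)])


def coarse_view(img, keep=4):
    """Quantized view that keeps only the `keep` most significant bit planes."""
    mask = np.uint8((0xFF << (8 - keep)) & 0xFF)  # e.g. 0xF0 for keep=4
    return img & mask
```

The higher planes carry the coarse structure of the image while the lowest planes mostly carry fine detail and noise, which is why enforcing consistent representations across such quantized views is a plausible robustness signal.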
This list is automatically generated from the titles and abstracts of the papers in this site.