Feature Purification: How Adversarial Training Performs Robust Deep
Learning
- URL: http://arxiv.org/abs/2005.10190v4
- Date: Mon, 13 Jun 2022 04:45:25 GMT
- Title: Feature Purification: How Adversarial Training Performs Robust Deep
Learning
- Authors: Zeyuan Allen-Zhu and Yuanzhi Li
- Abstract summary: We show a principle that we call Feature Purification, where we show one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
- Score: 66.05472746340142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the empirical success of using Adversarial Training to defend deep
learning models against adversarial perturbations, so far, it still remains
rather unclear what the principles are behind the existence of adversarial
perturbations, and what adversarial training does to the neural network to
remove them.
In this paper, we present a principle that we call Feature Purification,
where we show one of the causes of the existence of adversarial examples is the
accumulation of certain small dense mixtures in the hidden weights during the
training process of a neural network; and more importantly, one of the goals of
adversarial training is to remove such mixtures to purify hidden weights. We
present both experiments on the CIFAR-10 dataset to illustrate this principle,
and a theoretical result proving that for certain natural classification tasks,
training a two-layer neural network with ReLU activation using randomly
initialized gradient descent indeed satisfies this principle.
Technically, we give, to the best of our knowledge, the first result proving
that the following two can hold simultaneously for training a neural network
with ReLU activation. (1) Training over the original data is indeed non-robust
to small adversarial perturbations of some radius. (2) Adversarial training,
even with an empirical perturbation algorithm such as FGM, can in fact be
provably robust against ANY perturbations of the same radius. Finally, we also
prove a complexity lower bound, showing that low complexity models such as
linear classifiers, low-degree polynomials, or even the neural tangent kernel
for this network, CANNOT defend against perturbations of this same radius, no
matter what algorithms are used to train them.
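As a concrete illustration of the adversarial training procedure discussed above, the following is a minimal PyTorch sketch of FGM-based adversarial training for a two-layer ReLU network. The layer sizes, the $\ell_2$ radius, and the synthetic data are illustrative assumptions made for the example, not the paper's exact experimental setup.

```python
# Minimal sketch (assumptions, not the paper's exact setup): adversarial
# training of a two-layer ReLU network with the Fast Gradient Method (FGM),
# i.e. a single l2-normalized gradient step of radius eps applied to each input.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, m, eps = 100, 512, 0.1            # input dim, hidden width, radius: illustrative values
net = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)

def fgm_perturb(x, y):
    """One l2 FGM step: x + eps * grad_x(loss) / ||grad_x(loss)||_2, per example."""
    x = x.clone().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(net(x).squeeze(-1), y)
    (grad,) = torch.autograd.grad(loss, x)
    norms = grad.flatten(1).norm(dim=1, keepdim=True).clamp_min(1e-12)
    return (x + eps * grad / norms).detach()

for step in range(1000):
    x = torch.randn(64, d)            # stand-in for real inputs (the paper uses CIFAR-10
    y = (x[:, 0] > 0).float()         # and a sparse-coding data model); toy labels here
    x_adv = fgm_perturb(x, y)         # adversarial training fits the perturbed inputs
    loss = F.binary_cross_entropy_with_logits(net(x_adv).squeeze(-1), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Standard (non-robust) training corresponds to replacing x_adv with x in the final loss; the paper's experiments compare the hidden weights learned by the two procedures.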
Related papers
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [38.44734564565478]
We provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory.
We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning.
arXiv Detail & Related papers (2024-10-11T03:59:49Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Evolution of Neural Tangent Kernels under Benign and Adversarial Training [109.07737733329019]
We study the evolution of the empirical Neural Tangent Kernel (NTK) under standard and adversarial training (a minimal sketch of computing one empirical NTK entry appears after this list).
We find under adversarial training, the empirical NTK rapidly converges to a different kernel (and feature map) than standard training.
This new kernel provides adversarial robustness, even when non-robust training is performed on top of it.
arXiv Detail & Related papers (2022-10-21T15:21:15Z)
- Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting [19.08269066145619]
Some interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance.
We argue that real interpolating methods like neural networks do not fit benignly.
arXiv Detail & Related papers (2022-07-14T00:23:01Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Benign Overfitting in Two-layer Convolutional Neural Networks [90.75603889605043]
We study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss.
On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss.
arXiv Detail & Related papers (2022-02-14T07:45:51Z)
- Efficient and Robust Classification for Sparse Attacks [34.48667992227529]
We consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection.
We propose a novel defense method that consists of "truncation" and "adversarial training".
Motivated by the insights we obtain, we extend these components to neural network classifiers.
arXiv Detail & Related papers (2022-01-23T21:18:17Z)
- Over-parametrized neural networks as under-determined linear systems [31.69089186688224]
We show that it is unsurprising that simple neural networks can achieve zero training loss.
We show that kernels typically associated with the ReLU activation function have fundamental flaws.
We propose new activation functions that avoid the pitfalls of ReLU in that they admit zero training loss solutions for any set of distinct data points.
arXiv Detail & Related papers (2020-10-29T21:43:00Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
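As referenced in the NTK-evolution entry above, here is a minimal sketch of how one entry of the empirical NTK, $K(x_1, x_2) = \langle \nabla_\theta f(x_1), \nabla_\theta f(x_2) \rangle$, can be computed. The tiny model and random inputs are illustrative assumptions, not the architecture studied in that paper.

```python
# Minimal sketch of one empirical NTK entry K(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>.
# The small model and random inputs are illustrative assumptions.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

def param_grad(x):
    """Flattened gradient of the scalar output f(x) with respect to all parameters."""
    net.zero_grad()
    net(x.unsqueeze(0)).squeeze().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

x1, x2 = torch.randn(10), torch.randn(10)
g1, g2 = param_grad(x1), param_grad(x2)
print(float(g1 @ g2))   # one entry of the empirical NTK Gram matrix at the current weights
```

Tracking how such kernel entries change over the course of standard versus adversarial training is the kind of measurement that paper performs.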