Evolution of Neural Tangent Kernels under Benign and Adversarial
Training
- URL: http://arxiv.org/abs/2210.12030v1
- Date: Fri, 21 Oct 2022 15:21:15 GMT
- Title: Evolution of Neural Tangent Kernels under Benign and Adversarial
Training
- Authors: Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus
- Abstract summary: We study the evolution of the empirical Neural Tangent Kernel (NTK) under standard and adversarial training.
We find that under adversarial training, the empirical NTK rapidly converges to a different kernel (and feature map) than it does under standard training.
This new kernel provides adversarial robustness, even when non-robust training is performed on top of it.
- Score: 109.07737733329019
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Two key challenges facing modern deep learning are mitigating deep networks'
vulnerability to adversarial attacks and understanding deep learning's
generalization capabilities. Towards the first issue, many defense strategies
have been developed, with the most common being Adversarial Training (AT).
Towards the second challenge, one of the dominant theories that has emerged is
the Neural Tangent Kernel (NTK) -- a characterization of neural network
behavior in the infinite-width limit. In this limit, the kernel is frozen, and
the underlying feature map is fixed. At finite width, however, there is
evidence that feature learning happens during the earlier stages of training
(kernel learning), before a second phase in which the kernel remains fixed (lazy
training). While prior work has aimed at studying adversarial vulnerability
through the lens of the frozen infinite-width NTK, there is no work that
studies the adversarial robustness of the empirical/finite NTK during training.
In this work, we perform an empirical study of the evolution of the empirical
NTK under standard and adversarial training, aiming to disambiguate the effect
of adversarial training on kernel learning and lazy training. We find that under
adversarial training, the empirical NTK rapidly converges to a different kernel
(and feature map) than it does under standard training. This new kernel provides adversarial
robustness, even when non-robust training is performed on top of it.
Furthermore, we find that adversarial training on top of a fixed kernel can
yield a classifier with $76.1\%$ robust accuracy under PGD attacks with
$\varepsilon = 4/255$ on CIFAR-10.
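The abstract's central object is the empirical (finite-width) NTK. Below is a minimal sketch, not the authors' code, of how it can be computed for a toy network in JAX: the kernel value for a pair of inputs is the inner product of their parameter gradients, and recomputing it at checkpoints during standard or adversarial training tracks the kind of kernel evolution studied here. The architecture, widths, and helper names (init_mlp, apply_mlp, empirical_ntk) are illustrative assumptions.
```python
# Minimal sketch (not the authors' code): the empirical NTK of a toy MLP,
# computed as the inner product of parameter gradients.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Random fully-connected network; returns a list of (W, b) pairs."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (fan_in, fan_out)) / jnp.sqrt(fan_in),
                       jnp.zeros(fan_out)))
    return params

def apply_mlp(params, x):
    """Forward pass producing a scalar output f(x; theta)."""
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze()

def empirical_ntk(params, x1, x2):
    """NTK_theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>.
    Recomputing this at training checkpoints tracks the kernel's evolution;
    in the infinite-width limit it would stay frozen."""
    g1 = jax.grad(apply_mlp)(params, x1)
    g2 = jax.grad(apply_mlp)(params, x2)
    return sum(jnp.vdot(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(g1),
                               jax.tree_util.tree_leaves(g2)))

key = jax.random.PRNGKey(0)
params = init_mlp(key, [32, 256, 256, 1])
x1, x2 = jax.random.normal(key, (2, 32))
print(empirical_ntk(params, x1, x2))
```
For a network with $k$ outputs the empirical NTK is a $k \times k$ matrix per input pair; the scalar-output case above keeps the sketch short.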
Related papers
- Infinite Width Limits of Self Supervised Neural Networks [6.178817969919849]
We bridge the gap between the NTK and self-supervised learning, focusing on two-layer neural networks trained under the Barlow Twins loss.
We prove that the NTK of Barlow Twins indeed becomes constant as the width of the network approaches infinity.
arXiv Detail & Related papers (2024-11-17T21:13:57Z)
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK).
We show that the CK's performance is only marginally worse than that of the NTK and, in certain cases, superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively.
arXiv Detail & Related papers (2023-10-28T06:41:47Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness? [0.0]
We study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods.
We show how NTKs can be used to generate adversarial examples in a "training-free" fashion, and demonstrate that these examples transfer to fool their finite-width neural network counterparts in the "lazy" regime (a minimal sketch of this kernel-based attack appears after this list).
arXiv Detail & Related papers (2022-10-11T16:11:48Z)
- Limitations of the NTK for Understanding Generalization in Deep Learning [13.44676002603497]
We study NTKs through the lens of scaling laws, and demonstrate that they fall short of explaining important aspects of neural network generalization.
We show that even if the empirical NTK is allowed to be pre-trained on a constant number of samples, the kernel scaling does not catch up to the neural network scaling.
arXiv Detail & Related papers (2022-06-20T21:23:28Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training process of a neural network.
We present both experiments on the CIFAR-10 dataset illustrating this principle, and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
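As referenced in the entry above on "What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?", here is a minimal sketch of a training-free kernel attack, under the assumption that the fixed kernel is the empirical_ntk from the earlier sketch (the helper names fit_kernel_ridge and fgsm_on_kernel are hypothetical, and the exact attack in that paper may differ): fit kernel ridge regression once on clean data, then take a signed-gradient step on the kernel machine's loss; no network is trained.
```python
# Minimal sketch (helper names are hypothetical): a "training-free" attack on a
# kernel ridge regression predictor built from a fixed kernel. kernel_fn(a, b)
# can be the empirical_ntk from the sketch above; no network training occurs.
import jax
import jax.numpy as jnp

def fit_kernel_ridge(kernel_fn, x_train, y_train, ridge=1e-4):
    """Solve (K + ridge*I) alpha = y; return the predictor f(x) = k(x, X) @ alpha."""
    K = jnp.stack([jnp.stack([kernel_fn(a, b) for b in x_train]) for a in x_train])
    alpha = jnp.linalg.solve(K + ridge * jnp.eye(len(x_train)), y_train)
    return lambda x: jnp.stack([kernel_fn(x, b) for b in x_train]) @ alpha

def fgsm_on_kernel(predict, x, y, eps=4/255):
    """Single signed-gradient step on the kernel machine's squared error,
    within an L-infinity budget; clipping assumes inputs scaled to [0, 1]."""
    loss = lambda xp: (predict(xp) - y) ** 2
    return jnp.clip(x + eps * jnp.sign(jax.grad(loss)(x)), 0.0, 1.0)

# Usage with the toy network from the previous sketch:
# predict = fit_kernel_ridge(lambda a, b: empirical_ntk(params, a, b),
#                            x_train, y_train)
# x_adv = fgsm_on_kernel(predict, x_train[0], y_train[0])
```
Iterating the signed-gradient step with projection back onto the $\varepsilon$-ball turns this into the PGD attack against which the abstract's $76.1\%$ robust accuracy at $\varepsilon = 4/255$ is reported.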