A Revision of Neural Tangent Kernel-based Approaches for Neural Networks
- URL: http://arxiv.org/abs/2007.00884v2
- Date: Thu, 6 Aug 2020 23:30:27 GMT
- Title: A Revision of Neural Tangent Kernel-based Approaches for Neural Networks
- Authors: Kyung-Su Kim, Aurélie C. Lozano, Eunho Yang
- Abstract summary: We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
A simple and analytic kernel function was derived and shown to be equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
- Score: 34.75076385561115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent theoretical works based on the neural tangent kernel (NTK) have shed
light on the optimization and generalization of over-parameterized networks,
and partially bridged the gap between their practical success and classical
learning theory. In particular, the following three representative results were
obtained using the NTK-based approach: (1) A training error bound was derived,
showing that networks can fit any finite training sample perfectly; the bound
reflects a tighter characterization of the training speed that depends on the data complexity.
(2) A generalization error bound that is independent of the network size was derived
using a data-dependent complexity measure (CMD). It follows from this CMD bound that
networks can generalize arbitrary smooth functions. (3) A simple and analytic
kernel function was derived and shown to be equivalent to a fully-trained network.
This kernel outperforms its corresponding network and the existing gold
standard, Random Forests, in few-shot learning. For all of these results to
hold, the network scaling factor $\kappa$ should decrease with respect to the sample size $n$.
In this case of decreasing $\kappa$, however, we prove that the aforementioned
results are surprisingly erroneous. This is because the output of the trained
network decreases to zero as $\kappa$ decreases with $n$. To solve this
problem, we tighten key bounds by essentially removing $\kappa$-affected
values. Our tighter analysis resolves the scaling problem and enables the
validation of the original NTK-based results.
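As a rough illustration of the scaling issue raised in the abstract, the sketch below instantiates a two-layer ReLU network in the NTK parameterization with an explicit output scaling factor $\kappa$ and checks that the network output shrinks in proportion to $\kappa$. The parameterization $f(x) = \kappa\, m^{-1/2} \sum_r a_r \sigma(w_r^{\top}x)$, the width, and the data are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_relu(x, W, a, kappa):
    """NTK-parameterized two-layer ReLU net: f(x) = kappa / sqrt(m) * a^T relu(W x)."""
    m = W.shape[0]
    return kappa / np.sqrt(m) * a @ np.maximum(W @ x, 0.0)

d, m = 10, 4096                       # input dimension and width (illustrative)
W = rng.standard_normal((m, d))       # first-layer weights drawn from N(0, 1)
a = rng.choice([-1.0, 1.0], size=m)   # second-layer signs, as in common NTK setups
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                # unit-norm input

# The output magnitude scales linearly with kappa, so if kappa is chosen to
# decrease with the sample size n, the network output (at initialization and,
# in the lazy NTK regime, after training) shrinks toward zero as well.
for kappa in [1.0, 0.1, 0.01]:
    print(f"kappa = {kappa:5.2f}  ->  f(x) = {two_layer_relu(x, W, a, kappa):+.6f}")
```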
Related papers
- Preconditioned Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression [8.130817534654089]
In this paper, we consider nonparametric regression by a two-layer neural network trained by gradient descent (GD) or its variant.
We show that, if the neural network is trained with a novel Preconditioned Gradient Descent (PGD) with early stopping and the target function has a spectral bias widely studied in the deep learning literature, the trained network renders a particularly sharp generalization bound with a minimax optimal rate of $\mathcal{O}(1/n^{4\alpha/(4\alpha+1)})$. (A hedged sketch of GD with early stopping follows this entry.)
arXiv Detail & Related papers (2024-07-16T03:38:34Z)
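The entry above concerns a preconditioned variant of gradient descent whose preconditioner is not reproduced here. As a hedged stand-in, the sketch below trains a two-layer ReLU network on a toy one-dimensional regression task with plain gradient descent and a simple validation-based early-stopping rule; the width, step size, and stopping criterion are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy noisy 1-D regression data (illustrative, not the paper's setting).
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

m, lr = 256, 0.1                          # hidden width and step size (illustrative)
W = rng.standard_normal((m, 1))           # first-layer weights
b = rng.standard_normal(m)                # first-layer biases
a = rng.standard_normal(m) / np.sqrt(m)   # second-layer weights

def forward(X, W, b, a):
    H = np.maximum(X @ W.T + b, 0.0)      # hidden ReLU activations, shape (n, m)
    return H, H @ a

best_va, best_params, patience = np.inf, (W.copy(), b.copy(), a.copy()), 0
for step in range(2000):
    H, pred = forward(X_tr, W, b, a)
    err = pred - y_tr
    gate = (H > 0) * a                    # a_r * 1[unit r active], shape (n, m)
    # Gradients of the mean-squared error with respect to all parameters.
    grad_a = H.T @ err / len(y_tr)
    grad_W = (err[:, None] * gate).T @ X_tr / len(y_tr)
    grad_b = (err[:, None] * gate).sum(axis=0) / len(y_tr)
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

    va_loss = np.mean((forward(X_va, W, b, a)[1] - y_va) ** 2)
    if va_loss < best_va:
        best_va, best_params, patience = va_loss, (W.copy(), b.copy(), a.copy()), 0
    else:
        patience += 1
        if patience > 50:                 # plain validation-based early stopping
            break

W, b, a = best_params
print(f"stopped after {step + 1} steps, best validation MSE {best_va:.4f}")
```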
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version. (A finite-width sketch of this comparison follows this entry.)
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
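As a rough finite-width check of the equivalence stated in the entry above, the sketch below compares the empirical NTK Gram matrix of a small two-layer ReLU network against that of a copy whose first-layer weights are randomly pruned and rescaled to preserve the weight variance. The widths, pruning rate, and rescaling are illustrative assumptions; the paper's equivalence is an infinite-width statement, so the finite-width discrepancy printed here is only indicative.

```python
import numpy as np

rng = np.random.default_rng(2)

def empirical_ntk(X, W, a):
    """Empirical NTK Gram matrix of f(x) = a^T relu(W x) / sqrt(m),
    differentiating with respect to both W and a."""
    m = W.shape[0]
    H = np.maximum(X @ W.T, 0.0) / np.sqrt(m)   # d f / d a, shape (n, m)
    S = (X @ W.T > 0) * a / np.sqrt(m)          # ReLU gate times a_r, shape (n, m)
    # Second-layer contribution plus first-layer contribution to the kernel.
    return H @ H.T + (S @ S.T) * (X @ X.T)

d, m, n = 8, 8192, 16
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

K_dense = empirical_ntk(X, W, a)

# Randomly prune first-layer weights with keep probability p, rescaling by
# 1/sqrt(p) so the weight variance matches the dense network (illustrative choice).
p = 0.5
mask = rng.random(W.shape) < p
K_pruned = empirical_ntk(X, W * mask / np.sqrt(p), a)

rel_diff = np.linalg.norm(K_pruned - K_dense) / np.linalg.norm(K_dense)
print(f"relative Frobenius difference between dense and pruned NTK: {rel_diff:.3f}")
```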
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by examining the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The edge of chaos: quantum field theory and deep neural networks [0.0]
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
arXiv Detail & Related papers (2021-09-27T18:00:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks. (A kernel-regression sketch with the analytic two-layer NTK follows this entry.)
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
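To complement the finite-versus-infinite comparison in the last entry (and the analytic kernel referenced in point (3) of the abstract above), the sketch below runs kernel ridge regression with the standard closed-form infinite-width NTK of a two-layer ReLU network with trained first-layer weights, $H^{\infty}_{ij} = x_i^{\top}x_j\,(\pi - \arccos(x_i^{\top}x_j))/(2\pi)$ for unit-norm inputs. The toy linear target, the ridge constant, and the CMD-style quantity printed at the end (the leading term $\sqrt{2\,y^{\top}(H^{\infty})^{-1}y/n}$ commonly cited in the NTK literature) are illustrative and not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(3)

def ntk_kernel(A, B):
    """Closed-form infinite-width NTK of a two-layer ReLU network
    (first-layer weights trained), for unit-norm rows of A and B:
    K_ij = a_i.b_j * (pi - arccos(a_i.b_j)) / (2*pi)."""
    G = np.clip(A @ B.T, -1.0, 1.0)
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

# Toy regression task with a linear target (illustrative, not the paper's benchmarks).
d, n_train, n_test = 20, 40, 200
w_star = rng.standard_normal(d)

def make(n):
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, X @ w_star

X_tr, y_tr = make(n_train)
X_te, y_te = make(n_test)

# Kernel ridge regression with the analytic NTK; the small ridge term is a
# numerical safeguard, not part of the bound.
H_tr = ntk_kernel(X_tr, X_tr)
lam = 1e-3
alpha = np.linalg.solve(H_tr + lam * np.eye(n_train), y_tr)
pred = ntk_kernel(X_te, X_tr) @ alpha
print("test MSE of NTK ridge regression:", np.mean((pred - y_te) ** 2))

# Leading term of the data-dependent complexity measure (CMD) from point (2)
# of the abstract, as commonly stated in the NTK literature: sqrt(2 y^T H^-1 y / n).
cmd = np.sqrt(2.0 * y_tr @ np.linalg.solve(H_tr + lam * np.eye(n_train), y_tr) / n_train)
print("CMD-style quantity:", cmd)
```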
This list is automatically generated from the titles and abstracts of the papers on this site.