Properties of the After Kernel
- URL: http://arxiv.org/abs/2105.10585v1
- Date: Fri, 21 May 2021 21:50:18 GMT
- Title: Properties of the After Kernel
- Authors: Philip M. Long
- Abstract summary: The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using neural networks.
We study the "after kernel", which is defined using the same embedding, except after training, for neural networks with standard architectures.
- Score: 11.4219428942199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined
using neural networks at initialization, whose embedding is the gradient of the
output of the network with respect to its parameters. We study the "after
kernel", which is defined using the same embedding, except after training, for
neural networks with standard architectures, on binary classification problems
extracted from MNIST and CIFAR-10, trained using SGD in a standard way. Lyu and
Li described a sense in which neural networks, under certain conditions, are
equivalent to an SVM with the after kernel. Our experiments are consistent with
this proposition under natural conditions. For networks with an architecture
similar to VGG, the after kernel is more "global", in the sense that it is less
invariant to transformations of input images that disrupt the global structure
of the image while leaving the local statistics largely intact. For fully
connected networks, the after kernel is less global in this sense. The after
kernel tends to be more invariant to small shifts, rotations and zooms; data
augmentation does not improve these invariances. The (finite approximation to
the) conjugate kernel, obtained using the last layer of hidden nodes,
sometimes, but not always, provides a good approximation to the NTK and the
after kernel.
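To make the kernel definitions above concrete: the embedding shared by the NTK and the after kernel maps an input x to the gradient of the network output with respect to the parameters, so each kernel entry is an inner product of two such per-example gradients, evaluated at the initial parameters for the finite-width NTK and at the trained parameters for the after kernel. The sketch below, in JAX, illustrates this computation on a toy fully connected network, together with the finite approximation to the conjugate kernel built from the last hidden layer. The architecture, data, and helper names (init_mlp, tangent_features, conjugate_features, gram) are illustrative assumptions, not the paper's code.
```python
# A minimal sketch, not the paper's code: estimating the finite-width tangent
# kernel whose feature map is the gradient of the network output with respect
# to its parameters. At the initial parameters this approximates the NTK; at
# the trained parameters the same computation gives the "after kernel".

import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Parameters of a toy fully connected ReLU network (stand-in model)."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params


def forward(params, x):
    """Scalar output used for binary classification (its sign is the label)."""
    h = x
    for w, b in params[:-1]:
        h = jax.nn.relu(h @ w + b)
    w, b = params[-1]
    return (h @ w + b).squeeze()


def tangent_features(params, x):
    """NTK / after-kernel embedding of x: the flattened gradient of the
    output with respect to every parameter."""
    grads = jax.grad(lambda p: forward(p, x))(params)
    return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])


def conjugate_features(params, x):
    """Last hidden layer's activations: a finite approximation to the
    conjugate kernel's embedding, as described in the abstract."""
    h = x
    for w, b in params[:-1]:
        h = jax.nn.relu(h @ w + b)
    return h


def gram(embed, params, xs):
    """Gram matrix K[i, j] = <phi(x_i), phi(x_j)> for an embedding phi."""
    feats = jax.vmap(lambda x: embed(params, x))(xs)
    return feats @ feats.T


key = jax.random.PRNGKey(0)
key, init_key, data_key = jax.random.split(key, 3)
params = init_mlp(init_key, [784, 256, 256, 1])  # e.g. flattened MNIST digits
xs = jax.random.normal(data_key, (8, 784))       # stand-in for real inputs

# Before training: a finite approximation to the NTK.
# After SGD training of `params`: the after kernel.
K_tangent = gram(tangent_features, params, xs)
K_conjugate = gram(conjugate_features, params, xs)
print(K_tangent.shape, K_conjugate.shape)  # (8, 8) (8, 8)
```
The resulting Gram matrix can also be handed to an SVM with a precomputed kernel (for instance sklearn.svm.SVC(kernel='precomputed')) to probe the Lyu and Li correspondence between trained networks and SVMs with the after kernel mentioned in the abstract.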
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent, in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains [10.360517127652185]
We provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain.
This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks with different depths and various activation functions.
arXiv Detail & Related papers (2023-05-04T08:54:40Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel [1.6383321867266318]
The Implicit Composite Kernel (ICK) combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties.
We demonstrate ICK's superior performance and flexibility on both synthetic and real-world data sets.
arXiv Detail & Related papers (2022-05-15T21:32:44Z) - Neural Networks as Kernel Learners: The Silent Alignment Effect [86.44610122423994]
Neural networks in the lazy training regime converge to kernel machines.
We show that this can indeed happen due to a phenomenon we term silent alignment.
We also demonstrate that non-whitened data can weaken the silent alignment effect.
arXiv Detail & Related papers (2021-10-29T18:22:46Z) - Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finite-width neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Neural Kernels Without Tangents [34.527798084824575]
We present an algebra for creating "compositional" kernels from bags of features.
We show that these operations correspond to many of the building blocks of "neural tangent kernels" (NTK).
arXiv Detail & Related papers (2020-03-04T18:25:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.