Weighted Neural Tangent Kernel: A Generalized and Improved
Network-Induced Kernel
- URL: http://arxiv.org/abs/2103.11558v1
- Date: Mon, 22 Mar 2021 03:16:20 GMT
- Title: Weighted Neural Tangent Kernel: A Generalized and Improved
Network-Induced Kernel
- Authors: Lei Tan, Shutong Wu, Xiaolin Huang
- Abstract summary: The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over- parameterized Neural Network (NN) trained by gradient descent.
We introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved tool, which can capture an over- parameterized NN's training dynamics under different gradients.
With the proposed weight update algorithm, both empirical and analytical WNTKs outperform the corresponding NTKs in numerical experiments.
- Score: 20.84988773171639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Neural Tangent Kernel (NTK) has recently attracted intense study, as it
describes the evolution of an over-parameterized Neural Network (NN) trained by
gradient descent. However, it is now well-known that gradient descent is not
always a good optimizer for NNs, which can partially explain the unsatisfactory
practical performance of the NTK regression estimator. In this paper, we
introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved
tool, which can capture an over-parameterized NN's training dynamics under
different optimizers. Theoretically, in the infinite-width limit, we prove: i)
the stability of the WNTK at initialization and during training, and ii) the
equivalence between the WNTK regression estimator and the corresponding NN
estimator with different learning rates on different parameters. With the
proposed weight update algorithm, both empirical and analytical WNTKs
outperform the corresponding NTKs in numerical experiments.
Related papers
- On the Convergence Analysis of Over-Parameterized Variational Autoencoders: A Neural Tangent Kernel Perspective [7.580900499231056]
Variational Auto-Encoders (VAEs) have emerged as powerful probabilistic models for generative tasks.
This paper provides a mathematical proof of VAE under mild assumptions.
We also establish a novel connection between the optimization problem faced by over-Eized SNNs and the Kernel Ridge (KRR) problem.
arXiv Detail & Related papers (2024-09-09T06:10:31Z) - How many Neurons do we need? A refined Analysis for Shallow Networks
trained with Gradient Descent [0.0]
We analyze the generalization properties of two-layer neural networks in the neural tangent kernel regime.
We derive fast rates of convergence that are known to be minimax optimal in the framework of non-parametric regression.
arXiv Detail & Related papers (2023-09-14T22:10:28Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Analyzing Convergence in Quantum Neural Networks: Deviations from Neural
Tangent Kernels [20.53302002578558]
A quantum neural network (QNN) is a parameterized mapping efficiently implementable on near-term Noisy Intermediate-Scale Quantum (NISQ) computers.
Despite the existing empirical and theoretical investigations, the convergence of QNN training is not fully understood.
arXiv Detail & Related papers (2023-03-26T22:58:06Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and gradient descent (SGD) to train SNNs, for both of which we develop consistent excess bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF)
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that the averaged gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - The Recurrent Neural Tangent Kernel [11.591070761599328]
We introduce and study the Recurrent Neural Tangent Kernel (RNTK), which provides new insights into the behavior of overparametrized RNNs.
A synthetic and 56 real-world data experiments demonstrate that the RNTK offers significant performance gains over other kernels.
arXiv Detail & Related papers (2020-06-18T02:59:21Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural
Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a " Kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.