Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS
- URL: http://arxiv.org/abs/2009.10683v5
- Date: Thu, 18 Mar 2021 14:56:31 GMT
- Title: Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS
- Authors: Lin Chen, Sheng Xu
- Abstract summary: We prove that the exponential power kernel with a smaller power (making the kernel less smooth) leads to a larger RKHS.
We also prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural tangent kernel and the Laplace kernel include the same set of functions.
- Score: 10.578438886506076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural
tangent kernel and the Laplace kernel include the same set of functions, when
both kernels are restricted to the sphere $\mathbb{S}^{d-1}$. Additionally, we
prove that the exponential power kernel with a smaller power (making the kernel
less smooth) leads to a larger RKHS, when it is restricted to the sphere
$\mathbb{S}^{d-1}$ and when it is defined on the entire $\mathbb{R}^d$.
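A minimal numerical sketch in this spirit (illustrative only, not taken from the paper): it compares the Gram-matrix eigenvalue decay of the Laplace kernel, a two-layer ReLU NTK in one common closed form, and two exponential power kernels $\exp(-\|x-y\|^{\gamma})$ on points sampled from $\mathbb{S}^{d-1}$. The kernel normalizations, the two-layer restriction, and all numerical choices below are assumptions; matching polynomial decay rates are only a heuristic proxy for the equality of RKHSs proved in the paper.

```python
# Illustrative sketch (assumptions noted above): Gram-matrix eigenvalue decay
# of the Laplace kernel, a two-layer ReLU NTK, and exponential power kernels
# on random points of the unit sphere S^{d-1}.
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 3
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)               # points on S^{d-1}

U = np.clip(X @ X.T, -1.0, 1.0)                             # inner products <x, y>
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # distances ||x - y||

def ntk_two_layer(u):
    """NTK of a one-hidden-layer ReLU network on the sphere (one common normalization)."""
    k0 = (np.pi - np.arccos(u)) / np.pi                               # arc-cosine kernel, degree 0
    k1 = (u * (np.pi - np.arccos(u)) + np.sqrt(1.0 - u**2)) / np.pi   # arc-cosine kernel, degree 1
    return u * k0 + k1

grams = {
    "NTK (two-layer)":      ntk_two_layer(U),
    "Laplace":              np.exp(-D),
    "exp power, gamma=0.5": np.exp(-D**0.5),
    "exp power, gamma=1.5": np.exp(-D**1.5),
}

for name, K in grams.items():
    eig = np.sort(np.linalg.eigvalsh(K))[::-1]
    # A few eigenvalues down the spectrum; comparable polynomial decay is the
    # heuristic signature of norm-equivalent RKHSs.
    print(f"{name:>22}: " + "  ".join(f"{eig[i]:.2e}" for i in (0, 9, 49, 199)))
```

Slower decay for the smaller power $\gamma = 0.5$ is the expected numerical signature of the paper's second claim that a less smooth exponential power kernel induces a larger RKHS.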
Related papers
- Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines [12.11705128358537]
We propose a new class of positive definite kernels based on spectral truncation.
We show that the noncommutativity of the proposed kernels is a governing factor leading to performance enhancement.
We also propose a deep learning perspective to increase the representation capacity of spectral truncation kernels.
arXiv Detail & Related papers (2024-05-28T04:47:12Z)
- On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains [10.360517127652185]
We provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain.
This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks of different depths and various activation functions.
arXiv Detail & Related papers (2023-05-04T08:54:40Z)
- An Empirical Analysis of the Laplace and Neural Tangent Kernels [0.0]
The neural tangent kernel is a kernel function defined over the parameter distribution of an infinite-width neural network.
We show that the Laplace kernel and the neural tangent kernel share the same reproducing kernel Hilbert space on the sphere $\mathbb{S}^{d-1}$.
arXiv Detail & Related papers (2022-08-07T16:18:02Z)
- Spectral bounds of the $\varepsilon$-entropy of kernel classes [6.028247638616059]
We develop new bounds on the $\varepsilon$-entropy of a unit ball in a reproducing kernel Hilbert space induced by some Mercer kernel $K$.
In our approach we exploit the ellipsoidal structure of the unit ball in the RKHS and previous work on covering numbers of an ellipsoid in Euclidean space (the ellipsoid description is recalled in a short note after this list).
arXiv Detail & Related papers (2022-04-09T16:45:22Z)
- Fast Sketching of Polynomial Kernels of Polynomial Degree [61.83993156683605]
The polynomial kernel is especially important because other kernels can often be approximated by polynomial kernels via a Taylor series expansion (a minimal numerical illustration appears after this list).
Recent oblivious sketching techniques reduce the dependence of the running time on the degree $q$ of the kernel.
We give a new sketch which greatly improves upon this running time by removing the dependence on $q$ in the leading-order term.
arXiv Detail & Related papers (2021-08-21T02:14:55Z)
- Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
The gradient-based algorithms available for non-convex optimization are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z)
- Faster Kernel Matrix Algebra via Density Estimation [46.253698241653254]
We study fast algorithms for computing fundamental properties of a positive semidefinite kernel matrix $K \in \mathbb{R}^{n \times n}$ corresponding to $n$ points.
arXiv Detail & Related papers (2021-02-16T18:25:47Z)
- High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z)
- Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection [76.1522587605852]
Isolation Distributional Kernel (IDK) is a new way to measure the similarity between two distributions.
We demonstrate IDK's efficacy and efficiency as a new tool for kernel-based anomaly detection for both point and group anomalies (a generic distributional-similarity sketch appears after this list).
arXiv Detail & Related papers (2020-09-24T12:25:43Z)
- On the Similarity between the Laplace and Neural Tangent Kernels [26.371904197642145]
We show that the NTK for fully connected networks is closely related to the standard Laplace kernel.
Our results suggest that much insight about neural networks can be obtained from analysis of the well-known Laplace kernel.
arXiv Detail & Related papers (2020-07-03T09:48:23Z)
- Kernel-Based Reinforcement Learning: A Finite-Time Analysis [53.47210316424326]
We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards.
We empirically validate our approach in continuous MDPs with sparse rewards.
arXiv Detail & Related papers (2020-04-12T12:23:46Z)
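Background for the $\varepsilon$-entropy entry above (a standard fact about Mercer kernels, stated here in generic notation $\mu_i, e_i$ rather than taken from that paper): if $K$ has the Mercer expansion
$$K(x,y)=\sum_{i\ge 1}\mu_i\, e_i(x)\, e_i(y),$$
then the unit ball of the associated RKHS is
$$\Big\{\, f=\sum_{i\ge 1} a_i e_i \;:\; \sum_{i\ge 1} a_i^2/\mu_i \le 1 \,\Big\},$$
an ellipsoid in $L^2$ with semi-axes $\sqrt{\mu_i}$, which is why covering-number bounds for ellipsoids translate into entropy bounds for kernel classes.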
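The Taylor-expansion remark in the polynomial-sketching entry above can be illustrated with a toy computation (generic background only; it does not implement that paper's sketching algorithm, and the vectors and degrees are arbitrary choices):

```python
# Toy illustration: the exponential dot-product kernel exp(<x, y>) is
# approximated by a truncated Taylor series of polynomial kernels (x.y)^k / k!.
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)   # keep <x, y> bounded

exact = np.exp(x @ y)
for q in (1, 2, 4, 8):
    approx = sum((x @ y) ** k / factorial(k) for k in range(q + 1))
    print(f"degree {q}: truncation error {abs(exact - approx):.2e}")
```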
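The distribution-level similarity in the Isolation Distributional Kernel entry above can be sketched with a generic kernel mean embedding (a stand-in only: the actual IDK uses isolation-based feature maps rather than the Gaussian kernel, and the function name and bandwidth below are made up for illustration):

```python
# Generic distributional similarity: average pairwise Gaussian kernel value
# between two samples, i.e. an inner product of kernel mean embeddings.
import numpy as np

def mean_embedding_similarity(P, Q, bandwidth=1.0):
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth**2)).mean()

rng = np.random.default_rng(2)
P      = rng.normal(0.0, 1.0, size=(200, 2))   # reference sample
Q_same = rng.normal(0.0, 1.0, size=(200, 2))   # drawn from the same distribution
Q_far  = rng.normal(3.0, 1.0, size=(200, 2))   # shifted sample (a "group anomaly")

print("same distribution:   ", mean_embedding_similarity(P, Q_same))
print("shifted distribution:", mean_embedding_similarity(P, Q_far))
```

A lower similarity score for the shifted sample is the basic signal a distributional kernel provides for group anomaly detection.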
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.