Training NTK to Generalize with KARE
- URL: http://arxiv.org/abs/2505.11347v2
- Date: Wed, 21 May 2025 09:58:43 GMT
- Title: Training NTK to Generalize with KARE
- Authors: Johannes Schwab, Bryan Kelly, Semyon Malamud, Teng Andrea Xu,
- Abstract summary: We show that explicitly trained kernels can outperform traditional end-to-end DNN optimization.<n>We argue that explicit training of NTK is a form of over-parametrized feature learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of the data-dependent neural tangent kernel (NTK; Jacot et al. (2018)) associated with a trained deep neural network (DNN) often matches or exceeds that of the full network. This implies that DNN training via gradient descent implicitly performs kernel learning by optimizing the NTK. In this paper, we propose instead to optimize the NTK explicitly. Rather than minimizing empirical risk, we train the NTK to minimize its generalization error using the recently developed Kernel Alignment Risk Estimator (KARE; Jacot et al. (2020)). Our simulations and real data experiments show that NTKs trained with KARE consistently match or significantly outperform the original DNN and the DNN- induced NTK (the after-kernel). These results suggest that explicitly trained kernels can outperform traditional end-to-end DNN optimization in certain settings, challenging the conventional dominance of DNNs. We argue that explicit training of NTK is a form of over-parametrized feature learning.
Related papers
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK)
We show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is shown to be superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively.
arXiv Detail & Related papers (2023-10-28T06:41:47Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - High-performance deep spiking neural networks with 0.3 spikes per neuron [9.01407445068455]
It is hard to train biologically-inspired spiking neural networks (SNNs) than artificial neural networks (ANNs)
We show that training deep SNN models achieves the exact same performance as that of ANNs.
Our SNN accomplishes high-performance classification with less than 0.3 spikes per neuron, lending itself for an energy-efficient implementation.
arXiv Detail & Related papers (2023-06-14T21:01:35Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Quantum-Inspired Tensor Neural Networks for Option Pricing [4.3942901219301564]
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions.
A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs.
This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to control for industrial applications.
arXiv Detail & Related papers (2022-12-28T19:39:55Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Comparative Analysis of Interval Reachability for Robust Implicit and
Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs)
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on CIFAR-10 dataset while achieving 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z) - Weighted Neural Tangent Kernel: A Generalized and Improved
Network-Induced Kernel [20.84988773171639]
The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over- parameterized Neural Network (NN) trained by gradient descent.
We introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved tool, which can capture an over- parameterized NN's training dynamics under different gradients.
With the proposed weight update algorithm, both empirical and analytical WNTKs outperform the corresponding NTKs in numerical experiments.
arXiv Detail & Related papers (2021-03-22T03:16:20Z) - Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel
Theory? [2.0711789781518752]
Neural Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent.
We study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs.
In particular, NTK theory does not explain the behavior of sufficiently deep networks so that their gradients explode as they propagate through the network's layers.
arXiv Detail & Related papers (2020-12-08T15:19:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.