Fast Neural Tangent Kernel Alignment, Norm and Effective Rank via Trace Estimation
- URL: http://arxiv.org/abs/2511.10796v1
- Date: Thu, 13 Nov 2025 20:51:07 GMT
- Title: Fast Neural Tangent Kernel Alignment, Norm and Effective Rank via Trace Estimation
- Authors: James Hazelden
- Abstract summary: We introduce a matrix-free perspective, using trace estimation to rapidly analyze the empirical, finite-width NTK. This enables fast computation of the NTK's trace, Frobenius norm, effective rank, and alignment. We show these so-called one-sided estimators can outperform Hutch++ in the low-sample regime.
- Score: 0.006425846916579409
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Neural Tangent Kernel (NTK) characterizes how a model's state evolves over Gradient Descent. Computing the full NTK matrix is often infeasible, especially for recurrent architectures. Here, we introduce a matrix-free perspective, using trace estimation to rapidly analyze the empirical, finite-width NTK. This enables fast computation of the NTK's trace, Frobenius norm, effective rank, and alignment. We provide numerical recipes based on the Hutch++ trace estimator with provably fast convergence guarantees. In addition, we show that, due to the structure of the NTK, one can compute the trace using only forward- or reverse-mode automatic differentiation, not requiring both modes. We show these so-called one-sided estimators can outperform Hutch++ in the low-sample regime, especially when the gap between the model state and parameter count is large. In total, our results demonstrate that matrix-free randomized approaches can yield speedups of many orders of magnitude, leading to faster analysis and applications of the NTK.
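To make the matrix-free recipe concrete, the sketch below estimates these quantities for a toy model in JAX. It is a minimal illustration under stated assumptions, not the paper's implementation: the two-layer MLP, Rademacher probes, and probe counts are placeholders, and plain Hutchinson sampling stands in for the paper's Hutch++-based recipes. The trace estimator is one-sided in the abstract's sense: each sample needs only a forward-mode JVP, never a VJP.

```python
# Minimal sketch (not the authors' released code): matrix-free Hutchinson-style
# estimates of the empirical NTK's trace and Frobenius norm for a toy MLP.
# Here K = J J^T, with J the Jacobian of the flattened model outputs with
# respect to the parameters; K is never formed explicitly.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def mlp(params, x):
    # Tiny two-layer network; stands in for any differentiable model.
    w1, b1, w2, b2 = params
    return jnp.tanh(x @ w1 + b1) @ w2 + b2

def ntk_trace_one_sided(params, X, key, n_probes=32):
    # tr(K) = tr(J^T J) = E_v[||J v||^2] over parameter-space probes v,
    # so each sample needs only ONE forward-mode JVP (the "one-sided" trick).
    f = lambda p: mlp(p, X).reshape(-1)
    flat, unravel = ravel_pytree(params)

    def probe(k):
        v = jax.random.rademacher(k, flat.shape, dtype=flat.dtype)
        _, jv = jax.jvp(f, (params,), (unravel(v),))  # J v, forward mode only
        return jnp.sum(jv ** 2)

    return jnp.mean(jax.vmap(probe)(jax.random.split(key, n_probes)))

def ntk_fro_sq(params, X, key, n_probes=32):
    # ||K||_F^2 = tr(K^2) = E_u[||K u||^2] over output-space probes u, with
    # K u = J (J^T u): one VJP followed by one JVP per probe (both AD modes).
    f = lambda p: mlp(p, X).reshape(-1)
    out, pullback = jax.vjp(f, params)

    def probe(k):
        u = jax.random.rademacher(k, out.shape, dtype=out.dtype)
        (jtu,) = pullback(u)                   # J^T u, reverse mode
        _, ku = jax.jvp(f, (params,), (jtu,))  # J (J^T u) = K u, forward mode
        return jnp.sum(ku ** 2)

    return jnp.mean(jax.vmap(probe)(jax.random.split(key, n_probes)))

# Usage: estimate tr(K), ||K||_F^2, and the effective rank
# tr(K)^2 / ||K||_F^2 (one common definition) without materializing K.
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
d_in, d_hid, d_out = 10, 64, 2
params = (jax.random.normal(k1, (d_in, d_hid)) / jnp.sqrt(d_in), jnp.zeros(d_hid),
          jax.random.normal(k2, (d_hid, d_out)) / jnp.sqrt(d_hid), jnp.zeros(d_out))
X = jax.random.normal(key, (128, d_in))
tr_hat = ntk_trace_one_sided(params, X, jax.random.PRNGKey(1))
fro_sq_hat = ntk_fro_sq(params, X, jax.random.PRNGKey(2))
print("tr(K) ~", tr_hat, " effective rank ~", tr_hat**2 / fro_sq_hat)
```

A useful corollary of the same structure: the alignment numerator tr(K y y^T) = y^T J J^T y = ||J^T y||^2 costs just one VJP per label vector y, with no randomness at all.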
Related papers
- State Estimation Using Sparse DEIM and Recurrent Neural Networks [0.0]
We introduce an equation-free S-DEIM framework that estimates the optimal kernel vector from sparse observational time series. We show that the recurrent architecture is necessary since the kernel cannot be estimated from instantaneous observations. In each case, the resulting S-DEIM estimates are satisfactory even when a relatively simple RNN architecture is used.
arXiv Detail & Related papers (2024-10-21T13:12:22Z)
- Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning [14.792099973449794]
We propose an algorithm to align the training dynamics of the sparse network with that of the dense one.
We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account.
Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
arXiv Detail & Related papers (2024-06-03T22:19:42Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, under plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the *bias-generalized NTK*.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials [52.11466135206223]
We show that a wide two-layer neural network can jointly use the Neural Tangent Kernel (NTK) and the QuadNTK to fit target functions.
This yields an end-to-end convergence guarantee with provable sample improvement over both the NTK and QuadNTK on their own.
arXiv Detail & Related papers (2022-06-08T06:06:51Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for the NTK by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK (convolutional NTK) features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of fully-connected ReLU networks.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Learning with Neural Tangent Kernels in Near Input Sparsity Time [5.472561051778218]
The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent.
We propose a near input sparsity time algorithm that maps the input data to a randomized low-dimensional feature space so that the inner product of the transformed data approximates their NTK evaluation.
We show that on standard large-scale regression and classification tasks, a linear regressor trained on our features outperforms trained NNs and the Nyström method with NTK kernels.
arXiv Detail & Related papers (2021-04-01T11:56:58Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)