Scaling Neural Tangent Kernels via Sketching and Random Features
- URL: http://arxiv.org/abs/2106.07880v1
- Date: Tue, 15 Jun 2021 04:44:52 GMT
- Title: Scaling Neural Tangent Kernels via Sketching and Random Features
- Authors: Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo
Shin
- Abstract summary: Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
- Score: 53.57615759435126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide
neural networks trained under least squares loss by gradient descent. Recent
works also report that NTK regression can outperform finitely-wide neural
networks trained on small-scale datasets. However, the computational complexity
of kernel methods has limited their use in large-scale learning tasks. To
accelerate learning with NTK, we design a near input-sparsity time
approximation algorithm for NTK, by sketching the polynomial expansions of
arc-cosine kernels: our sketch for the convolutional counterpart of NTK (CNTK)
can transform any image using a linear runtime in the number of pixels.
Furthermore, we prove a spectral approximation guarantee for the NTK matrix, by
combining random features (based on leverage score sampling) of the arc-cosine
kernels with a sketching algorithm. We benchmark our methods on various
large-scale regression and classification tasks and show that a linear
regressor trained on our CNTK features matches the accuracy of exact CNTK on
the CIFAR-10 dataset while achieving a 150x speedup.
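The following is a minimal sketch of the general recipe behind this line of work, not the paper's sketching algorithm: plain Monte-Carlo (ReLU) random features whose inner products approximate the order-1 arc-cosine kernel appearing in the NTK expansion, followed by a linear ridge regressor on the explicit features in place of kernel regression. The function names, feature dimension, and synthetic data below are illustrative assumptions.

```python
# Minimal illustrative sketch (assumed names/dimensions), not the paper's algorithm.
import numpy as np

def arccos1_random_features(X, num_features=2048, seed=0):
    """ReLU random features: E[phi(x) @ phi(y)] equals the order-1 arc-cosine kernel."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], num_features))  # i.i.d. Gaussian projections
    return np.sqrt(2.0 / num_features) * np.maximum(X @ W, 0.0)

def arccos1_kernel(X, Y):
    """Exact order-1 arc-cosine kernel, used only to check the approximation."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)
    ny = np.linalg.norm(Y, axis=1, keepdims=True)
    cos = np.clip((X @ Y.T) / (nx * ny.T), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * ny.T) / np.pi * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 32))
    y = rng.standard_normal(500)

    Z = arccos1_random_features(X)
    # Inner products of the explicit features approximate the exact kernel matrix.
    approx_err = np.abs(Z @ Z.T - arccos1_kernel(X, X)).mean()
    print(f"mean absolute kernel approximation error: {approx_err:.4f}")

    # Kernel regression is replaced by linear ridge regression on the features,
    # which scales with the feature dimension rather than the number of samples.
    lam = 1e-3
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
    print("train MSE:", float(np.mean((Z @ w - y) ** 2)))
```

The paper's contribution is to replace such Monte-Carlo features with sketches of the polynomial expansions of the arc-cosine kernels, bringing the transform down to near input-sparsity time; the sketch above only illustrates the feature-map-plus-linear-regression workflow.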
Related papers
- Faithful and Efficient Explanations for Neural Networks via Neural
Tangent Kernel Surrogate Models [7.608408123113268]
We analyze approximate empirical neural tangent kernels (eNTK) for data attribution.
We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation.
We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models.
arXiv Detail & Related papers (2023-05-23T23:51:53Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Identifying good directions to escape the NTK regime and efficiently
learn low-degree plus sparse polynomials [52.11466135206223]
We show that a wide two-layer neural network can jointly use the Neural Tangent Kernel (NTK) and the QuadNTK to fit target functions.
This yields an end-to-end convergence guarantee with a provable sample-complexity improvement over both the NTK and the QuadNTK on their own.
arXiv Detail & Related papers (2022-06-08T06:06:51Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Learning with Neural Tangent Kernels in Near Input Sparsity Time [5.472561051778218]
The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent.
We propose a near input sparsity time algorithm that maps the input data to a randomized low-dimensional feature space so that the inner product of the transformed data approximates their NTK evaluation.
We show that in standard large-scale regression and classification tasks a linear regressor trained on our features outperforms trained neural networks and the Nyström method with NTK kernels.
arXiv Detail & Related papers (2021-04-01T11:56:58Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Neural Kernels Without Tangents [34.527798084824575]
We present an algebra for creating "compositional" kernels from bags of features.
We show that these operations correspond to many of the building blocks of "neural tangent kernels" (NTK).
arXiv Detail & Related papers (2020-03-04T18:25:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.