Scaling Neural Tangent Kernels via Sketching and Random Features
- URL: http://arxiv.org/abs/2106.07880v1
- Date: Tue, 15 Jun 2021 04:44:52 GMT
- Title: Scaling Neural Tangent Kernels via Sketching and Random Features
- Authors: Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo
Shin
- Abstract summary: Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
- Score: 53.57615759435126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide
neural networks trained under least squares loss by gradient descent. Recent
works also report that NTK regression can outperform finitely-wide neural
networks trained on small-scale datasets. However, the computational complexity
of kernel methods has limited their use in large-scale learning tasks. To
accelerate learning with NTK, we design a near input-sparsity time
approximation algorithm for NTK, by sketching the polynomial expansions of
arc-cosine kernels: our sketch for the convolutional counterpart of NTK (CNTK)
can transform any image using a linear runtime in the number of pixels.
Furthermore, we prove a spectral approximation guarantee for the NTK matrix, by
combining random features (based on leverage score sampling) of the arc-cosine
kernels with a sketching algorithm. We benchmark our methods on various
large-scale regression and classification tasks and show that a linear
regressor trained on our CNTK features matches the accuracy of exact CNTK on
the CIFAR-10 dataset while achieving a 150x speedup.
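The following is a minimal sketch of the general recipe behind this line of work, not the paper's sketching algorithm: plain Monte-Carlo (ReLU) random features whose inner products approximate the order-1 arc-cosine kernel appearing in the NTK expansion, followed by a linear ridge regressor on the explicit features in place of kernel regression. The function names, feature dimension, and synthetic data below are illustrative assumptions.

```python
# Minimal illustrative sketch (assumed names/dimensions), not the paper's algorithm.
import numpy as np

def arccos1_random_features(X, num_features=2048, seed=0):
    """ReLU random features: E[phi(x) @ phi(y)] equals the order-1 arc-cosine kernel."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], num_features))  # i.i.d. Gaussian projections
    return np.sqrt(2.0 / num_features) * np.maximum(X @ W, 0.0)

def arccos1_kernel(X, Y):
    """Exact order-1 arc-cosine kernel, used only to check the approximation."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)
    ny = np.linalg.norm(Y, axis=1, keepdims=True)
    cos = np.clip((X @ Y.T) / (nx * ny.T), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * ny.T) / np.pi * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 32))
    y = rng.standard_normal(500)

    Z = arccos1_random_features(X)
    # Inner products of the explicit features approximate the exact kernel matrix.
    approx_err = np.abs(Z @ Z.T - arccos1_kernel(X, X)).mean()
    print(f"mean absolute kernel approximation error: {approx_err:.4f}")

    # Kernel regression is replaced by linear ridge regression on the features,
    # which scales with the feature dimension rather than the number of samples.
    lam = 1e-3
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
    print("train MSE:", float(np.mean((Z @ w - y) ** 2)))
```

The paper's contribution is to replace such Monte-Carlo features with sketches of the polynomial expansions of the arc-cosine kernels, bringing the transform down to near input-sparsity time; the sketch above only illustrates the feature-map-plus-linear-regression workflow.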
Related papers
- Faithful and Efficient Explanations for Neural Networks via Neural
Tangent Kernel Surrogate Models [7.608408123113268]
We analyze approximate empirical neural tangent kernels (eNTK) for data attribution.
We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation.
We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models.
arXiv Detail & Related papers (2023-05-23T23:51:53Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Identifying good directions to escape the NTK regime and efficiently
learn low-degree plus sparse polynomials [52.11466135206223]
We show that a wide two-layer neural network can jointly use the Neural Tangent Kernel (NTK) and the QuadNTK to fit target functions.
This yields an end-to-end convergence guarantee with a provable sample-complexity improvement over both the NTK and the QuadNTK on their own.
arXiv Detail & Related papers (2022-06-08T06:06:51Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Learning with Neural Tangent Kernels in Near Input Sparsity Time [5.472561051778218]
The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent.
We propose a near input sparsity time algorithm that maps the input data to a randomized low-dimensional feature space so that the inner product of the transformed data approximates their NTK evaluation.
We show that in standard large-scale regression and classification tasks a linear regressor trained on our features outperforms trained neural networks and the Nyström method with NTK kernels.
arXiv Detail & Related papers (2021-04-01T11:56:58Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Neural Kernels Without Tangents [34.527798084824575]
We present an algebra for creating "compositional" kernels from bags of features.
We show that these operations correspond to many of the building blocks of "neural tangent kernels" (NTK).
arXiv Detail & Related papers (2020-03-04T18:25:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.