Learning with Neural Tangent Kernels in Near Input Sparsity Time
- URL: http://arxiv.org/abs/2104.00415v1
- Date: Thu, 1 Apr 2021 11:56:58 GMT
- Title: Learning with Neural Tangent Kernels in Near Input Sparsity Time
- Authors: Amir Zandieh
- Abstract summary: The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent.
We propose a near input sparsity time algorithm that maps the input data to a randomized low-dimensional feature space so that the inner product of the transformed data approximates their NTK evaluation.
We show that in standard large-scale regression and classification tasks a linear regressor trained on our features outperforms trained NNs and the Nystrom method with NTK kernels.
- Score: 5.472561051778218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide
neural nets trained under least squares loss by gradient descent (Jacot et al.,
2018). However, despite its importance, the super-quadratic runtime of kernel
methods limits the use of NTK in large-scale learning tasks. To accelerate
kernel machines with NTK, we propose a near input sparsity time algorithm that
maps the input data to a randomized low-dimensional feature space so that the
inner product of the transformed data approximates their NTK evaluation.
Furthermore, we propose a feature map for approximating the convolutional
counterpart of the NTK (Arora et al., 2019), which can transform any image
using a runtime that is only linear in the number of pixels. We show that in
standard large-scale regression and classification tasks a linear regressor
trained on our features outperforms trained NNs and the Nystrom method with NTK
kernels.
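To make the feature-map idea concrete, the following is a minimal Monte-Carlo sketch for the two-layer ReLU NTK: points are mapped to a 2m-dimensional randomized feature space so that inner products of the mapped points approximate the closed-form kernel built from arc-cosine kernels (Cho & Saul, 2009). This is not the paper's near input sparsity time transform, which uses structured sketches to reach that runtime; the function names and the width m below are illustrative.

```python
# Minimal Monte-Carlo random-feature sketch for the two-layer ReLU NTK.
# NOT the paper's near-input-sparsity-time sketching transform; it only
# illustrates that inner products of randomized low-dimensional features
# can approximate NTK evaluations. Names and the width m are illustrative.
import numpy as np

def ntk_relu_2layer_exact(x, z):
    """Closed-form two-layer ReLU NTK (both layers trained, N(0,1) init,
    1/sqrt(m) scaling), written via arc-cosine kernels (Cho & Saul, 2009):
    K(x,z) = <x,z> * E[1{wx>0} 1{wz>0}] + E[relu(wx) relu(wz)], w ~ N(0, I)."""
    nx, nz = np.linalg.norm(x), np.linalg.norm(z)
    u = np.clip(x @ z / (nx * nz), -1.0, 1.0)
    theta = np.arccos(u)
    k0 = (np.pi - theta) / (2 * np.pi)                                   # step-step term
    k1 = nx * nz * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
    return (x @ z) * k0 + k1

def ntk_random_features(X, m, rng):
    """Map rows of X (n x d) to 2m-dimensional features whose pairwise inner
    products approximate the two-layer ReLU NTK in expectation."""
    d = X.shape[1]
    W = rng.standard_normal((d, m))   # Gaussian directions for the ReLU / gate parts
    S = rng.standard_normal((d, m))   # independent projections carrying <x, z>
    gate_part = (X @ W > 0) * (X @ S) / np.sqrt(m)    # -> <x,z> * E[1{wx>0} 1{wz>0}]
    relu_part = np.maximum(X @ W, 0.0) / np.sqrt(m)   # -> E[relu(wx) relu(wz)]
    return np.hstack([gate_part, relu_part])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 16))
    Phi = ntk_random_features(X, m=20000, rng=rng)
    print("exact NTK(x0, x1):", ntk_relu_2layer_exact(X[0], X[1]))
    print("feature inner prod:", Phi[0] @ Phi[1])
```

The paper targets the same kind of approximation guarantee, but with a transform whose runtime is near-linear in the number of nonzero entries of the input, which is what makes it usable at large scale.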
Related papers
- A Unified Kernel for Neural Network Learning [4.0759204898334715]
We present the Unified Neural Kernel (UNK), which characterizes the learning dynamics of neural networks trained with gradient descent.
UNK maintains the limiting properties of both NNGP and NTK, exhibiting behaviors akin to NTK with a finite learning step.
We also theoretically characterize the uniform tightness and learning convergence of the UNK kernel.
arXiv Detail & Related papers (2024-03-26T07:55:45Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Fast Neural Kernel Embeddings for General Activations [44.665680851537005]
We propose a fast sketching method that approximates any multi-layered Neural Network Gaussian Process (NNGP) kernel and Neural Tangent Kernel (NTK) matrices for a wide range of activation functions.
Our method achieves a $106\times$ speedup for the approximate CNTK of a 5-layer Myrtle network on the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-09-09T05:10:39Z)
- Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials [52.11466135206223]
We show that a wide two-layer neural network can jointly use the Neural Tangent Kernel (NTK) and the QuadNTK to fit target functions.
This yields an end-to-end convergence guarantee with a provable sample-complexity improvement over both the NTK and QuadNTK on their own.
arXiv Detail & Related papers (2022-06-08T06:06:51Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup (a toy sketch of this regression-on-features setup appears after this list).
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Rapid Feature Evolution Accelerates Learning in Neural Networks [2.538209532048867]
We analyze the phenomenon of kernel alignment of the NTK with the target functions during gradient descent.
We show that feature evolution is faster and more dramatic in deeper networks.
We also find that networks with multiple output nodes develop separate, specialized kernels for each output channel.
arXiv Detail & Related papers (2021-05-29T13:50:03Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
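As a toy illustration of the regression setup described in the abstract and in the "Scaling Neural Tangent Kernels via Sketching and Random Features" entry above, the sketch below fits a linear ridge regressor in the approximate NTK feature space instead of solving the n x n kernel system. It assumes the illustrative helpers from the earlier sketch are in scope; it is not the paper's pipeline, and the hyperparameters and toy data are assumptions.

```python
# Toy comparison: linear ridge regression on approximate NTK features vs. exact
# NTK kernel ridge regression. Assumes ntk_random_features and
# ntk_relu_2layer_exact from the sketch after the abstract are already defined
# in the same script or session; hyperparameters here are illustrative only.
import numpy as np

def ridge_on_features(Phi_tr, y_tr, Phi_te, lam=1.0):
    """Solve (Phi^T Phi + lam I) w = Phi^T y in the feature dimension D.
    Cost is O(n D^2 + D^3), which beats the O(n^3) kernel solve once n >> D."""
    D = Phi_tr.shape[1]
    w = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(D), Phi_tr.T @ y_tr)
    return Phi_te @ w

def kernel_ridge_exact(X_tr, y_tr, X_te, lam=1.0):
    """Exact NTK kernel ridge regression for reference (O(n^3) solve)."""
    K = np.array([[ntk_relu_2layer_exact(a, b) for b in X_tr] for a in X_tr])
    K_te = np.array([[ntk_relu_2layer_exact(a, b) for b in X_tr] for a in X_te])
    alpha = np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr)
    return K_te @ alpha

rng = np.random.default_rng(1)
X_tr, X_te = rng.standard_normal((200, 16)), rng.standard_normal((50, 16))
y_tr = np.sin(X_tr[:, 0])                     # toy regression target
# Map train and test together so they share one random feature map.
Phi = ntk_random_features(np.vstack([X_tr, X_te]), m=2000, rng=rng)
Phi_tr, Phi_te = Phi[:200], Phi[200:]
# The two prediction vectors should roughly agree, more closely as m grows.
print(ridge_on_features(Phi_tr, y_tr, Phi_te)[:3])
print(kernel_ridge_exact(X_tr, y_tr, X_te)[:3])
```

With the same regularization, the feature-space solve and the kernel solve are algebraically equivalent when the feature inner products equal the kernel; the payoff of explicit features is that the cost scales linearly in the number of data points rather than cubically.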