Random Features for the Neural Tangent Kernel
- URL: http://arxiv.org/abs/2104.01351v1
- Date: Sat, 3 Apr 2021 09:08:12 GMT
- Title: Random Features for the Neural Tangent Kernel
- Authors: Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo Shin
- Abstract summary: We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
- Score: 57.132634274795066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Neural Tangent Kernel (NTK) has revealed connections between deep neural networks and kernel methods, offering insights into optimization and generalization. Motivated by this, recent works report that NTK regression can outperform trained neural networks on small-scale datasets. However, large-scale settings are rarely studied because of the computational limitations of kernel methods. In this work, we propose an efficient feature map construction for the NTK of a fully-connected ReLU network, which enables us to apply it to large-scale datasets. We combine random features of the arc-cosine kernels with a sketching-based algorithm that runs in time linear in both the number of data points and the input dimension. We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice. We additionally utilize leverage-score-based sampling to obtain improved bounds for arc-cosine random features, and we prove a spectral approximation guarantee of the proposed feature map with respect to the NTK matrix of a two-layer neural network. We benchmark a variety of machine learning tasks to demonstrate the superiority of the proposed scheme. In particular, our algorithm can run orders of magnitude faster than exact kernel methods in large-scale settings without loss of performance.
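To make the construction concrete, the following is a minimal NumPy sketch, under assumptions, of how arc-cosine random features (orders 0 and 1) can be combined into an explicit feature map for the two-layer ReLU NTK, using the common normalization k_NTK(x, y) ~ k1(x, y) + <x, y> * k0(x, y) (constants depend on the parameterization). The explicit Kronecker product and the dense Gaussian sketch below are illustrative stand-ins for the structured sketches analyzed in the paper, and all function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def arc_cosine_features(X, W, order):
    """Random features for the arc-cosine kernel of order 0 or 1 (Cho & Saul).
    X: (n, d) data; W: (m, d) i.i.d. standard Gaussian weights."""
    Z = X @ W.T                               # (n, m) projections <w_i, x>
    F = (Z > 0).astype(float) if order == 0 else np.maximum(Z, 0.0)
    return np.sqrt(2.0 / W.shape[0]) * F      # E[<f(x), f(y)>] ~ k_order(x, y)

def two_layer_ntk_features(X, m=256, sketch_dim=256, seed=0):
    """Approximate feature map psi with <psi(x), psi(y)> ~ k1(x, y) + <x, y> * k0(x, y),
    one common normalization of the two-layer ReLU NTK.
    NOTE: forming the Kronecker part explicitly and compressing it with a dense
    Gaussian sketch is for illustration only; the paper's structured sketches
    avoid this blow-up and run in time linear in n and d."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    phi0 = arc_cosine_features(X, rng.standard_normal((m, d)), order=0)  # (n, m)
    phi1 = arc_cosine_features(X, rng.standard_normal((m, d)), order=1)  # (n, m)
    kron = np.einsum('nm,nd->nmd', phi0, X).reshape(n, m * d)            # realizes k0(x,y) * <x,y>
    S = rng.standard_normal((m * d, sketch_dim)) / np.sqrt(sketch_dim)   # inner-product-preserving sketch
    return np.hstack([phi1, kron @ S])                                   # (n, m + sketch_dim)

# Usage: Psi = two_layer_ntk_features(X); Psi @ Psi.T approximates the NTK matrix.
```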
Related papers
- Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy across a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
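For context on the random feature approximation of the NNGP kernel used here, the snippet below is a rough, hypothetical sketch (not the RFAD implementation): a layer-wise ReLU random-feature map whose inner products approximate the NNGP kernel of a deep fully-connected ReLU network under He-style 2/fan_in weight scaling; the depth, width, and names are illustrative assumptions.

```python
import numpy as np

def relu_nngp_features(X, depth=3, width=1024, seed=0):
    """Layer-wise ReLU random features: <F(x), F(y)> approximates the NNGP kernel
    of a depth-`depth` fully-connected ReLU network (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    F = X
    for _ in range(depth):
        W = rng.standard_normal((width, F.shape[1]))         # fresh Gaussian weights per layer
        F = np.sqrt(2.0 / width) * np.maximum(F @ W.T, 0.0)  # ReLU with 2/fan-in (He) scaling
    return F                                                 # (n, width)
```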
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth-2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finite-width neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for the NTK by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Universality and Optimality of Structured Deep Kernel Networks [0.0]
Kernel-based methods yield approximation models that are flexible, efficient, and powerful.
The recent success of machine learning methods has been driven by deep neural networks (NNs).
In this paper, we show that the use of special types of kernels yields models reminiscent of neural networks.
arXiv Detail & Related papers (2021-05-15T14:10:35Z)
- Learning with Neural Tangent Kernels in Near Input Sparsity Time [5.472561051778218]
The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural networks trained under the least-squares loss by gradient descent.
We propose a near input-sparsity time algorithm that maps the input data to a randomized low-dimensional feature space so that the inner products of the transformed data approximate their NTK evaluations.
We show that in standard large-scale regression and classification tasks, a linear regressor trained on our features outperforms trained NNs and the Nyström method with NTK kernels.
arXiv Detail & Related papers (2021-04-01T11:56:58Z)
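As a hypothetical usage sketch of this kind of explicit feature map, the following fits a linear ridge regressor on approximate NTK features (for example, the `two_layer_ntk_features` sketch above); the function name, regularization, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def ridge_on_features(Psi_train, y_train, Psi_test, reg=1e-3):
    """Linear ridge regression on explicit (approximate) NTK features.
    Solves (Psi^T Psi + reg*I) w = Psi^T y, so the cost is linear in the number
    of training points plus a d x d solve in the feature dimension."""
    d = Psi_train.shape[1]
    A = Psi_train.T @ Psi_train + reg * np.eye(d)
    w = np.linalg.solve(A, Psi_train.T @ y_train)
    return Psi_test @ w

# Illustrative usage (use the same seed so train and test share one feature map):
# Psi_tr = two_layer_ntk_features(X_train, seed=0)
# Psi_te = two_layer_ntk_features(X_test, seed=0)
# y_pred = ridge_on_features(Psi_tr, y_train, Psi_te)
```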
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Neural Kernels Without Tangents [34.527798084824575]
We present an algebra for creating "compositional" kernels from bags of features.
We show that these operations correspond to many of the building blocks of "neural tangent kernels" (NTKs).
arXiv Detail & Related papers (2020-03-04T18:25:41Z)
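As a rough illustration of such a compositional-kernel algebra, the hypothetical sketch below builds a kernel by repeatedly applying two operations to a Gram matrix: normalization and an arc-cosine-style ReLU update. These particular building blocks and names are assumptions for illustration and are not the paper's exact operations.

```python
import numpy as np

def relu_kernel_step(K):
    """Arc-cosine (order-1) update: maps a Gram matrix K to the Gram matrix of the
    features after a random ReLU layer, in the infinite-width limit."""
    diag = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    norms = np.outer(diag, diag)
    cos = np.clip(K / norms, -1.0, 1.0)
    theta = np.arccos(cos)
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / np.pi

def normalize_step(K):
    """Normalize so every point has unit self-similarity (unit-norm features)."""
    d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    return K / np.outer(d, d)

def compositional_kernel(X, depth=3):
    """Compose the steps above, starting from the linear Gram matrix of X."""
    K = X @ X.T
    for _ in range(depth):
        K = relu_kernel_step(normalize_step(K))
    return K
```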