A Simple Algorithm For Scaling Up Kernel Methods
- URL: http://arxiv.org/abs/2301.11414v2
- Date: Mon, 30 Jan 2023 16:50:57 GMT
- Title: A Simple Algorithm For Scaling Up Kernel Methods
- Authors: Teng Andrea Xu, Bryan Kelly, Semyon Malamud
- Abstract summary: We introduce a novel random feature regression algorithm that allows us to scale to virtually infinite numbers of random features.
We illustrate the performance of our method on the CIFAR-10 dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent discovery of the equivalence between infinitely wide neural
networks (NNs) in the lazy training regime and Neural Tangent Kernels (NTKs)
(Jacot et al., 2018) has revived interest in kernel methods. However,
conventional wisdom suggests kernel methods are unsuitable for large samples
due to their computational complexity and memory requirements. We introduce a
novel random feature regression algorithm that allows us (when necessary) to
scale to virtually infinite numbers of random features. We illustrate the
performance of our method on the CIFAR-10 dataset.
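The abstract does not spell out the scaling scheme, so the sketch below only illustrates the baseline setting it builds on: plain random feature ridge regression with random ReLU features. The feature count, ridge penalty, and toy data are illustrative assumptions, not the authors' algorithm.

```python
# Generic random feature ridge regression (illustrative sketch only; not the
# paper's scaling algorithm). Draw D random ReLU features, then solve a ridge
# problem in the feature space.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed for illustration).
n, d = 2000, 32
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

D = 2048        # number of random features (assumption)
ridge = 1e-3    # ridge penalty (assumption)

W = rng.standard_normal((d, D)) / np.sqrt(d)
Phi = np.maximum(X @ W, 0.0) * np.sqrt(2.0 / D)   # random ReLU features

# Solve (Phi^T Phi + ridge * I) beta = Phi^T y.
A = Phi.T @ Phi + ridge * np.eye(D)
beta = np.linalg.solve(A, Phi.T @ y)

y_hat = Phi @ beta
print("train MSE:", np.mean((y - y_hat) ** 2))
```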
Related papers
- Kernel Regression with Infinite-Width Neural Networks on Millions of Examples [27.408712993696213]
We study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset.
We obtain a test accuracy of 91.2%, the state of the art for a pure kernel method.
arXiv Detail & Related papers (2023-03-09T17:11:31Z)
- RFFNet: Large-Scale Interpretable Kernel Methods via Random Fourier Features [3.0079490585515347]
We introduce RFFNet, a scalable method that learns the kernel relevances on the fly via first-order optimization.
We show that our approach has a small memory footprint and run-time, low prediction error, and effectively identifies relevant features.
We supply users with an efficient, PyTorch-based library that adheres to the scikit-learn standard API, along with code for fully reproducing our results.
arXiv Detail & Related papers (2022-11-11T18:50:34Z)
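As a rough illustration of the idea behind the RFFNet entry above (learning feature relevances with random Fourier features by first-order optimization), the sketch below trains a per-input relevance vector jointly with a linear head. It is not the RFFNet library or its API; the toy task, penalty, and hyperparameters are assumptions.

```python
# Sketch: random Fourier features with learnable per-input relevances,
# trained by first-order optimization. NOT the RFFNet API.
import torch

torch.manual_seed(0)
n, d, D = 512, 10, 256

# Toy data where only the first two inputs matter (assumption).
X = torch.randn(n, d)
y = torch.sin(X[:, 0]) + X[:, 1] ** 2

W = torch.randn(d, D)                             # fixed RFF frequencies
b = 2 * torch.pi * torch.rand(D)                  # fixed RFF phases
relevance = torch.zeros(d, requires_grad=True)    # per-input relevance (log-scale)
beta = torch.zeros(D, requires_grad=True)         # linear head

opt = torch.optim.Adam([relevance, beta], lr=1e-2)
for step in range(2000):
    scales = torch.exp(relevance)                 # positive relevances
    Phi = torch.cos((X * scales) @ W + b) * (2.0 / D) ** 0.5
    # Sparsity-style penalty on the relevances (stand-in for RFFNet's regularizer).
    loss = torch.mean((Phi @ beta - y) ** 2) + 1e-3 * scales.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Relevant inputs (0 and 1) should tend to keep the largest values.
print("learned relevances:", torch.exp(relevance).detach())
```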
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively in accuracy with KIP and other dataset condensation algorithms over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
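The RFAD entry above hinges on replacing an exact kernel with a random feature approximation inside a distillation loop. The sketch below illustrates that general recipe only: it backpropagates through kernel ridge regression built on random ReLU features to learn a tiny synthetic set. It is not the authors' RFAD implementation; all sizes, penalties, and the toy data are assumptions.

```python
# Sketch: dataset distillation via kernel ridge regression on random ReLU
# features (a cheap stand-in for the NNGP kernel). NOT the authors' RFAD code.
import torch

torch.manual_seed(0)
n, d, D, m = 1024, 20, 512, 16    # real points, input dim, features, synthetic size
lam = 1e-3                        # ridge penalty (assumption)

X = torch.randn(n, d)
y = torch.sign(X[:, 0] * X[:, 1]).float()          # toy targets in {-1, +1}

W = torch.randn(d, D) / d ** 0.5                   # fixed random feature weights
def feats(A):                                      # random ReLU features
    return torch.relu(A @ W) * (2.0 / D) ** 0.5

X_syn = torch.randn(m, d, requires_grad=True)      # learnable synthetic inputs
y_syn = torch.randn(m, requires_grad=True)         # learnable synthetic labels

opt = torch.optim.Adam([X_syn, y_syn], lr=1e-2)
for step in range(500):
    Ps, Pr = feats(X_syn), feats(X)
    K_ss = Ps @ Ps.T + lam * torch.eye(m)
    alpha = torch.linalg.solve(K_ss, y_syn)        # KRR fit on the synthetic set
    pred = Pr @ Ps.T @ alpha                       # evaluate on the real data
    loss = torch.mean((pred - y) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("distillation loss:", loss.item())
```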
- Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning that tackles catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task, from which task-specific classifiers are learned via kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z)
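A minimal sketch of the per-task recipe described in the kernel continual learning entry above: keep an episodic memory for each task and fit a task-specific kernel ridge classifier on it. The data-driven variational random-feature kernel is replaced here by a fixed RBF kernel, and the memory size, bandwidth, and toy tasks are assumptions.

```python
# Sketch: per-task episodic memory + kernel ridge classifiers (fixed RBF kernel
# instead of the paper's learned variational random-feature kernel).
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

memories, classifiers = {}, {}

def learn_task(task_id, X, y, mem_size=50, lam=1e-2):
    idx = rng.choice(len(X), size=mem_size, replace=False)   # episodic memory
    Xm, ym = X[idx], y[idx]
    alpha = np.linalg.solve(rbf(Xm, Xm) + lam * np.eye(mem_size), ym)
    memories[task_id], classifiers[task_id] = Xm, alpha

def predict(task_id, X):
    return rbf(X, memories[task_id]) @ classifiers[task_id]

# Two toy tasks; each keeps only its own memory, so earlier tasks are not forgotten.
for t in range(2):
    X = rng.standard_normal((500, 5)) + t
    y = np.sign(X[:, 0] - t)
    learn_task(t, X, y)
    print(f"task {t} train acc:", np.mean(np.sign(predict(t, X)) == y))
```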
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
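The entry above rests on random feature expansions of arc-cosine kernels. The sketch below is not the paper's sketching algorithm; it only sanity-checks the underlying identity by comparing the closed-form first-order arc-cosine kernel with its plain Monte Carlo random feature estimate.

```python
# Sketch: Monte Carlo random ReLU features vs. the closed-form first-order
# arc-cosine kernel (not the paper's near input-sparsity time sketch).
import numpy as np

rng = np.random.default_rng(0)
d, D = 16, 200_000

x = rng.standard_normal(d)
z = rng.standard_normal(d)

# Closed form: E[relu(w.x) relu(w.z)] for w ~ N(0, I)
#            = ||x|| ||z|| (sin(t) + (pi - t) cos(t)) / (2 pi),  t = angle(x, z).
cos_t = x @ z / (np.linalg.norm(x) * np.linalg.norm(z))
t = np.arccos(np.clip(cos_t, -1.0, 1.0))
exact = (np.linalg.norm(x) * np.linalg.norm(z)
         * (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi))

# Monte Carlo estimate with D random ReLU features.
W = rng.standard_normal((D, d))
approx = np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ z, 0.0))

print(f"closed form: {exact:.4f}   random-feature estimate: {approx:.4f}")
```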
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions needed to achieve comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
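For context on the entry above, a common baseline NTK feature map (not the construction proposed in that paper) uses the parameter gradient of a randomly initialized network, so that inner products of gradient features give the empirical NTK. The width and architecture below are assumptions.

```python
# Sketch: empirical NTK via gradient features of a random ReLU network
# (a baseline feature map, not the paper's proposed construction).
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(8, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1)
)

def ntk_features(x):
    """Flattened gradient of the scalar output w.r.t. all parameters."""
    net.zero_grad()
    net(x.unsqueeze(0)).squeeze().backward()
    return torch.cat([p.grad.flatten().clone() for p in net.parameters()])

x, z = torch.randn(8), torch.randn(8)
phi_x, phi_z = ntk_features(x), ntk_features(z)
print("empirical NTK value:", (phi_x @ phi_z).item())
```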
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior [26.52863547394537]
We present a novel probabilistic algorithm that learns a kernel composition, handling sparsity in the kernel selection with a Horseshoe prior.
Our model captures characteristics of time series with significant reductions in computational time and achieves competitive regression performance on real-world data sets.
arXiv Detail & Related papers (2020-12-21T13:41:15Z)