Kernel Regression with Infinite-Width Neural Networks on Millions of Examples
- URL: http://arxiv.org/abs/2303.05420v1
- Date: Thu, 9 Mar 2023 17:11:31 GMT
- Title: Kernel Regression with Infinite-Width Neural Networks on Millions of Examples
- Authors: Ben Adlam, Jaehoon Lee, Shreyas Padhy, Zachary Nado, and Jasper Snoek
- Abstract summary: We study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset.
We obtain a test accuracy of 91.2% (SotA for a pure kernel method).
- Score: 27.408712993696213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural kernels have drastically increased performance on diverse and
nonstandard data modalities but require significantly more compute, which
previously limited their application to smaller datasets. In this work, we
address this by massively parallelizing their computation across many GPUs. We
combine this with a distributed, preconditioned conjugate gradients algorithm
to enable kernel regression at a large scale (i.e. up to five million
examples). Using this approach, we study scaling laws of several neural kernels
across many orders of magnitude for the CIFAR-5m dataset. Using data
augmentation to expand the original CIFAR-10 training dataset by a factor of
20, we obtain a test accuracy of 91.2% (SotA for a pure kernel method).
Moreover, we explore neural kernels on other data modalities, obtaining results
on protein and small molecule prediction tasks that are competitive with SotA
methods.
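
To make the linear-system view concrete: kernel regression at this scale amounts to solving (K + λI)α = y, where K is the n x n neural kernel matrix, and the paper does this with a distributed, preconditioned conjugate gradients (CG) solver. The sketch below is a minimal single-machine version with a Jacobi (diagonal) preconditioner and an RBF kernel standing in for a neural kernel; the kernel choice, regularizer, and preconditioner here are illustrative assumptions, not the paper's exact configuration, which additionally shards the kernel computation and matrix-vector products across many GPUs.

```python
import numpy as np

def pcg_solve(K, y, lam=1e-3, tol=1e-6, max_iter=500):
    """Solve (K + lam * I) alpha = y by preconditioned conjugate gradients.

    Jacobi (diagonal) preconditioner for illustration; large-scale solvers
    use stronger preconditioners and distribute the matrix-vector products.
    """
    A = lambda v: K @ v + lam * v            # matvec with K + lam * I
    M_inv = 1.0 / (np.diag(K) + lam)         # inverse of the diagonal preconditioner

    alpha = np.zeros_like(y)
    r = y - A(alpha)                         # residual
    z = M_inv * r                            # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A(p)
        step = rz / (p @ Ap)
        alpha = alpha + step * p
        r = r - step * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(y):
            break
        z = M_inv * r
        rz_next = r @ z
        p = z + (rz_next / rz) * p
        rz = rz_next
    return alpha

# Toy usage: an RBF kernel stands in for a neural kernel (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)
alpha = pcg_solve(K, y)
train_preds = K @ alpha                      # f(x_i) = sum_j K[i, j] * alpha[j]
```

At millions of examples, the dominant costs are forming the kernel entries and the repeated matrix-vector products inside CG, which is the kind of work the abstract describes parallelizing across GPUs.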
Related papers
- Convolutional Deep Kernel Machines [25.958907308877148]
Recent work modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained.
Applying this modified limit to a deep Gaussian process gives a practical learning algorithm, dubbed the deep kernel machine (DKM).
arXiv Detail & Related papers (2023-09-18T14:36:17Z)
- A Simple Algorithm For Scaling Up Kernel Methods [0.0]
We introduce a novel random feature regression algorithm that allows us to scale to virtually infinite numbers of random features.
We illustrate the performance of our method on the CIFAR-10 dataset.
arXiv Detail & Related papers (2023-01-26T20:59:28Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Local Random Feature Approximations of the Gaussian Kernel [14.230653042112834]
We focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations (a minimal random-features sketch appears after this list).
We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that significantly improves kernel approximations and downstream performance.
arXiv Detail & Related papers (2022-04-12T09:52:36Z)
- How Do Graph Networks Generalize to Large and Diverse Molecular Systems? [10.690849483282564]
We identify four aspects of complexity in which many datasets are lacking.
We propose the GemNet-OC model, which outperforms the previous state-of-the-art on OC20 by 16%.
Our findings challenge the common belief that graph neural networks work equally well independent of dataset size and diversity.
arXiv Detail & Related papers (2022-04-06T12:52:34Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on CIFAR-10 dataset while achieving 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant and informative, even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
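
Several entries above replace an exact kernel with random features, so that kernel regression becomes ordinary linear regression in a finite feature space. As a concrete reference point, here is a minimal random Fourier features sketch for the Gaussian kernel; the bandwidth, feature count, and ridge penalty are illustrative assumptions, and the papers above use more elaborate constructions (e.g. NNGP/NTK feature maps or localized features).

```python
import numpy as np

def rff_features(X, num_features=1024, bandwidth=1.0, seed=0):
    """Random Fourier features phi(X) whose inner products approximate the
    Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth**2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Ridge regression in the random-feature space (a stand-in for the
# random-feature regression methods summarized above).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 10))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=500)

Phi = rff_features(X_train)
lam = 1e-3
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y_train)

X_test = rng.normal(size=(100, 10))
test_preds = rff_features(X_test) @ w        # same seed -> same random features
```

With D random features, training reduces to solving a D x D ridge system instead of an n x n kernel system, which is what lets these methods scale to large n.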