Convolutional Deep Kernel Machines
- URL: http://arxiv.org/abs/2309.09814v3
- Date: Mon, 26 Feb 2024 16:11:13 GMT
- Title: Convolutional Deep Kernel Machines
- Authors: Edward Milsom, Ben Anson, Laurence Aitchison
- Abstract summary: Recent work modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained.
Applying this modified limit to a deep Gaussian process gives a practical learning algorithm, which they dubbed the deep kernel machine (DKM).
- Score: 25.958907308877148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard infinite-width limits of neural networks sacrifice the ability for
intermediate layers to learn representations from data. Recent work (A theory
of representation learning gives a deep generalisation of kernel methods, Yang
et al. 2023) modified the Neural Network Gaussian Process (NNGP) limit of
Bayesian neural networks so that representation learning is retained.
Furthermore, they found that applying this modified limit to a deep Gaussian
process gives a practical learning algorithm which they dubbed the deep kernel
machine (DKM). However, they only considered the simplest possible setting:
regression in small, fully connected networks with e.g. 10 input features.
Here, we introduce convolutional deep kernel machines. This required us to
develop a novel inter-domain inducing point approximation, as well as to
introduce and experimentally assess a number of techniques not previously
seen in DKMs, including analogues of batch normalisation, different
likelihoods, and different types of top layer. The resulting model trains in
roughly 77 GPU hours, achieving around 99% test accuracy on MNIST, 72% on
CIFAR-100, and 92.7% on CIFAR-10, which is SOTA for kernel methods.
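The abstract refers to the Neural Network Gaussian Process (NNGP) limit that the deep kernel machine builds on. As a point of reference only, here is a minimal sketch of the standard (fixed) NNGP kernel recursion for a fully-connected ReLU network; it is not the paper's modified representation-learning limit, nor its convolutional inter-domain inducing-point scheme, and the depth and variance hyperparameters are illustrative assumptions.

```python
import numpy as np

def nngp_relu_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    """Standard NNGP kernel recursion for a fully-connected ReLU network.

    Reference sketch only: this is the fixed infinite-width limit that the
    deep kernel machine modifies, not the DKM itself.
    """
    # Base case: scaled linear kernel on the raw inputs.
    K = sigma_w2 * (X @ X.T) / X.shape[1] + sigma_b2
    for _ in range(depth):
        # Closed-form E[relu(u) relu(v)] for jointly Gaussian (u, v),
        # i.e. the degree-1 arc-cosine kernel.
        std = np.sqrt(np.diag(K))
        outer = np.outer(std, std)
        cos_t = np.clip(K / outer, -1.0, 1.0)
        theta = np.arccos(cos_t)
        K = sigma_w2 * outer * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi) + sigma_b2
    return K

# Usage: kernel ridge regression with the resulting deep kernel.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.normal(size=200)
K = nngp_relu_kernel(X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(y)), y)  # dual weights
```

In this standard limit the intermediate kernels are fully determined by the recursion; the DKM retains representation learning by letting those kernels adapt to data, and the convolutional version introduced in the paper additionally handles spatial structure, which this sketch ignores.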
Related papers
- Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines [23.09717258810923]
Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10.
We introduce several modifications to improve the convolutional deep kernel machine's generalisation.
The resulting model achieves 94.5% test accuracy on CIFAR-10.
arXiv Detail & Related papers (2024-10-08T16:15:53Z)
- Speed Limits for Deep Learning [67.69149326107103]
Recent advances in thermodynamics allow bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, under some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z)
- Kernel Regression with Infinite-Width Neural Networks on Millions of Examples [27.408712993696213]
We study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset.
We obtain a test accuracy of 91.2% (SotA for a pure kernel method).
arXiv Detail & Related papers (2023-03-09T17:11:31Z)
- A Simple Algorithm For Scaling Up Kernel Methods [0.0]
We introduce a novel random feature regression algorithm that allows us to scale to virtually infinite numbers of random features.
We illustrate the performance of our method on the CIFAR-10 dataset.
arXiv Detail & Related papers (2023-01-26T20:59:28Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Random Convolution Kernels with Multi-Scale Decomposition for Preterm EEG Inter-burst Detection [0.0]
Linear classifiers with random convolution kernels are computationally efficient methods that need no design or domain knowledge.
A recently proposed method, RandOm Convolutional KErnel Transforms, has shown high accuracy across a range of time-series data sets.
We propose a multi-scale version of this method, using both high- and low-frequency components (a minimal sketch of the random-kernel idea appears after this list).
arXiv Detail & Related papers (2021-08-04T13:07:41Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finite-width neural networks trained on small-scale datasets.
We design a near input-sparsity-time approximation algorithm for the NTK by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Kernel Based Progressive Distillation for Adder Neural Networks [71.731127378807]
Adder Neural Networks (ANNs), which contain only additions, offer a new way of developing deep neural networks with low energy consumption.
However, there is an accuracy drop when all convolution filters are replaced by adder filters.
We present a novel method for further improving the performance of ANNs without increasing the number of trainable parameters.
arXiv Detail & Related papers (2020-09-28T03:29:19Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Accurate Tumor Tissue Region Detection with Accelerated Deep Convolutional Neural Networks [12.7414209590152]
Manual annotation of pathology slides for cancer diagnosis is laborious and repetitive.
Our approach, FLASH, is based on a Deep Convolutional Neural Network (DCNN) architecture.
It reduces computational costs and is faster than typical deep learning approaches by two orders of magnitude.
arXiv Detail & Related papers (2020-04-18T08:24:27Z)
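The "Random Convolution Kernels with Multi-Scale Decomposition" entry above describes linear classifiers trained on features produced by random convolution kernels. The sketch below illustrates only the basic random-kernel featurisation, not the published method: it omits the multi-scale (high/low-frequency) decomposition, and the kernel lengths, bias range, and summary statistics are simplifying assumptions.

```python
import numpy as np

def random_conv_features(series, n_kernels=500, rng=None):
    """Featurise one time series with random 1-D convolution kernels.

    Illustrative sketch only; not the published ROCKET / multi-scale method.
    """
    rng = rng or np.random.default_rng(0)
    feats = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])           # assumed kernel lengths
        weights = rng.normal(size=length)
        weights -= weights.mean()                 # zero-mean random kernel
        bias = rng.uniform(-1.0, 1.0)             # assumed bias range
        response = np.convolve(series, weights, mode="valid") + bias
        feats.append(response.max())              # strongest activation
        feats.append((response > 0).mean())       # proportion of positive values
    return np.array(feats)

# Usage: build a feature matrix over many labelled series, then train a
# linear classifier (e.g. ridge or logistic regression) on those features.
rng = np.random.default_rng(1)
toy_series = np.sin(np.linspace(0, 20, 300)) + rng.normal(0, 0.1, 300)
phi = random_conv_features(toy_series)            # shape: (2 * n_kernels,)
```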