Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines
- URL: http://arxiv.org/abs/2410.06171v1
- Date: Tue, 8 Oct 2024 16:15:53 GMT
- Title: Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines
- Authors: Edward Milsom, Ben Anson, Laurence Aitchison
- Abstract summary: Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10.
We introduce several modifications to improve the convolutional deep kernel machine's generalisation.
The resulting model achieves 94.5% test accuracy on CIFAR-10.
- Score: 23.09717258810923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10 using a ResNet-inspired architecture, which is SOTA for kernel methods. However, this still lags behind neural networks, which easily achieve over 94% test accuracy with similar architectures. In this work we introduce several modifications to improve the convolutional deep kernel machine's generalisation, including stochastic kernel regularisation, which adds noise to the learned Gram matrices during training. The resulting model achieves 94.5% test accuracy on CIFAR-10. This finding has important theoretical and practical implications, as it demonstrates that the ability to perform well on complex tasks like image classification is not unique to neural networks. Instead, other approaches including deep kernel methods can achieve excellent performance on such tasks, as long as they have the capacity to learn representations from data.
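The abstract describes stochastic kernel regularisation only at a high level: noise is added to the learned Gram matrices during training. The sketch below is one plausible instantiation under stated assumptions, not the paper's exact procedure. It mixes the current Gram matrix with a Wishart-style random sample whose mean matches it, so the perturbed matrix stays symmetric positive semi-definite; the function name, noise form, and hyperparameter values are all illustrative.

```python
import numpy as np

def stochastic_kernel_regularisation(G, noise_scale=0.1, rng=None):
    """Return a noisy copy of the Gram matrix G for use during training.

    Minimal sketch only: the exact noise distribution used in the paper is
    not given in this listing. Here we draw a Wishart-style perturbation
    W = (L E)(L E)^T / k with mean approximately G, and return the mixture
    (1 - a) * G + a * W, which stays symmetric positive semi-definite.
    `noise_scale` (a) and the rank k are illustrative, not paper values.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = G.shape[0]
    k = n  # rank of the random perturbation (assumption)
    # Cholesky factor of G (small jitter added for numerical stability)
    L = np.linalg.cholesky(G + 1e-6 * np.eye(n))
    E = rng.standard_normal((n, k))
    W = (L @ E) @ (L @ E).T / k  # Wishart-style sample with mean ~ G
    return (1.0 - noise_scale) * G + noise_scale * W

# Toy usage: perturb an example Gram matrix as one might at each training step.
X = np.random.default_rng(0).standard_normal((8, 4))
G = X @ X.T                              # example Gram matrix
G_noisy = stochastic_kernel_regularisation(G, noise_scale=0.1)
assert np.all(np.linalg.eigvalsh(G_noisy) > -1e-8)  # still PSD up to tolerance
```

In a full convolutional DKM this perturbation would presumably be applied to the learned Gram matrices during training and switched off at test time; those details are assumptions here rather than claims taken from the abstract.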
Related papers
- Convolutional Deep Kernel Machines [25.958907308877148]
Recent work modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained.
Applying this modified limit to a deep Gaussian process gives a practical learning algorithm, which they dubbed the deep kernel machine (DKM).
arXiv Detail & Related papers (2023-09-18T14:36:17Z) - Kernel Regression with Infinite-Width Neural Networks on Millions of
Examples [27.408712993696213]
We study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset.
We obtain a test accuracy of 91.2% (SotA for a pure kernel method).
arXiv Detail & Related papers (2023-03-09T17:11:31Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Can we achieve robustness from data alone? [0.7366405857677227]
Adversarial training and its variants have come to be the prevailing methods to achieve adversarially robust classification using neural networks.
We devise a meta-learning method for robust classification that optimizes the dataset prior to deployment in a principled way.
Experiments on MNIST and CIFAR-10 demonstrate that the datasets we produce enjoy very high robustness against PGD attacks.
arXiv Detail & Related papers (2022-07-24T12:14:48Z) - Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z) - Generative Kernel Continual learning [117.79080100313722]
We introduce generative kernel continual learning, which exploits the synergies between generative models and kernels for continual learning.
The generative model is able to produce representative samples for kernel learning, which removes the dependence on memory in kernel continual learning.
We conduct extensive experiments on three widely-used continual learning benchmarks that demonstrate the abilities and benefits of our contributions.
arXiv Detail & Related papers (2021-12-26T16:02:10Z) - Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning that tackles catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task, which are used to learn task-specific classifiers based on kernel ridge regression (see the sketch after this list).
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z) - Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels
Methods [0.0]
We show the importance of a data-dependent feature extraction step that is key to obtaining good performance in convolutional kernel methods.
We scale this method to the challenging ImageNet dataset, showing that such a simple approach can exceed all existing non-learned representation methods.
arXiv Detail & Related papers (2021-01-19T09:30:58Z) - Every Model Learned by Gradient Descent Is Approximately a Kernel
Machine [0.0]
Deep learning's successes are often attributed to its ability to automatically discover new representations of the data.
We show, however, that deep networks learned by the standard gradient descent algorithm are mathematically approximately equivalent to kernel machines.
arXiv Detail & Related papers (2020-11-30T23:02:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.