Deep kernel processes
- URL: http://arxiv.org/abs/2010.01590v2
- Date: Sun, 30 May 2021 12:23:26 GMT
- Title: Deep kernel processes
- Authors: Laurence Aitchison, Adam X. Yang, Sebastian W. Ober
- Abstract summary: We find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes.
For DGPs the equivalence arises because the Gram matrix formed by the inner product of features is Wishart distributed.
We show that the deep inverse Wishart process gives superior performance to DGPs and infinite BNNs on standard fully-connected baselines.
- Score: 34.99042782396683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We define deep kernel processes in which positive definite Gram matrices are
progressively transformed by nonlinear kernel functions and by sampling from
(inverse) Wishart distributions. Remarkably, we find that deep Gaussian
processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite
BNNs with bottlenecks can all be written as deep kernel processes. For DGPs the
equivalence arises because the Gram matrix formed by the inner product of
features is Wishart distributed, and as we show, standard isotropic kernels can
be written entirely in terms of this Gram matrix -- we do not need knowledge of
the underlying features. We define a tractable deep kernel process, the deep
inverse Wishart process, and give a doubly-stochastic inducing-point
variational inference scheme that operates on the Gram matrices, not on the
features, as in DGPs. We show that the deep inverse Wishart process gives
superior performance to DGPs and infinite BNNs on standard fully-connected
baselines.
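To make the abstract's construction concrete, here is a minimal NumPy sketch (not the authors' code; the squared-exponential kernel, the jitter, and the feature counts are illustrative assumptions). It shows the two ingredients of a deep kernel process: the Gram matrix of Gaussian features is (scaled) Wishart distributed, and a standard isotropic kernel can be evaluated from the Gram matrix alone via ||x_i - x_j||^2 = G_ii + G_jj - 2 G_ij, with no reference to the underlying features.
```python
# Minimal sketch of a deep kernel process step, assuming a squared-exponential
# kernel; names and the two-layer loop are illustrative, not the paper's code.
import numpy as np

def sample_wishart_gram(K, n_features, rng):
    """Sample G = F F^T / n_features with columns of F drawn from N(0, K).

    G is then (scaled) Wishart distributed, mirroring the Gram matrix of a
    finite-width layer with Gaussian features.
    """
    P = K.shape[0]
    L = np.linalg.cholesky(K + 1e-6 * np.eye(P))        # jitter for stability
    F = L @ rng.standard_normal((P, n_features))         # features with covariance K
    return F @ F.T / n_features

def isotropic_kernel_from_gram(G, lengthscale=1.0):
    """Squared-exponential kernel evaluated from a Gram matrix alone.

    Uses ||x_i - x_j||^2 = G_ii + G_jj - 2 G_ij, so no features are needed.
    """
    d = np.diag(G)
    sq_dists = d[:, None] + d[None, :] - 2.0 * G
    return np.exp(-0.5 * sq_dists / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))         # 5 inputs, 3 input dimensions
G = X @ X.T / X.shape[1]                # input Gram matrix
for _ in range(2):                      # two "layers" of the process
    K = isotropic_kernel_from_gram(G)   # nonlinear kernel transform
    G = sample_wishart_gram(K, n_features=50, rng=rng)   # stochastic Wishart step
print(G.shape)
```
Alternating the kernel transform with the Wishart sampling step propagates a distribution over Gram matrices through the layers, which is the sense in which the process operates on Gram matrices rather than on features.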
Related papers
- Thin and Deep Gaussian Processes [43.22976185646409]
This work proposes a novel synthesis of both previous approaches: Thin and Deep GP (TDGP).
We show with theoretical and experimental results that i) TDGP is specifically tailored to discover lower-dimensional manifolds in the input data, ii) TDGP behaves well when increasing the number of layers, and iii) TDGP performs well on standard benchmark datasets.
arXiv Detail & Related papers (2023-10-17T18:50:24Z) - Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z) - On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
arXiv Detail & Related papers (2023-07-14T13:56:11Z) - An Improved Variational Approximate Posterior for the Deep Wishart Process [24.442174952832108]
Deep kernel processes are a recently introduced class of deep Bayesian models.
They operate by sampling a Gram matrix from a distribution over positive semi-definite matrices.
We show that further generalising their distribution to allow linear combinations of rows and columns results in better predictive performance.
arXiv Detail & Related papers (2023-05-23T18:26:29Z) - Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then select the query that maximizes the trained NN's output (a minimal sketch of this idea appears after this list).
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z) - A theory of representation learning gives a deep generalisation of kernel methods [22.260038428890383]
We develop a new infinite width limit, the Bayesian representation learning limit.
We show that it exhibits representation learning mirroring that in finite-width models.
Next, we introduce the possibility of using this limit and objective as a flexible, deep generalisation of kernel methods.
arXiv Detail & Related papers (2021-08-30T10:07:37Z) - A variational approximate posterior for the deep Wishart process [23.786649328915093]
Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs.
We give a novel approach to obtaining flexible distributions over positive semi-definite matrices.
We show that inference in the deep Wishart process gives improved performance over doing inference in a DGP with the equivalent prior.
arXiv Detail & Related papers (2021-07-21T14:48:27Z) - Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z) - Neural Splines: Fitting 3D Surfaces with Infinitely-Wide Neural Networks [61.07202852469595]
We present Neural Splines, a technique for 3D surface reconstruction that is based on random feature kernels arising from infinitely-wide shallow ReLU networks.
Our method achieves state-of-the-art results, outperforming recent neural-network-based techniques and the widely used Poisson Surface Reconstruction.
arXiv Detail & Related papers (2020-06-24T14:54:59Z) - Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes [144.6048446370369]
Graph convolutional neural networks (GCNs) have recently demonstrated promising results on graph-based semi-supervised classification.
We propose a GP regression model via GCNs (GPGC) for graph-based semi-supervised learning.
We conduct extensive experiments to evaluate GPGC and demonstrate that it outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-02-26T10:02:32Z)
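As referenced in the Sample-Then-Optimize Batch Neural Thompson Sampling entry above, here is a hedged sketch of the sample-then-optimize idea: each approximate Thompson sample is obtained by training a freshly initialised neural network on the observed data, and the next query maximises that trained network over a candidate set. The MLPRegressor surrogate, the toy objective, and the candidate grid are illustrative assumptions, not the paper's implementation.
```python
# Hedged sketch of sample-then-optimize Thompson sampling: train a randomly
# initialised NN on the observations, then pick the candidate maximising it.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def objective(x):                      # toy black-box objective (illustrative)
    return np.sin(3 * x) + 0.1 * rng.standard_normal(x.shape)

X_obs = rng.uniform(-1, 1, size=(10, 1))        # observations so far
y_obs = objective(X_obs).ravel()
candidates = np.linspace(-1, 1, 200).reshape(-1, 1)

batch = []
for seed in range(3):                           # batch of 3 queries
    nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                      random_state=seed)        # fresh random init per "sample"
    nn.fit(X_obs, y_obs)                        # sample by training the NN
    batch.append(candidates[np.argmax(nn.predict(candidates))])
print(np.array(batch).ravel())                  # next inputs to query
```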
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.