Deep Neural Networks as Point Estimates for Deep Gaussian Processes
- URL: http://arxiv.org/abs/2105.04504v1
- Date: Mon, 10 May 2021 16:55:17 GMT
- Title: Deep Neural Networks as Point Estimates for Deep Gaussian Processes
- Authors: Vincent Dutordoir, James Hensman, Mark van der Wilk, Carl Henrik Ek,
Zoubin Ghahramani, Nicolas Durrande
- Abstract summary: We propose a sparse variational approximation for DGPs for which the approximate posterior mean has the same mathematical structure as a Deep Neural Network (DNN).
We make the forward pass through a DGP equivalent to a ReLU DNN by finding an interdomain transformation that represents the GP posterior mean as a sum of ReLU basis functions.
Experiments demonstrate improved accuracy and faster training compared to current DGP methods, while retaining favourable predictive uncertainties.
- Score: 44.585609003513625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Gaussian processes (DGPs) have struggled for relevance in applications
due to the challenges and cost associated with Bayesian inference. In this
paper we propose a sparse variational approximation for DGPs for which the
approximate posterior mean has the same mathematical structure as a Deep Neural
Network (DNN). We make the forward pass through a DGP equivalent to a ReLU DNN
by finding an interdomain transformation that represents the GP posterior mean
as a sum of ReLU basis functions. This unification enables the initialisation
and training of the DGP as a neural network, leveraging the well established
practice in the deep learning community, and so greatly aiding the inference
task. The experiments demonstrate improved accuracy and faster training
compared to current DGP methods, while retaining favourable predictive
uncertainties.
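As a concrete illustration of the equivalence described in the abstract, the sketch below (an illustrative assumption, not the authors' code; W, b and beta are placeholders for the interdomain ReLU feature directions, offsets and variational mean weights) writes a sparse-GP posterior mean as a weighted sum of ReLU basis functions and checks that it coincides with the forward pass of a one-hidden-layer ReLU network with the same parameters.

```python
# Minimal sketch: a sparse variational GP whose posterior mean is built from
# ReLU basis functions computes the same quantity as a one-hidden-layer ReLU DNN.
# W, b, beta are illustrative placeholders, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
M, D = 64, 5                       # number of inducing basis functions, input dimension
W = rng.normal(size=(M, D))        # ReLU feature directions
b = rng.normal(size=M)             # ReLU feature offsets
beta = rng.normal(size=M)          # variational weights on the basis functions

def relu(z):
    return np.maximum(z, 0.0)

def gp_posterior_mean(X):
    """Posterior mean written as a sum of M ReLU basis functions."""
    return sum(beta[m] * relu(X @ W[m] + b[m]) for m in range(M))

def relu_dnn_forward(X):
    """One-hidden-layer ReLU network with the same parameters."""
    return relu(X @ W.T + b) @ beta

X = rng.normal(size=(10, D))
assert np.allclose(gp_posterior_mean(X), relu_dnn_forward(X))
```

Under this identification, initialising and training the DGP can reuse standard DNN tooling, which is the practical point made in the abstract.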
Related papers
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains [7.936841911281107]
We propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate variational inference (CVI).
We are able to perform variational inference of latent neural trajectories with linear time complexity for arbitrary likelihoods.
arXiv Detail & Related papers (2023-06-01T16:31:36Z)
- Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks [0.0]
For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification, while deep neural networks (DNNs) excel at representation learning.
We propose to combine these two approaches in a hybrid method consisting of an ensemble of GPs built on the output of hidden layers of a DNN.
arXiv Detail & Related papers (2023-05-26T16:19:26Z)
- Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z)
- Guided Deep Kernel Learning [42.53025115287688]
We present a novel approach for learning deep kernels by utilizing infinite-width neural networks.
Our approach harnesses the reliable uncertainty estimation of the NNGPs to adapt the DKL target confidence when it encounters novel data points.
arXiv Detail & Related papers (2023-02-19T13:37:34Z)
- Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z)
- Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform online prediction and model update with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
- Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty [58.730520380312676]
We show that differentially private stochastic gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models.
This represents a serious issue for safety-critical applications, e.g. in medical diagnosis.
arXiv Detail & Related papers (2021-07-09T08:14:45Z)
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely-wide NNs as a Gaussian process (see the kernel sketch after this list).
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
- Predicting the outputs of finite deep neural networks trained with noisy gradients [1.1470070927586014]
A recent line of work has studied wide deep neural networks (DNNs) by approximating them as Gaussian Processes (GPs).
Here we consider a DNN training protocol involving noise, weight decay and finite width, whose outcome corresponds to a certain non-Gaussian process.
An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width.
arXiv Detail & Related papers (2020-04-02T18:00:01Z)
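Several of the entries above rely on the Gaussian process induced by an infinitely wide network; the kernel sketch referenced in the infinite-width entry is given below. It is a minimal, generic illustration of the NNGP kernel recursion for fully connected ReLU layers (the arc-cosine expectation), not code from any of the listed papers, and the parameters depth, sigma_w2 and sigma_b2 are illustrative assumptions.

```python
# Minimal sketch of the NNGP kernel recursion for a fully connected ReLU
# network: the prior covariance an infinitely wide network assigns to a pair
# of inputs. Generic textbook construction; hyperparameters are placeholders.
import numpy as np

def relu_nngp_kernel(x1, x2, depth=3, sigma_w2=2.0, sigma_b2=0.0):
    d = x1.shape[0]
    # Input-layer covariances.
    k11 = sigma_b2 + sigma_w2 * (x1 @ x1) / d
    k22 = sigma_b2 + sigma_w2 * (x2 @ x2) / d
    k12 = sigma_b2 + sigma_w2 * (x1 @ x2) / d
    for _ in range(depth):
        # E[relu(u) relu(v)] for jointly Gaussian (u, v): the arc-cosine kernel.
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        cross = np.sqrt(k11 * k22) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        k11, k22, k12 = (
            sigma_b2 + sigma_w2 * k11 / 2.0,   # E[relu(u)^2] = Var(u) / 2
            sigma_b2 + sigma_w2 * k22 / 2.0,
            sigma_b2 + sigma_w2 * cross,
        )
    return k12

x, y = np.ones(4), np.array([1.0, -1.0, 1.0, -1.0])
print(relu_nngp_kernel(x, y))  # prior covariance between f(x) and f(y)
```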