A variational approximate posterior for the deep Wishart process
- URL: http://arxiv.org/abs/2107.10125v1
- Date: Wed, 21 Jul 2021 14:48:27 GMT
- Title: A variational approximate posterior for the deep Wishart process
- Authors: Sebastian W. Ober, Laurence Aitchison
- Abstract summary: Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs.
We give a novel approach to obtaining flexible distributions over positive semi-definite matrices.
We show that inference in the deep Wishart process gives improved performance over doing inference in a DGP with the equivalent prior.
- Score: 23.786649328915093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work introduced deep kernel processes as an entirely kernel-based
alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly
learn good top-layer representations by alternately sampling the kernel from a
distribution over positive semi-definite matrices and performing nonlinear
transformations. One deep kernel process, the deep Wishart process
(DWP), is of particular interest because its prior is equivalent to that of a deep
Gaussian process (DGP). However, inference in DWPs has not yet been
possible due to the lack of sufficiently flexible distributions over positive
semi-definite matrices. Here, we give a novel approach to obtaining flexible
distributions over positive semi-definite matrices by generalising the Bartlett
decomposition of the Wishart probability density. We use this new distribution
to develop an approximate posterior for the DWP that includes dependency across
layers. We develop a doubly-stochastic inducing-point inference scheme for the
DWP and show experimentally that inference in the DWP gives improved
performance over doing inference in a DGP with the equivalent prior.
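As a rough illustration of the idea in the abstract, the sketch below builds a flexible, reparameterisable distribution over positive semi-definite matrices by generalising the Bartlett decomposition: the lower-triangular factor keeps the Wishart structure but is given learnable per-entry parameters. The parameterisation, function name, and arguments are our own assumptions for illustration, not the authors' exact approximate posterior.

```python
# Illustrative sketch (not the paper's exact parameterisation): a flexible
# distribution over positive semi-definite matrices via a generalised
# Bartlett decomposition.  In the standard Bartlett construction,
# K = A L L^T A^T ~ Wishart(nu, A A^T), with L lower triangular,
# chi-squared-distributed squared diagonal and standard-normal off-diagonal.
# Here every entry of L gets its own learnable parameters instead.
import numpy as np

def sample_generalised_bartlett(A, diag_shape, diag_rate,
                                offdiag_mean, offdiag_std, rng=None):
    """Sample a PSD matrix K = A L L^T A^T with a learnable lower-triangular L.

    A            : (P, P) scale factor, e.g. a Cholesky factor of the prior kernel.
    diag_shape   : (P,)   Gamma shape parameters for the squared diagonal of L.
    diag_rate    : (P,)   Gamma rate parameters for the squared diagonal of L.
    offdiag_mean : (P, P) means for the strictly-lower-triangular entries of L.
    offdiag_std  : (P, P) std devs for the strictly-lower-triangular entries of L.
    (All names here are illustrative, not taken from the paper.)
    """
    rng = rng if rng is not None else np.random.default_rng()
    P = A.shape[0]
    L = np.zeros((P, P))
    # Diagonal: square roots of Gamma draws (chi-squared is the special case
    # shape = (nu - i)/2, rate = 1/2 used by the ordinary Wishart prior).
    L[np.diag_indices(P)] = np.sqrt(rng.gamma(diag_shape, 1.0 / diag_rate))
    # Strictly lower triangle: independent Gaussians with learnable moments
    # (standard normal in the ordinary Bartlett decomposition).
    rows, cols = np.tril_indices(P, k=-1)
    L[rows, cols] = (offdiag_mean[rows, cols]
                     + offdiag_std[rows, cols] * rng.standard_normal(len(rows)))
    ALt = A @ L
    return ALt @ ALt.T  # positive semi-definite by construction


# Example: recover an ordinary Wishart(nu, I) sample by using chi-squared
# diagonals and standard-normal off-diagonals.
P, nu = 4, 10.0
K = sample_generalised_bartlett(
    A=np.eye(P),
    diag_shape=(nu - np.arange(P)) / 2.0, diag_rate=np.full(P, 0.5),
    offdiag_mean=np.zeros((P, P)), offdiag_std=np.ones((P, P)))
print(np.linalg.eigvalsh(K))  # all eigenvalues >= 0 (up to numerical error)
```

Setting the diagonal Gamma parameters to the chi-squared values and the off-diagonal moments to (0, 1) recovers an ordinary Wishart sample, which is what the example call checks; freeing those per-entry parameters is what makes the distribution flexible enough to serve as a variational posterior over kernels.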
Related papers
- RoPINN: Region Optimized Physics-Informed Neural Networks [66.38369833561039]
Physics-informed neural networks (PINNs) have been widely applied to solve partial differential equations (PDEs).
This paper proposes and theoretically studies a new training paradigm, region optimization.
A practical training algorithm, Region Optimized PINN (RoPINN), is seamlessly derived from this new paradigm.
arXiv Detail & Related papers (2024-05-23T09:45:57Z)
- PDE+: Enhancing Generalization via PDE with Adaptive Distributional Diffusion [66.95761172711073]
Generalization of neural networks is a central challenge in machine learning.
We propose to enhance it directly through the underlying function of neural networks, rather than focusing on adjusting input data.
We put this theoretical framework into practice as $\textbf{PDE}+$ ($\textbf{PDE}$ with $\textbf{A}$daptive $\textbf{D}$istributional $\textbf{D}$iffusion).
arXiv Detail & Related papers (2023-05-25T08:23:26Z)
- An Improved Variational Approximate Posterior for the Deep Wishart Process [24.442174952832108]
Deep kernel processes are a recently introduced class of deep Bayesian models.
They operate by sampling a Gram matrix from a distribution over positive semi-definite matrices.
We show that further generalising their distribution to allow linear combinations of rows and columns results in better predictive performance.
arXiv Detail & Related papers (2023-05-23T18:26:29Z)
- Scalable Optimal Margin Distribution Machine [50.281535710689795]
Optimal Margin Distribution Machine (ODM) is a newly proposed statistical learning framework rooted in the novel margin theory.
This paper proposes a scalable ODM, which can achieve nearly ten times speedup compared to the original ODM training method.
arXiv Detail & Related papers (2023-05-08T16:34:04Z)
- Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z)
- Conditional Deep Gaussian Processes: empirical Bayes hyperdata learning [6.599344783327054]
We propose a conditional Deep Gaussian Process (DGP) in which the intermediate GPs in the hierarchical composition are supported by hyperdata.
We show the equivalence with deep kernel learning in the limit of dense hyperdata in latent space.
Preliminary extrapolation results demonstrate the expressive power of the proposed model compared with GP kernel composition, DGP variational inference, and deep kernel learning.
arXiv Detail & Related papers (2021-10-01T17:50:48Z)
- A theory of representation learning gives a deep generalisation of kernel methods [22.260038428890383]
We develop a new infinite-width limit, the Bayesian representation learning limit.
We show that it exhibits representation learning mirroring that in finite-width models.
Next, we introduce the possibility of using this limit and objective as a flexible, deep generalisation of kernel methods.
arXiv Detail & Related papers (2021-08-30T10:07:37Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high-fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input-space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Convolutional Normalizing Flows for Deep Gaussian Processes [40.10797051603641]
This paper introduces a new approach for specifying flexible, arbitrarily complex, and scalable approximate posterior distributions.
A novel convolutional normalizing flow (CNF) is developed to improve time efficiency and capture dependency between layers.
Empirical evaluation demonstrates that the CNF DGP outperforms state-of-the-art approximation methods for DGPs.
arXiv Detail & Related papers (2021-04-17T07:25:25Z)
- Deep kernel processes [34.99042782396683]
We find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes.
For DGPs the equivalence arises because the Gram matrix formed by the inner product of features is Wishart distributed (see the sketch after this list).
We show that the deep inverse Wishart process gives superior performance to DGPs and infinite BNNs on standard fully-connected baselines.
arXiv Detail & Related papers (2020-10-04T14:31:18Z)
- Kernel-Based Reinforcement Learning: A Finite-Time Analysis [53.47210316424326]
We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards.
We empirically validate our approach in continuous MDPs with sparse rewards.
arXiv Detail & Related papers (2020-04-12T12:23:46Z)
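To make the Wishart claim in the "Deep kernel processes" entry concrete, here is a small self-contained numerical check (a toy example of our own, not taken from any of the papers above): the Gram matrix of nu i.i.d. Gaussian feature columns with covariance Sigma is Wishart(nu, Sigma) distributed, so its Monte-Carlo mean divided by nu should recover Sigma.

```python
# Toy check: if F has nu i.i.d. N(0, Sigma) columns, then G = F F^T is
# Wishart(nu, Sigma) distributed, and in particular E[G] = nu * Sigma.
import numpy as np

rng = np.random.default_rng(0)
P, nu, n_samples = 3, 50, 10_000

# An arbitrary positive-definite covariance Sigma (toy values).
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
chol = np.linalg.cholesky(Sigma)

# Each sample: F has nu i.i.d. N(0, Sigma) feature columns; G = F F^T is its Gram matrix.
F = chol @ rng.standard_normal((n_samples, P, nu))
grams = F @ np.swapaxes(F, -1, -2)

print(grams.mean(axis=0) / nu)  # should be close to Sigma
print(Sigma)
```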
This list is automatically generated from the titles and abstracts of the papers on this site.