Deep Latent-Variable Kernel Learning
- URL: http://arxiv.org/abs/2005.08467v2
- Date: Wed, 19 Aug 2020 04:46:29 GMT
- Title: Deep Latent-Variable Kernel Learning
- Authors: Haitao Liu, Yew-Soon Ong, Xiaomo Jiang, Xiaofang Wang
- Abstract summary: We present a complete deep latent-variable kernel learning (DLVKL) model wherein the latent variables perform stochastic encoding for a regularized representation.
Experiments show that DLVKL-NSDE performs similarly to well-calibrated GPs on small datasets and outperforms existing deep GPs on large datasets.
- Score: 25.356503463916816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep kernel learning (DKL) leverages the connection between Gaussian processes
(GPs) and neural networks (NNs) to build an end-to-end, hybrid model. It combines
the capability of NNs to learn rich representations from massive data with the
non-parametric property of GPs, which provides automatic regularization through a
trade-off between model fit and model complexity. However, the deterministic
encoder may weaken the regularization of the subsequent GP part, especially on
small datasets, because the latent representation is left unconstrained. We
therefore present a complete deep latent-variable kernel learning (DLVKL) model
wherein the latent variables perform stochastic encoding for regularized
representation. We further enhance DLVKL in two respects: (i) an expressive
variational posterior built through a neural stochastic differential equation
(NSDE) to improve the approximation quality, and (ii) a hybrid prior that takes
knowledge from both the SDE prior and the posterior to arrive at a flexible
trade-off. Extensive experiments show that DLVKL-NSDE performs similarly to
well-calibrated GPs on small datasets and outperforms existing
deep GPs on large datasets.
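As a concrete, purely illustrative sketch of the basic architecture described above (not the authors' implementation), the snippet below pairs a stochastic NN encoder, whose KL term regularizes the latent representation, with an exact GP placed on the encoded latents; the NSDE posterior and hybrid prior of the full DLVKL-NSDE are omitted, and the network sizes, KL weight, and toy data are assumptions rather than the paper's settings.
```python
# Minimal DKL/DLVKL-style sketch (illustrative only, not the paper's code):
# a stochastic encoder q(z|x) feeds an exact GP with an RBF kernel on z,
# trained by the GP negative log marginal likelihood plus a KL regularizer.
import math
import torch
import torch.nn as nn

class StochasticEncoder(nn.Module):
    """NN mapping x to a diagonal Gaussian over the latent z."""
    def __init__(self, in_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * latent_dim))

    def forward(self, x):
        mu, log_var = self.net(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()      # reparameterization
        kl = 0.5 * (log_var.exp() + mu ** 2 - 1.0 - log_var).sum()  # KL to N(0, I)
        return z, kl

def gp_neg_log_marginal(z, y, lengthscale, outputscale, noise):
    """Exact GP negative log marginal likelihood with an RBF kernel on z."""
    n = z.shape[0]
    sqdist = torch.cdist(z / lengthscale, z / lengthscale) ** 2
    K = outputscale * torch.exp(-0.5 * sqdist) + noise * torch.eye(n)
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L).squeeze(-1)
    return 0.5 * y.dot(alpha) + L.diagonal().log().sum() + 0.5 * n * math.log(2 * math.pi)

# Toy 1-D regression data; encoder weights and GP hyperparameters trained jointly.
torch.manual_seed(0)
x = torch.linspace(-3, 3, 100).unsqueeze(-1)
y = torch.sin(2 * x).squeeze(-1) + 0.1 * torch.randn(100)

encoder = StochasticEncoder(in_dim=1, latent_dim=2)
log_ls, log_os, log_noise = (nn.Parameter(torch.zeros(())) for _ in range(3))
opt = torch.optim.Adam(list(encoder.parameters()) + [log_ls, log_os, log_noise], lr=1e-2)

for step in range(500):
    opt.zero_grad()
    z, kl = encoder(x)
    loss = gp_neg_log_marginal(z, y, log_ls.exp(), log_os.exp(), log_noise.exp()) + 1e-2 * kl
    loss.backward()
    opt.step()
```
Jointly optimizing the encoder weights and the GP hyperparameters under the marginal likelihood is what makes the model end-to-end; the KL term is what distinguishes the stochastic encoding from the deterministic encoder of plain DKL.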
Related papers
- Latent Variable Double Gaussian Process Model for Decoding Complex Neural Data [0.0]
Non-parametric models, such as Gaussian Processes (GP), show promising results in the analysis of complex data.
We introduce a novel neural decoder model built upon GP models.
We demonstrate an application of this decoder model in a verbal memory experiment dataset.
arXiv Detail & Related papers (2024-05-08T20:49:34Z)
- Gaussian Process Neural Additive Models [3.7969209746164325]
We propose a new subclass of Neural Additive Models (NAMs) that use a single-layer neural network construction of the Gaussian process via random Fourier features.
GP-NAMs have the advantage of a convex objective function and a number of trainable parameters that grows linearly with feature dimensionality.
We show that GP-NAM achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
arXiv Detail & Related papers (2024-02-19T20:29:34Z)
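A hedged sketch of the construction named above, with all details (lengthscales, feature counts, ridge penalty) assumed rather than taken from the paper: each input feature gets its own random-Fourier-feature expansion of an RBF kernel, and the additive model reduces to convex ridge regression over the stacked features.
```python
# Illustrative additive random-Fourier-feature model (assumed details,
# not the GP-NAM reference implementation).
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 3, 50                      # samples, input features, Fourier features per input
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=n)

def rff(x_col, omega, b):
    """Random Fourier features approximating an RBF kernel on one scalar feature."""
    return np.sqrt(2.0 / omega.shape[0]) * np.cos(np.outer(x_col, omega) + b)

omegas = rng.normal(size=(d, m))          # frequencies ~ N(0, 1) for a unit lengthscale
biases = rng.uniform(0.0, 2.0 * np.pi, size=(d, m))
Phi = np.hstack([rff(X[:, j], omegas[j], biases[j]) for j in range(d)])   # (n, d*m)

# Convex objective: ridge regression on the stacked per-feature expansions.
lam = 1e-2
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d * m), Phi.T @ y)
print("train RMSE:", np.sqrt(np.mean((y - Phi @ w) ** 2)))
```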
- Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias [75.44877675117749]
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on the Open Graph Benchmark datasets.
arXiv Detail & Related papers (2023-09-26T13:09:43Z)
- Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A multi-output Gaussian process (MOGP) prior over the parameters of the dedicated likelihoods for classification, regression, and point-process tasks can facilitate sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z)
- Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains [7.936841911281107]
We propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate variational inference (CVI).
We are able to perform variational inference of latent neural trajectories with linear time complexity for arbitrary likelihoods.
arXiv Detail & Related papers (2023-06-01T16:31:36Z)
- Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks [0.0]
For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification, while deep neural networks (DNNs) excel at representation learning.
We propose to combine these two approaches in a hybrid method consisting of an ensemble of GPs built on the output of hidden layers of a DNN.
arXiv Detail & Related papers (2023-05-26T16:19:26Z)
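A minimal sketch of the hybrid idea above, with the Vecchia approximation itself omitted and the architecture assumed purely for illustration: fit an exact GP regressor on each hidden layer's activations of a trained MLP and average the predictive means.
```python
# Sketch: ensemble of exact GP regressors fit on hidden-layer activations of a
# small MLP (illustrative; the Vecchia approximation from the paper is omitted).
import numpy as np
import torch
import torch.nn as nn

def gp_predict(Z_train, y_train, Z_test, lengthscale=1.0, noise=0.1):
    """Posterior mean of an exact GP with an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)
    K = k(Z_train, Z_train) + noise * np.eye(len(Z_train))
    return k(Z_test, Z_train) @ np.linalg.solve(K, y_train)

torch.manual_seed(0)
x = torch.linspace(-3, 3, 120).unsqueeze(-1)
y = torch.sin(2 * x).squeeze(-1) + 0.1 * torch.randn(120)

# A small MLP whose hidden layers provide the internal representations.
layers = nn.ModuleList([nn.Sequential(nn.Linear(1, 32), nn.Tanh()),
                        nn.Sequential(nn.Linear(32, 16), nn.Tanh())])
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(layers.parameters()) + list(head.parameters()), lr=1e-2)
for _ in range(300):                                  # ordinary supervised training
    opt.zero_grad()
    h = x
    for layer in layers:
        h = layer(h)
    nn.functional.mse_loss(head(h).squeeze(-1), y).backward()
    opt.step()

# Ensemble: one GP per hidden layer, averaged at test time.
x_test = torch.linspace(-3, 3, 50).unsqueeze(-1)
preds = []
with torch.no_grad():
    h_tr, h_te = x, x_test
    for layer in layers:
        h_tr, h_te = layer(h_tr), layer(h_te)
        preds.append(gp_predict(h_tr.numpy(), y.numpy(), h_te.numpy()))
print("ensemble mean prediction shape:", np.mean(preds, axis=0).shape)
```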
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction from two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z)
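For reference, a small sketch of classical linear CCA, the operation that deep and dynamically scaled variants build on (the dimensions and regularization below are illustrative):
```python
# Classical linear CCA via whitening and SVD (illustrative dimensions/regularization).
import numpy as np

def linear_cca(X, Y, k, eps=1e-6):
    """Return projections mapping X, Y to k maximally correlated dimensions."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + eps * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view, then the SVD of the cross-covariance gives canonical directions.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]   # projections, canonical correlations

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))                       # shared latent signal across views
X = Z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = Z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
A, B, corrs = linear_cca(X, Y, k=2)
print("canonical correlations:", corrs)
```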
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
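As a toy illustration of the two metric families (Frobenius norm as a scale metric and stable rank as a simple shape-style proxy; these are generic stand-ins, not the specific metrics studied in the paper):
```python
# Toy scale-style vs shape-style weight-matrix metrics for a small untrained model.
# Frobenius norm (scale) and stable rank (spectrum/shape-style proxy) are generic
# examples of the two families, not the paper's metrics.
import numpy as np
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 10))

for name, p in model.named_parameters():
    if p.ndim != 2:                                  # only weight matrices
        continue
    W = p.detach().numpy()
    s = np.linalg.svd(W, compute_uv=False)
    frob = np.sum(s ** 2) ** 0.5                     # scale metric: Frobenius norm
    stable_rank = np.sum(s ** 2) / s[0] ** 2         # shape-style: ||W||_F^2 / ||W||_2^2
    print(f"{name}: frobenius={frob:.3f}, stable_rank={stable_rank:.2f}")
```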
- Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) requires solving non-concave min-max optimization problems.
Theory has highlighted the importance of gradient descent dynamics for reaching globally optimal solutions.
We show that in an overparameterized GAN with a one-layer neural network generator and a linear discriminator, gradient descent-ascent (GDA) converges to a global saddle point of the underlying non-concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z)
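To make the GDA dynamics concrete, here is a toy simultaneous gradient descent-ascent loop on a simple convex-concave saddle problem (purely illustrative; it is not the overparameterized GAN setting analyzed in the paper):
```python
# Simultaneous gradient descent-ascent (GDA) on the convex-concave toy problem
# f(x, y) = 0.5*x^2 + x*y - 0.5*y^2, whose global saddle point is (0, 0).
# Illustrative only; not the GAN objective from the paper.
x, y = 2.0, -1.5
eta = 0.1
for t in range(200):
    grad_x = x + y          # df/dx
    grad_y = x - y          # df/dy
    x, y = x - eta * grad_x, y + eta * grad_y   # descend in x, ascend in y
print(f"after GDA: x={x:.4f}, y={y:.4f}  (saddle point is (0, 0))")
```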
- Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit [47.324627920761685]
We use recent theoretical advances that characterize the function-space prior of an ensemble of infinitely wide NNs as a Gaussian process (the NNGP).
This gives us a better understanding of the implicit prior NNs place on function space.
We also examine the calibration of previous approaches to classification with the NNGP.
arXiv Detail & Related papers (2020-10-14T18:41:54Z)
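As a concrete instance of the infinite-width correspondence, the closed-form NNGP kernel of a one-hidden-layer ReLU network is sketched below (a standard arc-cosine construction; the variances and inputs are illustrative, not the paper's setup):
```python
# Closed-form NNGP kernel for an infinitely wide one-hidden-layer ReLU network
# (standard arc-cosine construction; variances and inputs are illustrative).
import numpy as np

def nngp_relu_kernel(X1, X2, sigma_w2=1.0, sigma_b2=0.1):
    """K(x, x') of the GP induced by an infinitely wide 1-hidden-layer ReLU net."""
    d = X1.shape[1]
    # Layer 0: affine input layer.
    k12 = sigma_b2 + sigma_w2 * X1 @ X2.T / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 ** 2, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 ** 2, axis=1) / d
    # Layer 1: ReLU gives the arc-cosine expectation E[relu(u) relu(u')].
    norms = np.sqrt(np.outer(k11, k22))
    cos_t = np.clip(k12 / norms, -1.0, 1.0)
    theta = np.arccos(cos_t)
    expect = norms * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)
    return sigma_b2 + sigma_w2 * expect

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
print("NNGP kernel matrix:\n", np.round(nngp_relu_kernel(X, X), 3))
```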
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.