Towards Improved Learning in Gaussian Processes: The Best of Two Worlds
- URL: http://arxiv.org/abs/2211.06260v1
- Date: Fri, 11 Nov 2022 15:04:10 GMT
- Title: Towards Improved Learning in Gaussian Processes: The Best of Two Worlds
- Authors: Rui Li, ST John, Arno Solin
- Abstract summary: We design a hybrid training procedure where the inference leverages conjugate-computation VI and the learning uses an EP-like marginal likelihood approximation.
We empirically demonstrate on binary classification that this provides a good learning objective and generalizes better.
- Score: 18.134776677795077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian process training decomposes into inference of the (approximate)
posterior and learning of the hyperparameters. For non-Gaussian (non-conjugate)
likelihoods, two common choices for approximate inference are Expectation
Propagation (EP) and Variational Inference (VI), which have complementary
strengths and weaknesses. While VI's lower bound to the marginal likelihood is
a suitable objective for inferring the approximate posterior, it does not
automatically imply it is a good learning objective for hyperparameter
optimization. We design a hybrid training procedure where the inference
leverages conjugate-computation VI and the learning uses an EP-like marginal
likelihood approximation. We empirically demonstrate on binary classification
that this provides a good learning objective and generalizes better.
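
To make the proposed split concrete, here is a minimal runnable sketch under simplifying assumptions (a probit likelihood, unit kernel variance, quadrature for likelihood expectations, and a grid step instead of gradients over the lengthscale); the toy data and function names are ours, not the authors'. The inner loop runs conjugate-computation VI fixed-point updates on Gaussian site parameters; the outer step scores hyperparameters with an EP-like energy, which for fixed sites reduces, up to theta-independent site constants, to the Gaussian evidence of pseudo-observations.

```python
# Minimal sketch, NOT the paper's code: CVI inference + EP-like energy learning
# for GP binary classification with a probit likelihood and an RBF kernel.
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.stats import norm, multivariate_normal

def rbf(x, ls):
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / ls**2) + 1e-8 * np.eye(len(x))

# Gauss-Hermite quadrature for E_{N(m,v)}[g(f)]; weights normalized to sum to 1
nodes, qw = hermegauss(30)
qw = qw / np.sqrt(2.0 * np.pi)

def loglik_grads(y, m, v):
    """d/dm and d/dv of E_q[log Phi(y f)] for q = N(m, diag(v)), by quadrature."""
    f = m[:, None] + np.sqrt(v)[:, None] * nodes[None, :]
    u = y[:, None] * f
    r = np.exp(norm.logpdf(u) - norm.logcdf(u))      # phi(u)/Phi(u), stable
    gm = (y[:, None] * r * qw).sum(1)                # E[d/df log Phi(y f)]
    gv = 0.5 * ((-u * r - r**2) * qw).sum(1)         # 0.5 E[d^2/df^2 log Phi(y f)]
    return gm, gv

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 40)
y = np.sign(np.sin(2.0 * x) + 0.3 * rng.standard_normal(x.size))

lam1, lam2, ls = np.zeros(x.size), np.full(x.size, 1e-2), 1.0
for outer in range(10):
    K_inv = np.linalg.inv(rbf(x, ls))
    for _ in range(50):                               # inference: CVI updates
        S = np.linalg.inv(K_inv + np.diag(lam2))      # q(f) = N(m, S)
        m, v = S @ lam1, np.diag(S)
        gm, gv = loglik_grads(y, m, v)
        rho = 0.5                                     # natural-gradient step size
        lam2 = np.maximum((1 - rho) * lam2 - 2 * rho * gv, 1e-4)  # floor: stability
        lam1 = (1 - rho) * lam1 + rho * (gm - 2 * gv * m)
    # learning: with sites fixed, the EP-like energy equals (up to theta-
    # independent site constants) the Gaussian evidence of pseudo-observations
    y_pseudo, tau = lam1 / lam2, 1.0 / lam2
    def energy(l):
        C = rbf(x, l) + np.diag(tau)
        return multivariate_normal(np.zeros(x.size), C).logpdf(y_pseudo)
    ls = max([0.5, 0.75, 1.0, 1.5, 2.0], key=energy)  # crude grid "M-step"
print("selected lengthscale:", ls)
```

The hybrid is visible in the last lines: the inferred sites are held fixed while the lengthscale is chosen against a marginal-likelihood-style objective rather than the ELBO.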
Related papers
- Likelihood approximations via Gaussian approximate inference [3.4991031406102238]
We propose efficient schemes to approximate the effects of non-Gaussian likelihoods by Gaussian densities.
Our results attain good approximation quality for binary and multiclass classification in large-scale point-estimate and distributional inferential settings.
As a by-product, we show that the proposed approximate log-likelihoods are a superior alternative to least-squares on raw labels for neural network classification.
arXiv Detail & Related papers (2024-10-28T05:39:26Z)
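
The paper's specific schemes are not reproduced here, but the toy below shows the general trick in this family of methods: replace a non-Gaussian log-likelihood with a local quadratic (i.e., Gaussian) surrogate. The expansion point and names are illustrative.

```python
# Generic sketch (not this paper's scheme): a second-order expansion of a
# Bernoulli-logit log-likelihood around f0 yields a Gaussian pseudo-observation.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logit_site(y, f0):
    """Gaussian site N(f | m, v) matching log sigmoid(y*f) to 2nd order at f0 (y in {-1,+1})."""
    grad = y * (1.0 - sigmoid(y * f0))          # first derivative at f0
    hess = -sigmoid(f0) * (1.0 - sigmoid(f0))   # second derivative (same for y = +/-1)
    v = -1.0 / hess                             # site variance from curvature
    m = f0 + v * grad                           # site mean (one Newton step from f0)
    return m, v
```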
- Towards Improved Variational Inference for Deep Bayesian Models [7.841254447222393]
In this thesis, we explore the use of variational inference (VI) as an approximation.
VI is unique in simultaneously approximating the posterior and providing a lower bound to the marginal likelihood.
We propose a variational posterior that provides a unified view of inference in Bayesian neural networks and deep Gaussian processes.
arXiv Detail & Related papers (2024-01-23T00:40:20Z)
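
For reference, the lower-bound property mentioned here is the standard evidence decomposition, written below in generic notation (ours, not the thesis's):

```latex
\log p(\mathbf{y} \mid \theta)
  = \underbrace{\mathbb{E}_{q(\mathbf{f})}\bigl[\log p(\mathbf{y} \mid \mathbf{f})\bigr]
      - \mathrm{KL}\bigl(q(\mathbf{f}) \,\|\, p(\mathbf{f} \mid \theta)\bigr)}_{\mathrm{ELBO}(q,\,\theta)}
  \;+\; \mathrm{KL}\bigl(q(\mathbf{f}) \,\|\, p(\mathbf{f} \mid \mathbf{y}, \theta)\bigr)
  \;\ge\; \mathrm{ELBO}(q, \theta).
```

Maximizing the ELBO over q tightens the bound; maximizing it over the hyperparameters theta is only a surrogate for the marginal likelihood, which is exactly the gap the hybrid procedure above targets.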
- Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models [18.134776677795077]
We focus on the interplay between variational inference (VI) and the learning target.
We design a hybrid training procedure to bring the best of both worlds: it leverages conjugate-computation VI for inference and an EP-like marginal likelihood approximation for learning.
We empirically demonstrate the effectiveness of our proposal across a wide range of data sets.
arXiv Detail & Related papers (2023-06-07T07:15:08Z)
- Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach [66.9033666087719]
This paper extends the inference view and describes a variational inference formulation of federated learning.
We apply FedEP on standard federated learning benchmarks and find that it outperforms strong baselines in terms of both convergence speed and accuracy.
arXiv Detail & Related papers (2023-02-08T17:58:11Z)
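
FedEP itself operates on neural-network posteriors; the conjugate toy below only illustrates the cavity/site message structure of EP in a federated layout (scalar Gaussian mean estimation with unit noise, where moment matching is exact). All names are invented for the example.

```python
# Toy illustration of EP-style federated aggregation (NOT FedEP itself):
# the server's posterior is prior x product of per-client Gaussian "sites".
import numpy as np

rng = np.random.default_rng(0)
clients = [rng.normal(1.0, 1.0, size=n) for n in (20, 50, 10)]  # local datasets
prior_h, prior_lam = 0.0, 1.0          # natural params (precision-mean, precision) of N(0, 1)
sites = [(0.0, 0.0) for _ in clients]  # per-client site contributions

for _ in range(2):                     # communication rounds
    for k, data in enumerate(clients):
        glob_h = prior_h + sum(h for h, _ in sites)
        glob_lam = prior_lam + sum(l for _, l in sites)
        cav_h = glob_h - sites[k][0]           # cavity: remove client k's site
        cav_lam = glob_lam - sites[k][1]
        tilt_h = cav_h + data.sum()            # tilted: cavity x local likelihood
        tilt_lam = cav_lam + data.size         # (unit noise => update is exact)
        sites[k] = (tilt_h - cav_h, tilt_lam - cav_lam)  # updated site message
post_lam = prior_lam + sum(l for _, l in sites)
print("posterior mean:", (prior_h + sum(h for h, _ in sites)) / post_lam)
```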
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
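
The underlying mechanism can be written out for plain gradient descent on such a quadratic (a standard calculation, not quoted from the paper): with F(w) = ½⟨w, Aw⟩ − ⟨b, w⟩, eigendecomposition A = Σᵢ λᵢ vᵢ ⊗ vᵢ, minimizer w⋆, and w₀ = 0,

```latex
w_{t+1} = w_t - \gamma \nabla F(w_t)
\quad\Longrightarrow\quad
\langle v_i, w_t \rangle = \bigl(1 - (1 - \gamma \lambda_i)^t\bigr)\,\langle v_i, w_\star \rangle ,
```

so at a fixed early-stopping time t, a larger learning rate γ (kept below 2/λ_max for stability) means more of the low-curvature spectral components of w⋆ are already fitted. This is how the learning rate shapes the spectral decomposition of the returned iterate.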
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
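
As background for the combination, here is what the importance-sampling refinement step looks like once a variational proposal q is available; the 1-D target and proposal are toy stand-ins, and the paper's forward-KL fitting of q is not shown.

```python
# Sketch of self-normalized importance sampling with a variational proposal q;
# de-biases estimates under q by reweighting its samples toward the target.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
log_target = lambda x: norm.logpdf(x, loc=2.0, scale=0.5)  # stand-in target
proposal = norm(loc=1.5, scale=1.0)                        # stand-in variational q

xs = proposal.rvs(size=5000, random_state=rng)
log_w = log_target(xs) - proposal.logpdf(xs)               # importance log-weights
w = np.exp(log_w - log_w.max())                            # stabilize, then normalize
w /= w.sum()
print("de-biased posterior mean:", np.sum(w * xs))
print("effective sample size:", 1.0 / np.sum(w**2))
```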
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
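
The paper's specific stepsize rules are not reproduced here; the snippet below only fixes the shared primitive, a norm-clipped gradient step, with an illustrative clipping level `lam`.

```python
# Sketch of the common building block: an SGD step with norm-based gradient
# clipping, which truncates heavy-tailed gradient noise before the update.
import numpy as np

def clipped_sgd_step(x, grad, stepsize, lam):
    gnorm = np.linalg.norm(grad)
    scale = min(1.0, lam / max(gnorm, 1e-12))  # shrink only when ||grad|| > lam
    return x - stepsize * scale * grad
```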
- Laplace Matching for fast Approximate Inference in Generalized Linear Models [27.70274403550477]
We propose an approximate inference framework primarily designed to be computationally cheap while still achieving high approximation quality.
The concept, which we call Laplace Matching, involves closed-form, approximate, bi-directional transformations between the parameter spaces of exponential families.
This effectively turns inference in GLMs into conjugate inference (with small approximation errors).
arXiv Detail & Related papers (2021-05-07T08:25:17Z)
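
To give the flavor of such closed-form, bi-directional maps, here is one standard instance worked out independently (a Beta distribution and a Gaussian in the logit basis, via a Laplace approximation); it may differ in detail from the paper's tables.

```python
# Illustrative Laplace-style matching between Beta(a, b) and a Gaussian in the
# logit basis z = log(x / (1 - x)); derived here, not copied from the paper.
import numpy as np

def beta_to_gaussian(a, b):
    """Laplace approximation of Beta(a, b) pushed through the logit transform."""
    mean = np.log(a / b)             # mode of the transformed density
    var = 1.0 / a + 1.0 / b          # inverse curvature at the mode
    return mean, var

def gaussian_to_beta(mean, var):
    """Closed-form inverse map back to Beta parameters."""
    a = (np.exp(mean) + 1.0) / var
    b = (np.exp(-mean) + 1.0) / var
    return a, b

print(gaussian_to_beta(*beta_to_gaussian(3.0, 7.0)))  # round-trips to (3.0, 7.0)
```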
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
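
A generic one-sample version of a coordinate-sampled ZO gradient estimate is sketched below; the sampling distribution `probs` stands in for the paper's coordinate-importance design, and everything else is illustrative.

```python
# Sketch of a zeroth-order gradient estimate with importance sampling over
# coordinates; uses only two function queries per call, no gradients.
import numpy as np

def zo_grad(f, x, probs, mu=1e-4, rng=None):
    """Single-coordinate finite-difference estimate, reweighted so it is
    unbiased for the full gradient up to O(mu) smoothing bias."""
    rng = rng or np.random.default_rng()
    i = rng.choice(x.size, p=probs)         # importance-sampled coordinate
    e = np.zeros_like(x)
    e[i] = mu
    d = (f(x + e) - f(x)) / mu              # forward difference along e_i
    g = np.zeros_like(x)
    g[i] = d / probs[i]                     # reweight: E[g] ~= grad f(x)
    return g
```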
- Approximate Inference for Fully Bayesian Gaussian Process Regression [11.47317712333228]
Learning in Gaussian Process models occurs through the adaptation of hyperparameters of the mean and the covariance function.
An alternative learning procedure is to infer the posterior over hyperparameters in a hierarchical specification of GPs, which we call Fully Bayesian Gaussian Process Regression (GPR).
We analyze the predictive performance for fully Bayesian GPR on a range of benchmark data sets.
arXiv Detail & Related papers (2019-12-31T17:18:48Z)
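
The operational difference to type-II maximum likelihood is easy to state in code: average the GP predictive over hyperparameter draws instead of plugging in one optimized value. The helper `gp_predict` and the sample source are hypothetical placeholders, not the paper's implementation.

```python
# Sketch of the fully Bayesian GP predictive: a Monte Carlo mixture over
# hyperparameter posterior samples (e.g., HMC draws). `gp_predict(theta, X)`
# is a hypothetical helper returning the usual GP predictive mean/variance.
import numpy as np

def fully_bayesian_predict(gp_predict, theta_samples, X_test):
    means, variances = [], []
    for theta in theta_samples:
        m, v = gp_predict(theta, X_test)
        means.append(m)
        variances.append(v)
    means, variances = np.asarray(means), np.asarray(variances)
    mix_mean = means.mean(axis=0)
    mix_var = variances.mean(axis=0) + means.var(axis=0)  # law of total variance
    return mix_mean, mix_var
```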
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.