Approximate Inference for Spectral Mixture Kernel
- URL: http://arxiv.org/abs/2006.07036v1
- Date: Fri, 12 Jun 2020 09:39:29 GMT
- Title: Approximate Inference for Spectral Mixture Kernel
- Authors: Yohan Jung, Kyungwoo Song, Jinkyoo Park
- Abstract summary: We propose an approximate Bayesian inference for the spectral mixture kernel.
We optimize the variational parameters by applying sampling-based variational inference to the derived evidence lower bound (ELBO) estimator.
The proposed inference, combined with two additional strategies, accelerates parameter convergence and leads to better optima.
- Score: 25.087829816206813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A spectral mixture (SM) kernel is a flexible kernel that can model any stationary covariance function. Although it is useful for modeling data, learning the SM kernel is generally difficult because optimizing its large number of parameters typically induces over-fitting, particularly when gradient-based optimization is used, and requires a long training time. To improve the training, we propose an approximate Bayesian inference for the SM kernel. Specifically, we employ a variational distribution over the spectral points to approximate the SM kernel with random Fourier features. We optimize the variational parameters by applying sampling-based variational inference to the evidence lower bound (ELBO) estimator constructed from the approximate kernel. To improve the inference, we further propose two additional strategies: (1) a sampling strategy for the spectral points that yields a reliable estimate of the ELBO and its gradient, and (2) an approximate natural gradient that accelerates the convergence of the parameters. The proposed inference, combined with these two strategies, speeds up parameter convergence and leads to better optima.
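To make the random Fourier feature construction above concrete, the following is a minimal numerical sketch (not the paper's code) of how spectral points drawn from the SM kernel's Gaussian-mixture spectral density yield a Monte Carlo approximation of the kernel. The 1-D parameters (`weights`, `means`, `stds`) are hypothetical; in the paper they would be governed by a learned variational distribution rather than fixed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D spectral mixture with Q = 2 components; in the paper these
# parameters would be governed by a learned variational distribution.
weights = np.array([1.0, 0.5])   # mixture weights  w_q
means   = np.array([0.5, 2.0])   # spectral means   mu_q
stds    = np.array([0.1, 0.3])   # spectral std-devs sigma_q

def sm_kernel(tau):
    """Exact 1-D SM kernel: k(tau) = sum_q w_q exp(-2 pi^2 sigma_q^2 tau^2) cos(2 pi mu_q tau)."""
    return np.sum(weights
                  * np.exp(-2.0 * np.pi**2 * stds**2 * tau**2)
                  * np.cos(2.0 * np.pi * means * tau))

def rff_sm_kernel(tau, num_spectral_points=20000):
    """Monte Carlo (random Fourier feature) estimate of the same kernel, built from
    spectral points drawn from the kernel's Gaussian-mixture spectral density."""
    probs = weights / weights.sum()
    comps = rng.choice(len(weights), size=num_spectral_points, p=probs)
    spectral_points = rng.normal(means[comps], stds[comps])
    # Bochner's theorem: k(tau) = (sum_q w_q) * E_s[cos(2 pi s tau)]
    return weights.sum() * np.mean(np.cos(2.0 * np.pi * spectral_points * tau))

tau = 0.7
print(sm_kernel(tau), rff_sm_kernel(tau))  # the two values should agree closely
```

The sketch only shows why sampled spectral points approximate the kernel; in the paper's inference the spectral points are drawn from a variational distribution whose parameters are optimized through the ELBO.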
Related papers
- Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
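As a concrete illustration of what a variance-reducing coupling for random features can look like, here is a small sketch of one classical example (orthogonal random features for the RBF kernel). It is not the coupling proposed in the paper above, and the function names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def iid_frequencies(d, m):
    """Baseline: i.i.d. Gaussian frequencies for an RBF-kernel random feature map."""
    return rng.normal(size=(m, d))

def orthogonal_frequencies(d, m):
    """A classical variance-reducing coupling ('orthogonal random features'):
    frequency rows keep Gaussian norms but are forced to be mutually orthogonal."""
    assert m <= d, "single orthogonal block shown for simplicity"
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
    norms = np.sqrt(rng.chisquare(df=d, size=m))   # chi-distributed row norms
    return norms[:, None] * q[:m]

def rff(x, freqs):
    """Random Fourier feature map; phi(x) . phi(y) approximates exp(-||x - y||^2 / 2)."""
    proj = freqs @ x
    return np.concatenate([np.cos(proj), np.sin(proj)]) / np.sqrt(len(freqs))

x, y = rng.normal(size=8), rng.normal(size=8)
freqs = orthogonal_frequencies(8, 8)
# Coarse with only 8 features, but the coupling lowers the estimator's variance
# relative to iid_frequencies with the same feature budget.
print(rff(x, freqs) @ rff(y, freqs), np.exp(-0.5 * np.sum((x - y) ** 2)))
```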
- MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation [5.298814565953444]
Existing relative position encoding methods address the length extrapolation challenge exclusively through a single kernel function.
This study proposes a novel relative positional encoding method, called MEP, which employs a weighted average to combine distinct kernel functions.
We present two distinct versions of our method: a parameter-free variant that requires no new learnable parameters, and a parameterized variant capable of integrating state-of-the-art techniques.
arXiv Detail & Related papers (2024-03-26T13:38:06Z)
- A Unified Gaussian Process for Branching and Nested Hyperparameter Optimization [19.351804144005744]
In deep learning, tuning parameters with conditional dependence are common in practice.
The new GP model accounts for the dependent structure among input variables through a new kernel function.
High prediction accuracy and better optimization efficiency are observed in a series of synthetic simulations and real data applications of neural networks.
arXiv Detail & Related papers (2024-01-19T21:11:32Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions so that fewer sampling steps are needed.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
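For readers unfamiliar with AIS itself, the following is a toy sketch of plain annealed importance sampling with a fixed geometric bridging schedule on a 1-D example. The paper above instead parameterizes and optimizes the bridging distributions, which is not shown here; the toy target and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (not the paper's models): estimate the normalising constant Z_T = 3 of an
# unnormalised 1-D target f_T(x) = 3 * N(x; 2, 0.5^2), annealing from a N(0, 1) base.
def log_f0(x):                               # tractable base density (Z_0 = 1)
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def log_fT(x):                               # unnormalised target, true Z_T = 3
    return np.log(3.0) - 0.5 * ((x - 2.0) / 0.5)**2 - np.log(0.5 * np.sqrt(2.0 * np.pi))

betas = np.linspace(0.0, 1.0, 50)            # fixed geometric bridging schedule

def log_f(x, beta):                          # geometric intermediate densities
    return (1.0 - beta) * log_f0(x) + beta * log_fT(x)

def ais(num_chains=2000, mh_steps=3, step=0.5):
    x = rng.normal(size=num_chains)          # x_0 ~ base distribution
    log_w = np.zeros(num_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_f(x, b) - log_f(x, b_prev)          # incremental importance weights
        for _ in range(mh_steps):                        # Metropolis moves targeting pi_b
            prop = x + step * rng.normal(size=num_chains)
            accept = np.log(rng.uniform(size=num_chains)) < log_f(prop, b) - log_f(x, b)
            x = np.where(accept, prop, x)
    return np.mean(np.exp(log_w))            # estimate of Z_T / Z_0

print(ais())  # should land near 3
```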
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
arXiv Detail & Related papers (2021-03-31T16:08:06Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which requires only function evaluations rather than first-order information.
We show that, with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
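The sketch below shows a generic two-point zeroth-order gradient estimator, included only to illustrate the kind of gradient-free oracle such methods build on; the coordinate importance sampling design of the paper above is not reproduced, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_gradient(f, x, mu=1e-3, num_dirs=200):
    """Two-point zeroth-order gradient estimate of f at x using random Gaussian
    directions; only function queries are needed, no autodiff. This is a generic
    estimator, not the coordinate-importance-sampling scheme of the paper above."""
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.shape)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / num_dirs

# Sanity check on f(x) = ||x||^2, whose true gradient is 2x.
f = lambda x: float(np.dot(x, x))
x = np.array([1.0, -2.0, 0.5])
print(zo_gradient(f, x), 2 * x)   # the estimate should be close to [2, -4, 1]
```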
- Marginalised Gaussian Processes with Nested Sampling [10.495114898741203]
Gaussian process (GP) models define a rich distribution over functions, with inductive biases controlled by a kernel function.
This work presents an alternative learning procedure where the hyperparameters of the kernel function are marginalised using Nested Sampling (NS).
arXiv Detail & Related papers (2020-10-30T16:04:35Z)
- Dual Stochastic Natural Gradient Descent and convergence of interior half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z)