Markovian Score Climbing: Variational Inference with KL(p||q)
- URL: http://arxiv.org/abs/2003.10374v2
- Date: Mon, 22 Feb 2021 19:46:38 GMT
- Title: Markovian Score Climbing: Variational Inference with KL(p||q)
- Authors: Christian A. Naesseth and Fredrik Lindsten and David Blei
- Abstract summary: We develop a simple algorithm for reliably minimizing the "inclusive Kullback-Leibler (KL)" divergence KL(p || q).
This method converges to a local optimum of the inclusive KL.
It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo.
- Score: 16.661889249333676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern variational inference (VI) uses stochastic gradients to avoid
intractable expectations, enabling large-scale probabilistic inference in
complex models. VI posits a family of approximating distributions q and then
finds the member of that family that is closest to the exact posterior p.
Traditionally, VI algorithms minimize the "exclusive Kullback-Leibler (KL)"
KL(q || p), often for computational convenience. Recent research, however, has
also focused on the "inclusive KL" KL(p || q), which has good statistical
properties that make it more appropriate for certain inference problems. This
paper develops a simple algorithm for reliably minimizing the inclusive KL
using stochastic gradients with vanishing bias. This method, which we call
Markovian score climbing (MSC), converges to a local optimum of the inclusive
KL. It does not suffer from the systematic errors inherent in existing methods,
such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which
lead to bias in their final estimates. We illustrate convergence on a toy model
and demonstrate the utility of MSC on Bayesian probit regression for
classification as well as a stochastic volatility model for financial data.
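The following is a minimal, illustrative sketch of the idea in the abstract: take stochastic gradient steps on the score of q evaluated at the state of a Markov chain whose kernel leaves the exact posterior invariant. The toy one-dimensional Gaussian target, the Gaussian variational family, the choice of a conditional importance sampling kernel, and all names and step sizes are assumptions of this sketch (the abstract does not fix these details), not the paper's reference implementation.

```python
# Illustrative sketch of Markovian score climbing (MSC) on a toy 1-D problem.
# Assumptions (not from the paper text): unnormalized Gaussian target, Gaussian
# q(z; m, exp(log_s)), and a conditional importance sampling (CIS) kernel as
# the posterior-invariant Markov kernel.
import numpy as np

rng = np.random.default_rng(0)

def log_p_tilde(z):
    # Unnormalized log-target; here N(2, 1) up to an additive constant.
    return -0.5 * (z - 2.0) ** 2

def log_q(z, m, log_s):
    s = np.exp(log_s)
    return -0.5 * ((z - m) / s) ** 2 - log_s - 0.5 * np.log(2.0 * np.pi)

def score_q(z, m, log_s):
    # Gradient of log q(z; m, log_s) with respect to (m, log_s).
    s = np.exp(log_s)
    return np.array([(z - m) / s ** 2, ((z - m) / s) ** 2 - 1.0])

def cis_kernel(z_prev, m, log_s, K=16):
    # Conditional importance sampling: draw K-1 proposals from q, keep the
    # previous state as the K-th particle, and resample one particle with
    # probability proportional to the importance weights p_tilde / q.
    # This kernel leaves the target distribution invariant.
    z = np.append(rng.normal(m, np.exp(log_s), size=K - 1), z_prev)
    log_w = log_p_tilde(z) - log_q(z, m, log_s)
    w = np.exp(log_w - log_w.max())
    return rng.choice(z, p=w / w.sum())

# Stochastic gradient ascent on E_p[log q(z; lambda)], i.e. descent on KL(p || q),
# using the single Markov chain state in place of an exact posterior sample.
m, log_s, z = 0.0, 0.0, 0.0
for t in range(20000):
    z = cis_kernel(z, m, log_s)            # one posterior-invariant transition
    g = score_q(z, m, log_s)               # score of q at the chain state
    step = 0.05 / (1.0 + t / 1000.0)       # decreasing step size (stochastic approximation)
    m, log_s = m + step * g[0], log_s + step * g[1]

print("fitted mean ~", round(m, 2), "fitted std ~", round(float(np.exp(log_s)), 2))
```

With these toy settings the fitted mean and standard deviation should approach the target's values (2 and 1), mirroring the kind of toy-model convergence illustration the abstract mentions.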
Related papers
- Sequential Monte Carlo for Inclusive KL Minimization in Amortized Variational Inference [3.126959812401426]
We propose SMC-Wake, a procedure for fitting an amortized variational approximation that uses sequential Monte Carlo samplers to estimate the gradient of the inclusive KL divergence.
In experiments with both simulated and real datasets, SMC-Wake fits variational distributions that approximate the posterior more accurately than existing methods. (A generic sketch of this style of importance-weighted gradient estimator appears after this list.)
arXiv Detail & Related papers (2024-03-15T18:13:48Z)
- SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning [49.94607673097326]
We propose a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data.
Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization algorithm.
Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios.
arXiv Detail & Related papers (2024-02-21T03:39:04Z)
- Curvature-Sensitive Predictive Coding with Approximate Laplace Monte Carlo [1.1470070927586016]
Predictive coding (PC) accounts of perception now form one of the dominant computational theories of the brain.
Despite this, they have enjoyed little export to the broader field of machine learning.
This has been due to the poor performance of models trained with PC when evaluated by both sample quality and marginal likelihood.
arXiv Detail & Related papers (2023-03-09T01:29:58Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence [17.665255113864795]
We present a novel divergence-like metric which corresponds to the upper bound of the Kullback-Leibler divergence (KLD) of a relaxed categorical distribution.
We also propose a relaxed categorical analytic bound variational autoencoder (ReCAB-VAE) that successfully models both continuous and relaxed latent representations.
arXiv Detail & Related papers (2022-05-09T08:11:46Z)
- Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
- A Batch Normalized Inference Network Keeps the KL Vanishing Away [35.40781000297285]
The Variational Autoencoder (VAE) is widely used to approximate a model's posterior on latent variables.
It often converges to a degenerate local optimum known as "posterior collapse".
arXiv Detail & Related papers (2020-04-27T05:20:01Z)
- Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior-sampling-based algorithm, namely distributionally robust BQO (DRBQO), for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
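For contrast with the Markov-chain estimator sketched after the abstract, importance-weighting ideas appear in several entries above (e.g., SMC-Wake and the forward-KL refinement of importance sampling), and the main paper attributes the bias of methods like Reweighted Wake-Sleep to this style of estimator. Below is a minimal, self-contained sketch of a generic self-normalized importance sampling (SNIS) estimator of the inclusive-KL gradient on the same toy Gaussian setup; the self-normalization step is what introduces the finite-sample bias. All names, settings, and the toy model are illustrative assumptions, not any paper's implementation.

```python
# Sketch under illustrative assumptions: a self-normalized importance sampling
# (SNIS) estimate of the inclusive-KL gradient -E_p[grad log q(z; lambda)],
# with proposals drawn from q itself. Consistent as K grows, but biased for
# any finite K because of the weight normalization.
import numpy as np

rng = np.random.default_rng(1)

def log_p_tilde(z):                      # unnormalized log-target: N(2, 1)
    return -0.5 * (z - 2.0) ** 2

def log_q(z, m, log_s):                  # Gaussian approximation q(z; m, exp(log_s))
    s = np.exp(log_s)
    return -0.5 * ((z - m) / s) ** 2 - log_s - 0.5 * np.log(2.0 * np.pi)

def score_q(z, m, log_s):                # grad of log q w.r.t. (m, log_s), vectorized in z
    s = np.exp(log_s)
    return np.stack([(z - m) / s ** 2, ((z - m) / s) ** 2 - 1.0], axis=-1)

def snis_grad(m, log_s, K=64):
    z = rng.normal(m, np.exp(log_s), size=K)       # proposals from q
    log_w = log_p_tilde(z) - log_q(z, m, log_s)    # unnormalized log-weights
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                   # normalization introduces finite-K bias
    return w @ score_q(z, m, log_s)                # weighted average of scores

m, log_s = 0.0, 0.0
for t in range(2000):
    g = snis_grad(m, log_s)
    m, log_s = m + 0.05 * g[0], log_s + 0.05 * g[1]

print("SNIS fit: mean ~", round(m, 2), "std ~", round(float(np.exp(log_s)), 2))
```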
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.