VarGrad: A Low-Variance Gradient Estimator for Variational Inference
- URL: http://arxiv.org/abs/2010.10436v2
- Date: Thu, 29 Oct 2020 10:27:27 GMT
- Title: VarGrad: A Low-Variance Gradient Estimator for Variational Inference
- Authors: Lorenz Richter, Ayman Boustati, Nikolas Nüsken, Francisco J. R.
Ruiz, Ömer Deniz Akyildiz
- Abstract summary: We show that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
- Score: 9.108412698936105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyse the properties of an unbiased gradient estimator of the ELBO for
variational inference, based on the score function method with leave-one-out
control variates. We show that this gradient estimator can be obtained using a
new loss, defined as the variance of the log-ratio between the exact posterior
and the variational approximation, which we call the $\textit{log-variance
loss}$. Under certain conditions, the gradient of the log-variance loss equals
the gradient of the (negative) ELBO. We show theoretically that this gradient
estimator, which we call $\textit{VarGrad}$ due to its connection to the
log-variance loss, exhibits lower variance than the score function method in
certain settings, and that the leave-one-out control variate coefficients are
close to the optimal ones. We empirically demonstrate that VarGrad offers a
favourable variance versus computation trade-off compared to other
state-of-the-art estimators on a discrete VAE.
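The abstract above describes VarGrad as a score-function estimator with leave-one-out control variates: each sample's log-ratio is centred by the average log-ratio over the remaining samples. The following is a minimal sketch on a toy Gaussian problem (q = N(mu, 1), target p = N(0, 1)), not the paper's discrete-VAE setup; the function names and the toy model are my own, chosen so the exact gradient (of KL(q||p) with respect to mu, which equals mu) is known and the variance reduction can be checked empirically.

```python
import numpy as np

# Toy VI problem: q_mu = N(mu, 1), fixed target p = N(0, 1).
# Log-ratio f(z) = log q(z) - log p(z) = mu*z - mu^2/2.
# The exact gradient of KL(q || p) w.r.t. mu is mu, so bias is checkable.

def log_ratio(z, mu):
    return mu * z - 0.5 * mu**2

def score(z, mu):
    # d/dmu log N(z; mu, 1)
    return z - mu

def grad_score_function(z, mu):
    # Vanilla score-function (REINFORCE) estimator of the gradient.
    return np.mean(log_ratio(z, mu) * score(z, mu))

def grad_vargrad(z, mu):
    # Leave-one-out control variate: each sample's baseline is the mean
    # of the log-ratio over the *other* S-1 samples, so the estimator
    # stays unbiased (the baseline is independent of that sample).
    S = z.shape[0]
    f = log_ratio(z, mu)
    loo_baseline = (f.sum() - f) / (S - 1)
    return np.mean((f - loo_baseline) * score(z, mu))

rng = np.random.default_rng(0)
mu, S, trials = 2.0, 8, 5000
g_sf = np.empty(trials)
g_vg = np.empty(trials)
for t in range(trials):
    z = rng.normal(mu, 1.0, size=S)
    g_sf[t] = grad_score_function(z, mu)
    g_vg[t] = grad_vargrad(z, mu)

print(f"score-function: mean={g_sf.mean():.3f}, var={g_sf.var():.3f}")
print(f"VarGrad:        mean={g_vg.mean():.3f}, var={g_vg.var():.3f}")
```

On this toy problem both estimators average close to the true gradient (mu = 2), while the leave-one-out variant shows noticeably lower variance; in the paper's setting the analogous construction is analysed for discrete latent-variable models.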
Related papers
- Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation [59.86921150579892]
We deal with the problem of gradient estimation for differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.
We develop variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
arXiv Detail & Related papers (2024-10-10T17:10:00Z)
- Pathwise Gradient Variance Reduction with Control Variates in Variational Inference [2.1638817206926855]
Variational inference in Bayesian deep learning often involves computing the gradient of an expectation that lacks a closed-form solution.
In these cases, pathwise and score-function gradient estimators are the most common approaches.
Recent research suggests that even pathwise gradient estimators could benefit from variance reduction.
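The summary above contrasts the two common estimator families. A minimal toy illustration of the difference, for E_{z~N(mu,1)}[f(z)] with f(z) = z² (my own example, where the exact gradient w.r.t. mu is 2·mu):

```python
import numpy as np

# Compare the score-function and pathwise (reparameterisation) estimators
# of d/dmu E_{z~N(mu,1)}[z^2]; the exact value is 2*mu.

rng = np.random.default_rng(1)
mu, S, trials = 1.0, 16, 4000
g_score = np.empty(trials)
g_path = np.empty(trials)
for t in range(trials):
    eps = rng.normal(size=S)
    z = mu + eps                            # reparameterisation z = mu + eps
    g_score[t] = np.mean(z**2 * (z - mu))   # f(z) * d/dmu log N(z; mu, 1)
    g_path[t] = np.mean(2 * z)              # df/dz * dz/dmu, with dz/dmu = 1
```

Both estimators are unbiased, but the pathwise one typically has far lower variance here, which is why it is preferred whenever f is differentiable; the paper summarised above studies reducing the pathwise estimator's remaining variance further with control variates.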
arXiv Detail & Related papers (2024-10-08T07:28:46Z)
- Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models [28.011868604717726]
We present Adaptive IMLE, the first adaptive gradient estimator for complex discrete distributions.
We show that our estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
arXiv Detail & Related papers (2022-09-11T13:32:39Z)
- Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping [6.234350105794441]
Gradient estimation is often necessary for fitting generative models with discrete latent variables.
DisARM and other estimators have potentially exploding variance near the boundary of the parameter space.
We propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space.
arXiv Detail & Related papers (2022-08-12T05:37:52Z)
- Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z)
- Double Control Variates for Gradient Estimation in Discrete Latent Variable Models [32.33171301923846]
We introduce a variance reduction technique for score function estimators.
We show that our estimator can have lower variance compared to other state-of-the-art estimators.
arXiv Detail & Related papers (2021-11-09T18:02:42Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- A Study of Gradient Variance in Deep Learning [56.437755740715396]
We introduce a method, Gradient Clustering, to minimize the variance of average mini-batch gradient with stratified sampling.
We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training.
arXiv Detail & Related papers (2020-07-09T03:23:10Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.