Pathwise Gradient Variance Reduction with Control Variates in Variational Inference
- URL: http://arxiv.org/abs/2410.05753v1
- Date: Tue, 8 Oct 2024 07:28:46 GMT
- Title: Pathwise Gradient Variance Reduction with Control Variates in Variational Inference
- Authors: Kenyon Ng, Susan Wei
- Abstract summary: Variational inference in Bayesian deep learning often involves computing the gradient of an expectation that lacks a closed-form solution.
In these cases, pathwise and score-function gradient estimators are the most common approaches.
Recent research suggests that even pathwise gradient estimators could benefit from variance reduction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational inference in Bayesian deep learning often involves computing the gradient of an expectation that lacks a closed-form solution. In these cases, pathwise and score-function gradient estimators are the most common approaches. The pathwise estimator is often favoured for its substantially lower variance compared to the score-function estimator, which typically requires variance reduction techniques. However, recent research suggests that even pathwise gradient estimators could benefit from variance reduction. In this work, we review existing control-variates-based variance reduction methods for pathwise gradient estimators to assess their effectiveness. Notably, these methods often rely on integrand approximations and are applicable only to simple variational families. To address this limitation, we propose applying zero-variance control variates to pathwise gradient estimators. This approach offers the advantage of requiring minimal assumptions about the variational distribution, other than being able to sample from it.
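As a rough illustration of the estimators the abstract contrasts, the sketch below compares a score-function estimator, a pathwise estimator, and a pathwise estimator with a simple control variate on a toy problem. It is not the authors' implementation. Everything in it is an assumption made for illustration: the Gaussian variational family q = N(mu, sigma^2), the integrand f(z) = z^2 (so the true gradient with respect to mu is 2*mu), the hypothetical function name `gradient_samples`, and the use of the base noise `eps` as a zero-mean control variate with a least-squares coefficient.

```python
import numpy as np


def gradient_samples(mu, sigma, n, seed=0):
    """Per-sample estimates of d/dmu E_{z ~ N(mu, sigma^2)}[z^2] (true value: 2*mu).

    Illustrative toy only; names and setup are assumptions, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)      # base noise, eps ~ N(0, 1)
    z = mu + sigma * eps              # reparameterised sample from q
    f = z ** 2                        # toy integrand

    # Score-function estimator: f(z) * d/dmu log q(z) = f(z) * (z - mu) / sigma^2
    score = f * (z - mu) / sigma ** 2

    # Pathwise estimator: f'(z) * dz/dmu = 2 * z * 1
    pathwise = 2.0 * z

    # Control variate: eps has known mean 0 under the base distribution.
    # The coefficient beta is the least-squares fit Cov(g, eps) / Var(eps).
    beta = np.cov(pathwise, eps)[0, 1] / np.var(eps, ddof=1)
    pathwise_cv = pathwise - beta * eps

    return score, pathwise, pathwise_cv


if __name__ == "__main__":
    estimates = gradient_samples(mu=1.5, sigma=0.8, n=20_000)
    for name, g in zip(("score", "pathwise", "pathwise + CV"), estimates):
        print(f"{name:>14}: mean={g.mean():.4f}  variance={g.var():.3e}")
```

For this quadratic integrand the pathwise gradient is exactly linear in `eps`, so the control variate removes essentially all of the variance; for a general integrand the reduction would only be partial. Fitting `beta` on the same samples can also introduce a small bias in general, which can be avoided by estimating it on a separate batch.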
 
      
Related papers
- Practical Improvements of A/B Testing with Off-Policy Estimation [51.25970890274447]
 We introduce a family of unbiased off-policy estimators that achieves lower variance than the standard approach.
Our theoretical analysis and experimental results validate the effectiveness and practicality of the proposed method.
 arXiv  Detail & Related papers  (2025-06-12T13:11:01Z)
- Gradients should stay on Path: Better Estimators of the Reverse- and Forward KL Divergence for Normalizing Flows [4.830811539001643]
 We propose an algorithm to estimate the path-gradient of both the reverse and forward Kullback-Leibler divergence for an arbitrary manifestly invertible normalizing flow.
The resulting path-gradient estimators are straightforward to implement, have lower variance, and lead not only to faster convergence of training but also to better overall approximation results.
 arXiv  Detail & Related papers  (2022-07-17T16:27:41Z)
- Path-Gradient Estimators for Continuous Normalizing Flows [4.830811539001643]
 Recent work has established a path-gradient estimator for simple variational Gaussian distributions.
We propose a path-gradient estimator for the considerably more expressive variational family of continuous normalizing flows.
 arXiv  Detail & Related papers  (2022-06-17T21:25:06Z)
- Gradient Estimation with Discrete Stein Operators [44.64146470394269]
 We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
 arXiv  Detail & Related papers  (2022-02-19T02:22:23Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
 Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
 arXiv  Detail & Related papers  (2021-07-21T17:10:14Z)
- Coordinate-wise Control Variates for Deep Policy Gradients [23.24910014825916]
 The effect of vector-valued baselines for neural net policies is under-explored.
We show that lower variance can be obtained with such baselines than with the conventional scalar-valued baseline.
 arXiv  Detail & Related papers  (2021-07-11T07:36:01Z)
- VarGrad: A Low-Variance Gradient Estimator for Variational Inference [9.108412698936105]
 We show that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
 arXiv  Detail & Related papers  (2020-10-20T16:46:01Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
 We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
 arXiv  Detail & Related papers  (2020-10-09T22:54:38Z)
- A Study of Gradient Variance in Deep Learning [56.437755740715396]
 We introduce a method, Gradient Clustering, to minimize the variance of average mini-batch gradient with stratified sampling.
We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training.
 arXiv  Detail & Related papers  (2020-07-09T03:23:10Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
 A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
 arXiv  Detail & Related papers  (2020-07-08T11:35:47Z)
- Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization [62.47170258504037]
 This paper presents a framework that encompasses and generalizes existing approaches that use controls, kernels and neural networks.
Novel theoretical results are presented to provide insight into the variance reduction that can be achieved, and an empirical assessment, including applications to Bayesian inference, is provided in support.
 arXiv  Detail & Related papers  (2020-06-12T22:03:25Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
 We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
 arXiv  Detail & Related papers  (2020-02-14T14:15:18Z)