Rao-Blackwellised Reparameterisation Gradients
- URL: http://arxiv.org/abs/2506.07687v1
- Date: Mon, 09 Jun 2025 12:17:19 GMT
- Title: Rao-Blackwellised Reparameterisation Gradients
- Authors: Kevin Lam, Thang Bui, George Deligiannidis, Yee Whye Teh
- Abstract summary: Gradient estimators are the machinery that facilitates gradient-based optimisation for models with latent Gaussian variables. We propose the R2-G2 estimator as the Rao-Blackwellisation of the reparameterisation gradient estimator. We show that initial training with R2-G2 consistently yields better performance in models with multiple applications of the reparameterisation trick.
- Score: 32.130233319282105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Latent Gaussian variables have been popularised in probabilistic machine learning. In turn, gradient estimators are the machinery that facilitates gradient-based optimisation for models with latent Gaussian variables. The reparameterisation trick is often used as the default estimator as it is simple to implement and yields low-variance gradients for variational inference. In this work, we propose the R2-G2 estimator as the Rao-Blackwellisation of the reparameterisation gradient estimator. Interestingly, we show that the local reparameterisation gradient estimator for Bayesian MLPs is an instance of the R2-G2 estimator and Rao-Blackwellisation. This lets us extend the benefits of Rao-Blackwellised gradients to a suite of probabilistic models. We show that initial training with R2-G2 consistently yields better performance in models with multiple applications of the reparameterisation trick.
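To make the connection concrete, below is a minimal PyTorch sketch (not the authors' code) contrasting the standard reparameterisation trick with the local reparameterisation trick for a single Bayesian linear layer with a factorised Gaussian posterior over its weights; the shapes and variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 10)                                   # batch of inputs
mean = torch.randn(10, 5, requires_grad=True)             # posterior means of the weights
log_var = torch.full((10, 5), -3.0, requires_grad=True)   # posterior log-variances

# Standard reparameterisation trick: sample the weight matrix, then apply the layer.
eps_w = torch.randn_like(mean)
w = mean + torch.exp(0.5 * log_var) * eps_w
y_reparam = x @ w

# Local reparameterisation: sample the pre-activations directly from the Gaussian
# they follow under the weight posterior, y ~ N(x @ mean, x**2 @ var).  Per the
# abstract, this estimator is an instance of Rao-Blackwellising the
# reparameterisation gradient.
act_mean = x @ mean
act_var = (x ** 2) @ torch.exp(log_var)
eps_y = torch.randn_like(act_mean)
y_local = act_mean + act_var.sqrt() * eps_y

# Either sample can feed the downstream loss; gradients w.r.t. mean and log_var
# flow back through the corresponding sampling path.
loss = y_local.pow(2).mean()
loss.backward()
```

Sampling the pre-activations integrates out the weight noise that does not affect the layer's output, which is the conditioning step characteristic of Rao-Blackwellisation.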
Related papers
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
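As a rough illustration of that mitigation (a sketch under assumptions, not the paper's implementation), spectral normalisation can be applied to each layer of a learned dynamics model so that long unrolls cannot amplify gradients without bound; the model class and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Hypothetical residual dynamics model with spectrally normalised layers."""

    def __init__(self, state_dim=8, act_dim=2, hidden=64):
        super().__init__()
        # spectral_norm constrains each weight matrix's largest singular value,
        # bounding the Lipschitz constant of the unrolled model.
        self.net = nn.Sequential(
            nn.utils.spectral_norm(nn.Linear(state_dim + act_dim, hidden)),
            nn.Tanh(),
            nn.utils.spectral_norm(nn.Linear(hidden, state_dim)),
        )

    def forward(self, state, action):
        # Predict the next state as a residual update of the current state.
        return state + self.net(torch.cat([state, action], dim=-1))
```

Bounding each layer's spectral norm keeps the pathwise gradients of long model rollouts from exploding, which is the variance issue the summary refers to.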
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
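For context, here is a hedged sketch of the classical unbiased column-row sampling (CRS) estimator for matrix multiplication that this family builds on; the winner-take-all selection rule that gives WTA-CRS its extra variance reduction is not reproduced, and the function name is illustrative.

```python
import torch

def crs_matmul(A, B, k):
    """Unbiased column-row sampling estimate of A @ B using k sampled pairs."""
    # Pair column i of A with row i of B; sample pairs with probability
    # proportional to the product of their norms (the classical CRS choice).
    probs = A.norm(dim=0) * B.norm(dim=1)
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, k, replacement=True)
    # Reweight each sampled outer product by 1 / (k * p_i) so the estimator
    # is unbiased for the full product A @ B.
    scale = 1.0 / (k * probs[idx])
    return (A[:, idx] * scale) @ B[idx, :]

# Example: the estimate approaches the exact product as k grows.
A = torch.randn(64, 512)
B = torch.randn(512, 32)
approx = crs_matmul(A, B, k=128)
```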
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping [6.234350105794441]
Gradient estimation is often necessary for fitting generative models with discrete latent variables.
DisARM and other estimators have potentially exploding variance near the boundary of the parameter space.
We propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space.
arXiv Detail & Related papers (2022-08-12T05:37:52Z)
- Coupled Gradient Estimators for Discrete Latent Variables [41.428359609999326]
Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators.
We introduce a novel derivation of their estimator based on importance sampling and statistical couplings.
We show that our proposed categorical gradient estimators provide state-of-the-art performance.
arXiv Detail & Related papers (2021-06-15T11:28:44Z)
- A unified view of likelihood ratio and reparameterization gradients [91.4645013545015]
We use a first principles approach to explain that LR and RP are alternative methods of keeping track of the movement of probability mass.
We show that the space of all possible estimators combining LR and RP can be completely parameterized by a flow field.
We prove that there cannot exist a single-sample estimator of this type outside our space, thus, clarifying where we should be searching for better Monte Carlo gradient estimators.
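A small numerical sketch (illustrative only, with an arbitrary test function) of the two estimators being unified: both the likelihood-ratio (LR) and reparameterisation (RP) estimators below target the same gradient of a Gaussian expectation.

```python
import torch

torch.manual_seed(0)

def f(x):
    return torch.sin(x) ** 2      # arbitrary differentiable test function

mu = torch.tensor(0.5)
sigma = torch.tensor(1.2)
n = 100_000
eps = torch.randn(n)
x = mu + sigma * eps

# Likelihood-ratio (score-function) estimator of d/dmu E[f(x)]:
# weight each f(x) sample by the score (x - mu) / sigma**2.
lr_grad = (f(x) * (x - mu) / sigma**2).mean()

# Reparameterisation (pathwise) estimator: differentiate through x = mu + sigma * eps.
mu_rp = mu.clone().requires_grad_(True)
rp_grad = torch.autograd.grad(f(mu_rp + sigma * eps).mean(), mu_rp)[0]

print(lr_grad.item(), rp_grad.item())
```

Both estimates converge to the same gradient as the sample size grows; the pathwise RP estimator typically does so with lower variance, which is why it is the default in variational inference.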
arXiv Detail & Related papers (2021-05-31T11:53:08Z)
- Generalized Doubly Reparameterized Gradient Estimators [18.253352549048564]
We develop two generalizations of the DReGs estimator and show that they can be used to train conditional and hierarchical VAEs on image modelling tasks more effectively.
We first extend the estimator to hierarchical models with several layers by showing how to treat additional score function terms due to the hierarchical variational posterior.
We then generalize DReGs to score functions of arbitrary distributions instead of just those of the sampling distribution, which makes the estimator applicable to the parameters of the prior in addition to those of the posterior.
arXiv Detail & Related papers (2021-01-26T19:30:00Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
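For reference, a minimal sketch of the straight-through Gumbel-Softmax estimator that is being Rao-Blackwellised; the Rao-Blackwellised variant (not shown here) replaces the single relaxed gradient with an average over Gumbel noise drawn conditionally on the sampled category.

```python
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    # Sample Gumbel(0, 1) noise and form the relaxed (softmax) sample.
    u = torch.rand_like(logits).clamp_min(1e-10)
    gumbels = -torch.log(-torch.log(u))
    soft = F.softmax((logits + gumbels) / tau, dim=-1)
    # Straight-through: discrete one-hot sample in the forward pass,
    # the relaxed sample's gradient in the backward pass.
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    return hard + soft - soft.detach()
```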
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- Unbiased Gradient Estimation for Variational Auto-Encoders using Coupled Markov Chains [34.77971292478243]
The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture.
We develop a training scheme for VAEs by introducing unbiased estimators of the log-likelihood gradient.
We show experimentally that VAEs fitted with unbiased estimators exhibit better predictive performance.
arXiv Detail & Related papers (2020-10-05T08:11:55Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)