Double Control Variates for Gradient Estimation in Discrete Latent
Variable Models
- URL: http://arxiv.org/abs/2111.05300v1
- Date: Tue, 9 Nov 2021 18:02:42 GMT
- Title: Double Control Variates for Gradient Estimation in Discrete Latent
Variable Models
- Authors: Michalis K. Titsias, Jiaxin Shi
- Abstract summary: We introduce a variance reduction technique for score function estimators.
We show that our estimator can have lower variance compared to other state-of-the-art estimators.
- Score: 32.33171301923846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient-based optimisation for discrete latent variable models is
challenging due to the high variance of gradients. We introduce a variance
reduction technique for score function estimators that makes use of double
control variates. These control variates act on top of a main control variate,
and try to further reduce the variance of the overall estimator. We develop a
double control variate for the REINFORCE leave-one-out estimator using Taylor
expansions. For training discrete latent variable models, such as variational
autoencoders with binary latent variables, our approach adds no extra
computational cost compared to standard training with the REINFORCE
leave-one-out estimator. We apply our method to challenging high-dimensional
toy examples and to training variational autoencoders with binary latent
variables. We show that our estimator can have lower variance compared to other
state-of-the-art estimators.
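As a concrete reference point, here is a minimal sketch of the plain REINFORCE leave-one-out (RLOO) estimator that the double control variate is built on top of, assuming a factorised Bernoulli variational distribution as in the binary-latent VAE experiments. The names (`rloo_surrogate`, `f`, `logits`) are illustrative, and the Taylor-expansion double control variate itself is not reproduced here.

```python
# Minimal RLOO sketch (illustrative; not the authors' code). The leave-one-out
# mean of the other samples serves as the main control variate (baseline).
import torch

def rloo_surrogate(logits, f, num_samples=4):
    """Surrogate loss whose gradient w.r.t. `logits` is the RLOO score-function
    estimate of d/d(logits) E_{z ~ Bernoulli(sigmoid(logits))}[f(z)]."""
    dist = torch.distributions.Bernoulli(logits=logits)
    z = dist.sample((num_samples,))                 # K x D binary samples
    fz = torch.stack([f(zk) for zk in z])           # K objective values
    # Leave-one-out baseline: mean of the other K-1 samples.
    baseline = (fz.sum(0, keepdim=True) - fz) / (num_samples - 1)
    log_q = dist.log_prob(z).sum(-1)                # K log-probabilities
    return ((fz - baseline).detach() * log_q).mean()

# Toy usage: maximise E[f(z)] over 10 binary latent variables.
logits = torch.zeros(10, requires_grad=True)
f = lambda z: -((z - 0.3) ** 2).sum()
loss = -rloo_surrogate(logits, f, num_samples=4)    # minimise the negative
loss.backward()
print(logits.grad.shape)                            # torch.Size([10])
```

Calling `.backward()` on the surrogate yields an unbiased gradient estimate; the paper's contribution is a second, Taylor-expansion-based control variate applied on top of this leave-one-out baseline at no extra computational cost.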
Related papers
- Gradient Estimation for Binary Latent Variables via Gradient Variance
Clipping [6.234350105794441]
Gradient estimation is often necessary for fitting generative models with discrete latent variables.
DisARM and other estimators have potentially exploding variance near the boundary of the parameter space.
We propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space.
arXiv Detail & Related papers (2022-08-12T05:37:52Z) - Training Discrete Deep Generative Models via Gapped Straight-Through
Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of the Straight-Through Gumbel-Softmax estimator (a generic sketch of straight-through Gumbel-Softmax appears after this list).
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z) - Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z) - CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula-based sampling to avoid duplicate samples and reduce variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z) - VarGrad: A Low-Variance Gradient Estimator for Variational Inference [9.108412698936105]
We show that VarGrad offers a favourable variance-versus-computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
arXiv Detail & Related papers (2020-10-20T16:46:01Z) - Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient
Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models, based on randomized truncation of infinite series (a generic randomized-truncation sketch appears after this list).
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z) - Estimating Gradients for Discrete Random Variables by Sampling without
Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
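Two of the entries above (the Gapped Straight-Through estimator and the Rao-Blackwellized estimator) build on the straight-through Gumbel-Softmax trick. The following is a generic sketch of that building block only, under illustrative choices of temperature and objective; it is not the code of any listed paper.

```python
# Generic straight-through Gumbel-Softmax sketch (illustrative).
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    """One-hot categorical sample whose backward pass uses the Gumbel-Softmax
    relaxation (the straight-through trick)."""
    gumbels = -torch.empty_like(logits).exponential_().log()     # Gumbel(0,1) noise
    soft = F.softmax((logits + gumbels) / tau, dim=-1)            # relaxed sample
    hard = F.one_hot(soft.argmax(dim=-1), logits.shape[-1]).to(soft.dtype)
    # Forward pass returns the discrete one-hot; gradients flow through `soft`.
    return hard + (soft - soft.detach())

# Toy usage: push a 5-way categorical distribution towards its last category.
logits = torch.zeros(5, requires_grad=True)
z = st_gumbel_softmax(logits, tau=0.5)
loss = -z[-1]
loss.backward()
print(logits.grad)
```

The `hard + (soft - soft.detach())` line is the straight-through step: the forward value is exactly one-hot while the gradient is that of the relaxed sample, which is the biased but low-variance behaviour the two papers above analyse and improve.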
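The SUMO entry relies on randomized truncation of an infinite series. Below is a generic single-sample randomized-truncation ("Russian roulette") sketch, not SUMO itself: the series, the geometric stopping probability, and the helper name are illustrative, but the inverse-probability weighting that keeps the estimate unbiased is the general mechanism.

```python
# Generic randomized-truncation (Russian roulette) sketch: an unbiased
# single-sample estimate of sum_{k>=0} delta(k) using a geometric stopping rule.
import numpy as np

def russian_roulette_estimate(delta, p_continue=0.6, rng=None):
    """Unbiased estimate of sum_{k=0}^{inf} delta(k): term k survives with
    probability p_continue**k, so it is reweighted by the inverse of that."""
    rng = np.random.default_rng() if rng is None else rng
    total, k, survive_prob = 0.0, 0, 1.0
    while True:
        total += delta(k) / survive_prob       # inverse-probability weighting
        if rng.random() > p_continue:          # stop with probability 1 - p_continue
            return total
        survive_prob *= p_continue
        k += 1

# Toy check: sum_{k>=0} 0.5**(k+1) = 1; averaging many estimates recovers it.
delta = lambda k: 0.5 ** (k + 1)
estimates = [russian_roulette_estimate(delta, rng=np.random.default_rng(i)) for i in range(20000)]
print(np.mean(estimates))  # close to 1.0
```

Each single draw is cheap, since it usually evaluates only a few terms, yet unbiased in expectation, which is the average-cost-versus-unbiasedness trade-off the SUMO summary refers to.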
This list is automatically generated from the titles and abstracts of the papers on this site.