Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator
- URL: http://arxiv.org/abs/2010.04838v1
- Date: Fri, 9 Oct 2020 22:54:38 GMT
- Title: Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator
- Authors: Max B. Paulus, Chris J. Maddison, Andreas Krause
- Abstract summary: We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
- Score: 93.05919133288161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient estimation in models with discrete latent variables is a challenging
problem, because the simplest unbiased estimators tend to have high variance.
To counteract this, modern estimators either introduce bias, rely on multiple
function evaluations, or use learned, input-dependent baselines. Thus, there is
a need for estimators that require minimal tuning, are computationally cheap,
and have low mean squared error. In this paper, we show that the variance of
the straight-through variant of the popular Gumbel-Softmax estimator can be
reduced through Rao-Blackwellization without increasing the number of function
evaluations. This provably reduces the mean squared error. We empirically
demonstrate that this leads to variance reduction, faster convergence, and
generally improved performance in two unsupervised latent variable models.
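For readers who want the mechanics, below is a minimal PyTorch sketch of the estimator's structure: the discrete sample is drawn first, the Gumbel noise is then resampled conditionally on that sample (the standard top-down Gumbel construction), and the straight-through surrogate averages the softmax over those conditional draws. The names `conditional_gumbel` and `gumbel_rao_st`, the default `k`, and the finite-`k` Monte Carlo approximation are illustrative choices, not the authors' code.

```python
import torch
import torch.nn.functional as F

def conditional_gumbel(logits, D, k):
    """Draw k shifted-Gumbel noise vectors G such that argmax(logits + G)
    equals the one-hot sample D (standard top-down Gumbel construction)."""
    # iid Exponential(1) variables, shape (k, ..., n)
    E = torch.distributions.Exponential(torch.ones_like(logits)).sample((k,))
    Ei = (D * E).sum(dim=-1, keepdim=True)        # exponential at the chosen class
    Z = logits.exp().sum(dim=-1, keepdim=True)    # partition function
    # The chosen class receives the max-Gumbel value; the rest are truncated below it.
    adjusted = (D * (-torch.log(Ei) + torch.log(Z))
                + (1 - D) * -torch.log(E / logits.exp() + Ei / Z))
    # Detach: the straight-through gradient treats the noise as a constant.
    return (adjusted - logits).detach()

def gumbel_rao_st(logits, k=10, temp=1.0):
    """Straight-through Gumbel-Softmax whose backward softmax term is
    averaged over k Gumbel draws conditioned on the sampled category."""
    I = torch.distributions.Categorical(logits=logits).sample()
    D = F.one_hot(I, logits.shape[-1]).to(logits.dtype)
    adjusted = logits + conditional_gumbel(logits, D, k)      # (k, ..., n)
    soft = F.softmax(adjusted / temp, dim=-1).mean(dim=0)     # conditional average
    # Forward pass returns the hard one-hot D; backward uses the averaged softmax.
    return D + (soft - soft.detach())
```

A call like `gumbel_rao_st(logits, k=10)` would stand in where `F.gumbel_softmax(logits, hard=True)` is ordinarily used; increasing `k` lowers gradient variance at the cost of extra softmax evaluations inside the sampler, while the downstream function is still evaluated only once.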
Related papers
- A Parameter-Free Two-Bit Covariance Estimator with Improved Operator Norm Error Rate [27.308933056578212]
We propose a new 2-bit covariance matrix estimator that simultaneously addresses two issues: the operator norm error rate and the need for a tuning parameter.
By employing dithering scales that vary across entries, our estimator enjoys an improved operator norm error rate.
The method eliminates the need for any tuning parameter, as the dithering scales are determined entirely by the data.
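To make the dithering idea concrete: a coarse quantizer becomes unbiased in expectation once uniform dither of an appropriate scale is added before quantization. The snippet below demonstrates that principle for a single dithered sign quantizer; it is a generic illustration, not the paper's two-bit covariance estimator. Choosing the scale `lam` from the data is exactly the tuning the paper automates.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 4.0                                    # dithering scale
x = 1.3                                      # any fixed input with |x| <= lam
u = rng.uniform(-lam, lam, size=1_000_000)   # uniform dither
q = lam * np.sign(x + u)                     # coarse quantizer output in {-lam, +lam}
print(q.mean())                              # ~= x: the dithered quantizer is unbiased
```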
arXiv Detail & Related papers (2023-08-30T14:31:24Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z)
- Double Control Variates for Gradient Estimation in Discrete Latent Variable Models [32.33171301923846]
We introduce a variance reduction technique for score function estimators.
We show that our estimator can have lower variance compared to other state-of-the-art estimators.
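For background, a control variate for a score-function (REINFORCE) estimator is any zero-mean term subtracted from the integrand. The NumPy sketch below shows the simplest constant-baseline case; it illustrates the general mechanism only, not the paper's specific double control-variate construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def score_function_grad(theta, f, n=200_000, baseline=0.0):
    """Estimate d/dtheta E_{b ~ Bernoulli(sigmoid(theta))}[f(b)] with
    REINFORCE. Subtracting a constant baseline is the simplest control
    variate: it leaves the mean unchanged (the score has zero mean) but
    can shrink the variance."""
    p = 1.0 / (1.0 + np.exp(-theta))
    b = (rng.random(n) < p).astype(float)
    score = b - p                                  # d/dtheta log p(b; theta)
    return np.mean((f(b) - baseline) * score)

f = lambda b: (b - 0.45) ** 2
print(score_function_grad(0.0, f, baseline=0.0))    # ~0.025, higher variance
print(score_function_grad(0.0, f, baseline=0.25))   # ~0.025, lower variance
```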
arXiv Detail & Related papers (2021-11-09T18:02:42Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
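One plausible reading of "deep learning with bias constraints" is a penalized objective that adds an empirical squared-bias term to the usual squared error, with the bias estimated by averaging the network's outputs over repeated noisy measurements of the same parameter. The sketch below is a hypothetical rendering under that assumption; the function name, tensor shapes, and penalty weight `lam` are ours, not the paper's.

```python
import torch

def bias_constrained_loss(net, theta, y, lam=1.0):
    """Hypothetical objective: squared error plus a squared-bias penalty.
    theta: (B, d) true parameters; y: (B, m, n) holds m independent noisy
    measurement vectors per parameter, so the empirical bias can be
    estimated by averaging the m estimates that share the same theta."""
    B, m, n = y.shape
    est = net(y.reshape(B * m, n)).reshape(B, m, -1)    # (B, m, d) estimates
    mse = ((est - theta[:, None, :]) ** 2).mean()
    bias = est.mean(dim=1) - theta                      # empirical bias per theta
    return mse + lam * (bias ** 2).mean()
```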
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandits, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
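The randomized-truncation device behind SUMO can be stated generically: truncate the infinite series at a random index and reweight each surviving term by the inverse probability of surviving, which preserves the expectation. Below is a minimal NumPy sketch of that generic "Russian roulette" estimator, not SUMO's particular series of importance-weighted corrections.

```python
import numpy as np

rng = np.random.default_rng(0)

def russian_roulette(terms, q=0.4, max_k=1000):
    """Unbiased estimate of sum_{k>=1} terms(k): truncate at a random
    Geometric(q) index K and weight term k by 1 / P(K >= k)."""
    K = min(int(rng.geometric(q)), max_k)          # K >= 1; the cap is a safeguard
    # For a geometric with success probability q, P(K >= k) = (1 - q)**(k - 1).
    return sum(terms(k) / (1.0 - q) ** (k - 1) for k in range(1, K + 1))

# Check on a series with a known sum: sum_{k>=1} 2**-k = 1.
est = np.mean([russian_roulette(lambda k: 2.0 ** -k) for _ in range(200_000)])
print(est)   # ~= 1.0
```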
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
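The sampling step these results rest on can be implemented with the Gumbel-top-k trick: perturbing each log-probability with independent Gumbel noise and keeping the k largest yields a sample without replacement from the categorical distribution. The sketch below shows only this sampling step; the importance weights that make the paper's estimator unbiased are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_top_k(log_probs, k):
    """Sample k distinct categories without replacement: the indices of the
    k largest Gumbel-perturbed log-probabilities have the same law as
    sequential sampling without replacement."""
    g = rng.gumbel(size=log_probs.shape)
    return np.argsort(log_probs + g)[::-1][:k]

log_p = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
print(gumbel_top_k(log_p, k=2))   # two distinct category indices
```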
arXiv Detail & Related papers (2020-02-14T14:15:18Z)