Coupled Gradient Estimators for Discrete Latent Variables
- URL: http://arxiv.org/abs/2106.08056v1
- Date: Tue, 15 Jun 2021 11:28:44 GMT
- Title: Coupled Gradient Estimators for Discrete Latent Variables
- Authors: Zhe Dong, Andriy Mnih, George Tucker
- Abstract summary: Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators.
We introduce a novel derivation of a recent binary-variable gradient estimator based on importance sampling and statistical couplings.
We show that our proposed categorical gradient estimators provide state-of-the-art performance.
- Score: 41.428359609999326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training models with discrete latent variables is challenging due to the high
variance of unbiased gradient estimators. While low-variance reparameterization
gradients of a continuous relaxation can provide an effective solution, a
continuous relaxation is not always available or tractable. Dong et al. (2020)
and Yin et al. (2020) introduced a performant estimator that does not rely on
continuous relaxations; however, it is limited to binary random variables. We
introduce a novel derivation of their estimator based on importance sampling
and statistical couplings, which we extend to the categorical setting.
Motivated by the construction of a stick-breaking coupling, we introduce
gradient estimators based on reparameterizing categorical variables as
sequences of binary variables and Rao-Blackwellization. In systematic
experiments, we show that our proposed categorical gradient estimators provide
state-of-the-art performance, whereas even with additional
Rao-Blackwellization, previous estimators (Yin et al., 2019) underperform a
simpler REINFORCE with a leave-one-out baseline estimator (Kool et al., 2019).
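For context, the leave-one-out baseline referenced above admits a compact implementation. The following is a minimal PyTorch sketch of REINFORCE with a K-sample leave-one-out baseline for a single categorical latent, in the spirit of Kool et al. (2019); the function name, sample count, and toy objective are illustrative choices, not anything specified in the papers.

```python
# Minimal sketch (assumed PyTorch implementation, not the authors' code) of
# REINFORCE with a leave-one-out baseline for a single categorical latent.
import torch

def reinforce_loo_grad(logits, f, num_samples=4):
    """Estimate d/d(logits) of E_{z ~ Cat(logits)}[f(z)] from K samples,
    using each sample's leave-one-out mean of f as its baseline."""
    dist = torch.distributions.Categorical(logits=logits)
    z = dist.sample((num_samples,))                     # K discrete samples
    fz = torch.stack([f(zk) for zk in z]).detach()      # objective values, treated as constants
    loo_baseline = (fz.sum() - fz) / (num_samples - 1)  # mean of the other K-1 values
    log_prob = dist.log_prob(z)
    # Surrogate whose gradient w.r.t. the logits is the REINFORCE-LOO estimator.
    surrogate = ((fz - loo_baseline) * log_prob).mean()
    return torch.autograd.grad(surrogate, logits)[0]

logits = torch.zeros(5, requires_grad=True)
grad_estimate = reinforce_loo_grad(logits, lambda z: (z.float() - 2.0) ** 2)
print(grad_estimate)
```

Because each baseline is computed from the other samples only, it is independent of the sample it corrects, which keeps the estimator unbiased while reducing variance.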
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models [28.011868604717726]
We present Adaptive IMLE, the first adaptive gradient estimator for complex discrete distributions.
We show that our estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
arXiv Detail & Related papers (2022-09-11T13:32:39Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
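For reference, below is a minimal PyTorch sketch of the plain straight-through Gumbel-Softmax estimator that this work Rao-Blackwellizes: a hard one-hot sample in the forward pass, with gradients taken through the soft relaxation. The Rao-Blackwellized variant (not shown) roughly averages this gradient over Gumbel noise consistent with the same discrete sample; names and the toy loss here are illustrative.

```python
# Minimal sketch (assumed implementation) of the straight-through
# Gumbel-Softmax estimator: hard sample forward, soft-relaxation gradient backward.
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    # Gumbel(0, 1) noise via -log(Exponential(1)).
    gumbels = -torch.log(torch.empty_like(logits).exponential_())
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)
    y_hard = F.one_hot(y_soft.argmax(dim=-1), num_classes=logits.shape[-1]).to(y_soft.dtype)
    # Forward value equals y_hard; gradients flow only through y_soft.
    return y_hard + y_soft - y_soft.detach()

logits = torch.zeros(3, requires_grad=True)
sample = st_gumbel_softmax(logits)
(sample * torch.tensor([1.0, 2.0, 3.0])).sum().backward()  # toy downstream loss
print(sample, logits.grad)
```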
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable, better-performing training of deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
- Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models.
We provide a unifying view of these estimators under the framework of regularized nonparametric regression.
We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
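For orientation, the score being estimated is the gradient of the log-density, and a standard way to fit it from samples is a regularized score-matching objective. The display below is a generic illustration only; the paper's unified regression view and its specific kernel-based estimators may be formulated differently.

```latex
% Score of a density p, and a generic regularized empirical objective fitted
% from samples x_1, ..., x_n ~ p (illustrative only, not the paper's exact form).
s(x) = \nabla_x \log p(x), \qquad
\hat{s} = \arg\min_{f \in \mathcal{H}}
  \frac{1}{n} \sum_{i=1}^{n} \left( \lVert f(x_i) \rVert^2 + 2\, \nabla \!\cdot\! f(x_i) \right)
  + \lambda \lVert f \rVert_{\mathcal{H}}^2
```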
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
- Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables [16.643346012854156]
Estimating the gradients of stochastic nodes is one of the crucial research questions in the deep generative modeling community.
This paper proposes a general version of the Gumbel-Softmax estimator with continuous relaxation.
arXiv Detail & Related papers (2020-03-04T01:13:15Z)
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
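As background, samples without replacement from a categorical distribution can be drawn with the Gumbel-top-k trick, which such estimators build on. The PyTorch sketch below shows only that sampling step, not the unbiased estimator itself; names are illustrative.

```python
# Minimal sketch of the Gumbel-top-k trick: perturb each logit with independent
# Gumbel noise and take the top-k indices, which is distributed as k draws
# without replacement from Categorical(logits).
import torch

def gumbel_top_k(logits, k):
    gumbels = -torch.log(torch.empty_like(logits).exponential_())  # Gumbel(0, 1) noise
    return torch.topk(logits + gumbels, k).indices

print(gumbel_top_k(torch.log(torch.tensor([0.5, 0.3, 0.1, 0.1])), k=2))
```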
arXiv Detail & Related papers (2020-02-14T14:15:18Z)