Gradient Estimation for Binary Latent Variables via Gradient Variance
Clipping
- URL: http://arxiv.org/abs/2208.06124v1
- Date: Fri, 12 Aug 2022 05:37:52 GMT
- Title: Gradient Estimation for Binary Latent Variables via Gradient Variance
Clipping
- Authors: Russell Z. Kunes, Mingzhang Yin, Max Land, Doron Haviv, Dana Pe'er,
Simon Tavaré
- Abstract summary: Gradient estimation is often necessary for fitting generative models with discrete latent variables.
DisARM and other estimators have potentially exploding variance near the boundary of the parameter space.
We propose a new gradient estimator \textit{bitflip}-1 that has lower variance at the boundaries of the parameter space.
- Score: 6.234350105794441
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient estimation is often necessary for fitting generative models with
discrete latent variables, in contexts such as reinforcement learning and
variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020;
Dong, Mnih, and Tucker 2020) achieves state of the art gradient variance for
Bernoulli latent variable models in many contexts. However, DisARM and other
estimators have potentially exploding variance near the boundary of the
parameter space, where solutions tend to lie. To ameliorate this issue, we
propose a new gradient estimator \textit{bitflip}-1 that has lower variance at
the boundaries of the parameter space. As bitflip-1 has complementary
properties to existing estimators, we introduce an aggregated estimator,
\textit{unbiased gradient variance clipping} (UGC) that uses either a bitflip-1
or a DisARM gradient update for each coordinate. We theoretically prove that
UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC
achieves the optimal value of the optimization objectives in toy experiments,
discrete VAE training, and in a best subset selection problem.
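The boundary-variance issue the abstract describes is easiest to see with the plain score-function (REINFORCE) estimator, of which DisARM is a lower-variance refinement. Below is a minimal sketch for Bernoulli latent variables; it is not the authors' code, and the toy objective `f`, the target pattern, and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(b):
    # Toy black-box objective: reward binary vectors close to a fixed target.
    target = np.array([1.0, 0.0, 1.0])
    return -np.sum((b - target) ** 2)

def reinforce_grad(logits, n_samples=1000):
    """Score-function (REINFORCE) estimate of d E[f(b)] / d logits,
    where b ~ Bernoulli(sigmoid(logits)) coordinate-wise."""
    p = 1.0 / (1.0 + np.exp(-logits))  # Bernoulli success probabilities
    grads = np.zeros_like(logits)
    for _ in range(n_samples):
        b = (rng.random(logits.shape) < p).astype(float)
        # For the sigmoid parameterization, d log P(b) / d logits = b - p.
        grads += f(b) * (b - p)
    return grads / n_samples

# At logits = 0 (p = 0.5), the estimated gradient should point toward
# the target pattern: positive for coordinates 0 and 2, negative for 1.
g = reinforce_grad(np.zeros(3))
```

Near the boundary of the parameter space (|logits| large), single-sample versions of this estimator can have very large variance, which is the regime that bitflip-1 and UGC target.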
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach [11.878350833222711]
We propose a method called \textit{GradSamp} for sampling gradient updates from a Gaussian distribution.
\textit{GradSamp} not only streamlines gradient computation but also enables skipping entire epochs, thereby enhancing overall efficiency.
We rigorously validate our hypothesis across a diverse set of standard and non-standard CNN and transformer-based models.
arXiv Detail & Related papers (2024-06-11T15:01:20Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean?
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z) - Sampling in Constrained Domains with Orthogonal-Space Variational
Gradient Descent [13.724361914659438]
We propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold.
We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/\text{the number of iterations})$ under mild conditions.
arXiv Detail & Related papers (2022-10-12T17:51:13Z) - Adaptive Perturbation-Based Gradient Estimation for Discrete Latent
Variable Models [28.011868604717726]
We present Adaptive IMLE, the first adaptive gradient estimator for complex discrete distributions.
We show that our estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
arXiv Detail & Related papers (2022-09-11T13:32:39Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z) - Double Control Variates for Gradient Estimation in Discrete Latent
Variable Models [32.33171301923846]
We introduce a variance reduction technique for score function estimators.
We show that our estimator can have lower variance compared to other state-of-the-art estimators.
arXiv Detail & Related papers (2021-11-09T18:02:42Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient
Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
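For context on the last entry, the straight-through Gumbel-Softmax estimator draws a relaxed categorical sample and discretizes it on the forward pass. The sketch below is illustrative only (the function name and temperature are assumptions, and in practice this lives inside an autodiff framework, where the backward pass reuses the soft sample's gradient):

```python
import numpy as np

rng = np.random.default_rng(1)

def gumbel_softmax_st(logits, tau=1.0):
    """One straight-through Gumbel-Softmax sample.

    Forward pass: hard one-hot vector via argmax of Gumbel-perturbed logits.
    (An autodiff framework would route gradients through y_soft instead.)
    """
    gumbel = -np.log(-np.log(rng.random(logits.shape)))  # Gumbel(0,1) noise
    y_soft = np.exp((logits + gumbel) / tau)
    y_soft = y_soft / y_soft.sum()                       # tempered softmax
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0                      # discretize forward pass
    return y_hard, y_soft

hard, soft = gumbel_softmax_st(np.array([2.0, 0.0, -1.0]))
```

Rao-Blackwellizing this estimator, as the paper above describes, averages out part of the Gumbel noise and provably reduces the mean squared error of the gradient.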
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.