Generalized Gumbel-Softmax Gradient Estimator for Various Discrete
Random Variables
- URL: http://arxiv.org/abs/2003.01847v2
- Date: Tue, 9 Jun 2020 10:38:58 GMT
- Title: Generalized Gumbel-Softmax Gradient Estimator for Various Discrete
Random Variables
- Authors: Weonyoung Joo, Dongjun Kim, Seungjae Shin, Il-Chul Moon
- Abstract summary: Estimating the gradients of stochastic nodes is one of the crucial research questions in the deep generative modeling community.
This paper proposes a general version of the Gumbel-Softmax estimator with continuous relaxation.
- Score: 16.643346012854156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the gradients of stochastic nodes is one of the crucial research
questions in the deep generative modeling community, as it enables gradient-descent
optimization of neural network parameters. This estimation problem becomes more
complex when the stochastic nodes are discrete, because pathwise derivative
techniques cannot be applied. Hence, stochastic gradient estimation for discrete
distributions requires either a score-function method or a continuous relaxation of
the discrete random variables. This paper proposes a generalized version of the
Gumbel-Softmax estimator with continuous relaxation, which can relax the
discreteness of a wider range of probability distributions than the categorical and
Bernoulli. In detail, we utilize the truncation of discrete random variables and the
Gumbel-Softmax trick with a linear transformation for the relaxed
reparameterization. The proposed approach enables the relaxed discrete random
variable to be reparameterized and backpropagated through a large-scale stochastic
computational graph. Our experiments consist of (1) synthetic data analyses, which
show the efficacy of our method, and (2) applications to a VAE and a topic model,
which demonstrate the value of the proposed estimator in practice.
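As an illustration of the recipe described in the abstract (truncation of the discrete random variable, the Gumbel-Softmax trick, and a linear transformation back to the support), the sketch below relaxes a Poisson random variable so that gradients can flow to its rate parameter. This is a minimal PyTorch sketch under stated assumptions: the Poisson example, the truncation level `K`, and the temperature `tau` are illustrative choices, not the authors' exact formulation.

```python
import torch

def relaxed_truncated_poisson(rate, K=20, tau=0.5):
    """Relaxed, reparameterized sample from a Poisson(rate) truncated to {0, ..., K-1}.

    Recipe sketched in the abstract: truncate the discrete distribution, apply the
    Gumbel-Softmax trick to the truncated (log-)probabilities, then map the relaxed
    one-hot vector back to the support with a linear transformation.
    """
    k = torch.arange(K, dtype=rate.dtype)
    # Unnormalized log-probabilities of the truncated Poisson:
    # log p(k) = k*log(rate) - rate - log(k!); constant terms cancel in the softmax.
    log_probs = k * torch.log(rate) - rate - torch.lgamma(k + 1.0)
    # Gumbel-Softmax trick: perturb with Gumbel(0, 1) noise, take a tempered softmax.
    u = torch.rand(K).clamp_min(1e-20)
    gumbel = -torch.log(-torch.log(u))
    relaxed_one_hot = torch.softmax((log_probs + gumbel) / tau, dim=-1)
    # Linear transformation back to the value space: a differentiable
    # surrogate for the discrete sample k.
    return (relaxed_one_hot * k).sum()

rate = torch.tensor(3.0, requires_grad=True)
sample = relaxed_truncated_poisson(rate)
sample.backward()  # the gradient w.r.t. the rate parameter is now available
print(sample.item(), rate.grad.item())
```

As the temperature tau approaches zero, the relaxed sample concentrates on a single support value and approaches a draw from the truncated distribution, at the cost of noisier gradients; larger tau gives smoother but more biased surrogates.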
Related papers
- Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation [59.86921150579892]
We deal with the problem of gradient estimation for differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.
We develop variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
arXiv Detail & Related papers (2024-10-10T17:10:00Z)
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations [0.0]
Variational mixtures with full-covariance structures suffer from quadratic growth in the number of variational parameters as the number of model parameters grows.
We propose a method for constructing an initial Gaussian model approximation that can be used to warm-start variational inference.
arXiv Detail & Related papers (2023-07-12T19:30:04Z)
- A probabilistic, data-driven closure model for RANS simulations with aleatoric, model uncertainty [1.8416014644193066]
We propose a data-driven closure model for Reynolds-averaged Navier-Stokes (RANS) simulations that incorporates aleatoric, model uncertainty.
A fully Bayesian formulation is proposed, combined with a sparsity-inducing prior in order to identify regions in the problem domain where the parametric closure is insufficient.
arXiv Detail & Related papers (2023-07-05T16:53:31Z)
- Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals.
We treat the perturbations as random variables endowed with prior distribution functions.
A gradient-based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Reliable Categorical Variational Inference with Mixture of Discrete Normalizing Flows [10.406659081400354]
Variational approximations are increasingly based on gradient-based optimization of expectations estimated by sampling.
Continuous relaxations, such as the Gumbel-Softmax for the categorical distribution, enable gradient-based optimization, but do not define a valid probability mass for discrete observations.
In practice, selecting the amount of relaxation is difficult and one needs to optimize an objective that does not align with the desired one.
arXiv Detail & Related papers (2020-06-28T10:39:39Z)
- Nearest Neighbour Based Estimates of Gradients: Sharp Nonasymptotic Bounds and Applications [0.6445605125467573]
Gradient estimation is of crucial importance in statistics and learning theory.
We consider here the classic regression setup, where a real-valued, square-integrable r.v. $Y$ is to be predicted.
We prove nonasymptotic bounds improving upon those obtained for alternative estimation methods.
arXiv Detail & Related papers (2020-06-26T15:19:43Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.