Adaptive Perturbation-Based Gradient Estimation for Discrete Latent
Variable Models
- URL: http://arxiv.org/abs/2209.04862v1
- Date: Sun, 11 Sep 2022 13:32:39 GMT
- Title: Adaptive Perturbation-Based Gradient Estimation for Discrete Latent
Variable Models
- Authors: Pasquale Minervini, Luca Franceschi, Mathias Niepert
- Abstract summary: We present Adaptive IMLE, the first adaptive gradient estimator for complex discrete distributions.
We show that our estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
- Score: 28.011868604717726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of discrete algorithmic components in deep learning
architectures has numerous applications. Recently, Implicit Maximum Likelihood
Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient
estimators for discrete exponential family distributions, was proposed by
combining implicit differentiation through perturbation with the path-wise
gradient estimator. However, due to the finite difference approximation of the
gradients, it is especially sensitive to the choice of the finite difference
step size which needs to be specified by the user. In this work, we present
Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex
discrete distributions: it adaptively identifies the target distribution for
IMLE by trading off the density of gradient information with the degree of bias
in the gradient estimates. We empirically evaluate our estimator on synthetic
examples, as well as on Learning to Explain, Discrete Variational
Auto-Encoders, and Neural Relational Inference tasks. In our experiments, we
show that our adaptive gradient estimator can produce faithful estimates while
requiring orders of magnitude fewer samples than other gradient estimators.
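The finite-difference idea the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Gumbel (perturb-and-MAP) noise is omitted, and the step size `lam` is held fixed, whereas AIMLE's contribution is precisely to adapt it.

```python
import numpy as np

def map_state(theta):
    """MAP state of a categorical exponential family: one-hot argmax of the logits."""
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def imle_grad(theta, grad_z, lam):
    """Finite-difference, IMLE-style estimate of dL/dtheta.

    Perturb the logits against the downstream gradient grad_z = dL/dz,
    recompute the MAP state, and take the scaled difference of the two
    discrete states. `lam` is the finite-difference step size that the
    adaptive scheme would tune automatically (fixed here for clarity).
    """
    z = map_state(theta)
    z_perturbed = map_state(theta - lam * grad_z)
    return (z - z_perturbed) / lam

# Toy usage: push the logits toward a target one-hot state.
theta = np.array([1.0, 2.0, 0.5])
target = np.array([0.0, 0.0, 1.0])
z = map_state(theta)
grad_z = 2.0 * (z - target)          # gradient of ||z - target||^2 w.r.t. z
g = imle_grad(theta, grad_z, lam=1.0)
```

A too-small `lam` leaves both MAP states identical (zero gradient, no signal), while a too-large `lam` introduces bias; this is the density-versus-bias trade-off the abstract describes.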
Related papers
- Analytical Approximation of the ELBO Gradient in the Context of the Clutter Problem [0.0]
We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems.
The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
arXiv Detail & Related papers (2024-04-16T13:19:46Z) - Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [0.8192907805418583]
We show that biased gradients converge to critical points for smooth non-convex functions.
We show how the effect of bias can be reduced by appropriate tuning.
arXiv Detail & Related papers (2024-02-05T10:17:36Z) - Differentiating Metropolis-Hastings to Optimize Intractable Densities [51.16801956665228]
We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers.
We apply gradient-based optimization to objectives expressed as expectations over intractable target densities.
arXiv Detail & Related papers (2023-06-13T17:56:02Z) - Preferential Subsampling for Stochastic Gradient Langevin Dynamics [3.158346511479111]
Stochastic gradient MCMC offers an unbiased estimate of the gradient of the log-posterior with a small, uniformly-weighted subsample of the data.
The resulting gradient estimator may exhibit a high variance and impact sampler performance.
We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size that is used.
arXiv Detail & Related papers (2022-10-28T14:56:18Z) - Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Storchastic: A Framework for General Stochastic Automatic
Differentiation [9.34612743192798]
We introduce Storchastic, a new framework for automatic differentiation of stochastic computation graphs.
Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step.
Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates.
arXiv Detail & Related papers (2021-04-01T12:19:54Z) - A Study of Gradient Variance in Deep Learning [56.437755740715396]
We introduce a method, Gradient Clustering, to minimize the variance of average mini-batch gradient with stratified sampling.
We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training.
arXiv Detail & Related papers (2020-07-09T03:23:10Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z) - Estimating Gradients for Discrete Random Variables by Sampling without
Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.