Training Discrete Deep Generative Models via Gapped Straight-Through
Estimator
- URL: http://arxiv.org/abs/2206.07235v1
- Date: Wed, 15 Jun 2022 01:46:05 GMT
- Title: Training Discrete Deep Generative Models via Gapped Straight-Through
Estimator
- Authors: Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky, Peter J. Ramadge
- Abstract summary: We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
- Score: 72.71398034617607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep generative models have succeeded in image processing, natural
language processing, and reinforcement learning, training that involves
discrete random variables remains challenging due to the high variance of its
gradient estimation process. Monte Carlo is a common solution used in most
variance reduction approaches. However, this involves time-consuming resampling
and multiple function evaluations. We propose a Gapped Straight-Through (GST)
estimator to reduce the variance without incurring resampling overhead. This
estimator is inspired by the essential properties of Straight-Through
Gumbel-Softmax. We determine these properties and show via an ablation study
that they are essential. Experiments demonstrate that the proposed GST
estimator enjoys better performance compared to strong baselines on two
discrete deep generative modeling tasks, MNIST-VAE and ListOps.
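As background, the estimator that GST builds on is the Straight-Through Gumbel-Softmax: draw a relaxed Gumbel-Softmax sample, take a hard one-hot in the forward pass, and let gradients flow through the relaxed sample in the backward pass. The following is a minimal PyTorch sketch of that standard trick, not of the GST estimator itself, which is defined in the paper.

```python
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Straight-Through Gumbel-Softmax: hard one-hot sample in the forward pass,
    relaxed Gumbel-Softmax gradient in the backward pass."""
    gumbels = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y_soft = F.softmax((logits + gumbels) / tau, dim=-1)        # relaxed sample
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)  # discrete one-hot
    # Forward value is y_hard; gradients flow only through y_soft.
    return y_hard - y_soft.detach() + y_soft

# Usage: sample a 10-way discrete latent for a batch of 32 (e.g. in an MNIST-VAE encoder).
logits = torch.randn(32, 10, requires_grad=True)
z = st_gumbel_softmax(logits, tau=0.5)
z.sum().backward()   # gradients reach `logits` despite the discrete forward pass
```

PyTorch ships the same trick as `torch.nn.functional.gumbel_softmax(logits, tau, hard=True)`; per the abstract, GST aims to keep the essential properties of this estimator while reducing variance without resampling overhead.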
Related papers
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues.
We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
arXiv Detail & Related papers (2024-08-13T08:09:05Z)
- Custom Gradient Estimators are Straight-Through Estimators in Disguise [3.1037083241174197]
Quantization-aware training comes with a fundamental challenge: the derivatives of quantization functions such as rounding are zero almost everywhere.
We prove that when the learning rate is sufficiently small, a large class of weight gradient estimators is equivalent to the straight-through estimator (STE).
We experimentally show that these results hold both for a small convolutional model trained on the MNIST dataset and for a ResNet50 model trained on ImageNet.
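The challenge stated above, a rounding derivative that is zero almost everywhere, is exactly what the straight-through estimator works around by treating the quantizer as the identity in the backward pass. Below is a minimal, generic PyTorch sketch of a rounding STE used for fake-quantization; it illustrates the idea only and is not the specific custom estimators analyzed in the paper.

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass the gradient straight through in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # treat d round(x)/dx as 1 instead of 0 almost everywhere

def fake_quantize(w: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    # Uniform fake-quantization for quantization-aware training (illustrative scale).
    return RoundSTE.apply(w / scale) * scale

w = torch.randn(4, 4, requires_grad=True)
loss = fake_quantize(w).pow(2).sum()
loss.backward()   # w.grad is nonzero even though round() is piecewise constant
```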
arXiv Detail & Related papers (2024-05-08T16:07:56Z)
- Bayesian Deep Learning for Remaining Useful Life Estimation via Stein Variational Gradient Descent [14.784809634505903]
We show that Bayesian deep learning models trained via Stein variational gradient descent consistently outperform with respect to convergence speed and predictive performance.
We propose a method to enhance performance based on the uncertainty information provided by the Bayesian models.
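For context, Stein variational gradient descent updates a set of particles along a kernelized Stein direction that combines the score of the target with a repulsive kernel term. The NumPy sketch below applies the standard SVGD update to a toy Gaussian target; it is a generic illustration, not the paper's Bayesian deep learning setup for remaining useful life estimation.

```python
import numpy as np

def rbf_kernel(x, h):
    """RBF kernel matrix K[j, i] = k(x_j, x_i) and its gradient with respect to x_j."""
    diffs = x[:, None, :] - x[None, :, :]                  # diffs[j, i] = x_j - x_i
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * h ** 2))
    grad_K = -(diffs / h ** 2) * K[:, :, None]             # d k(x_j, x_i) / d x_j
    return K, grad_K

def svgd_step(x, grad_log_p, step_size=0.1, h=1.0):
    """phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]."""
    K, grad_K = rbf_kernel(x, h)
    phi = (K.T @ grad_log_p + grad_K.sum(axis=0)) / x.shape[0]
    return x + step_size * phi

# Toy target: standard 2-D normal, so grad log p(x) = -x.
rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, scale=1.0, size=(100, 2))
for _ in range(500):
    particles = svgd_step(particles, grad_log_p=-particles)
print(particles.mean(axis=0), particles.std(axis=0))       # approaches [0, 0] and [1, 1]
```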
arXiv Detail & Related papers (2024-02-02T02:21:06Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
The straight-through (ST) estimator gained popularity due to its simplicity and efficiency.
Several techniques were proposed to improve over ST while keeping the same low computational complexity.
We conduct a theoretical analysis of the bias and variance of these methods in order to understand the tradeoffs and verify the originally claimed properties.
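A single-sample straight-through estimator of the kind analyzed above takes only a few lines: draw a hard Bernoulli sample in the forward pass and pass the gradient straight through to the probability in the backward pass. The PyTorch sketch below is a generic illustration of that baseline (biased but single-sample and cheap), not the exact estimator variants compared in the paper.

```python
import torch

def bernoulli_st(probs: torch.Tensor) -> torch.Tensor:
    """Single-sample straight-through Bernoulli: forward returns a hard 0/1 sample,
    backward treats d b / d p as 1."""
    b = torch.bernoulli(probs)                    # non-differentiable hard sample
    return b + probs - probs.detach()

logits = torch.zeros(16, requires_grad=True)
p = torch.sigmoid(logits)
b = bernoulli_st(p)
loss = ((b - 1.0) ** 2).mean()
loss.backward()                                   # gradients reach `logits` through p
```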
arXiv Detail & Related papers (2021-10-07T15:16:07Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
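Directly optimizing an expected reward over sampled discrete structures is typically done with a score-function (REINFORCE) gradient, since the samples themselves are not differentiable. The PyTorch sketch below shows this on a toy per-position categorical policy with a placeholder reward; the policy, vocabulary, and reward here are illustrative assumptions, not the paper's models or tasks.

```python
import torch

vocab_size, seq_len, batch = 8, 4, 64
logits = torch.zeros(seq_len, vocab_size, requires_grad=True)   # trivially parameterized policy
optimizer = torch.optim.Adam([logits], lr=0.1)

def reward(tokens: torch.Tensor) -> torch.Tensor:
    # Placeholder reward: +1 per position holding token 3 (stand-in for a property scorer).
    return (tokens == 3).float().sum(dim=-1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)   # independent per position
    tokens = dist.sample((batch,))                           # shape (batch, seq_len)
    r = reward(tokens)
    log_prob = dist.log_prob(tokens).sum(dim=-1)             # log p_theta(x)
    baseline = r.mean()                                      # variance-reducing baseline
    loss = -((r - baseline) * log_prob).mean()               # score-function gradient of -E[R]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```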
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Semi-supervised Sequential Generative Models [16.23492955875404]
We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available.
We first overcome this problem by extending the standard semi-supervised generative modeling objective with reweighted wake-sleep.
Finally, we introduce a unified objective inspired by teacher-forcing and show that this approach is robust to variable length supervision.
arXiv Detail & Related papers (2020-06-30T23:53:12Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.