Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators
- URL: http://arxiv.org/abs/2110.03549v1
- Date: Thu, 7 Oct 2021 15:16:07 GMT
- Title: Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators
- Authors: Alexander Shekhovtsov
- Abstract summary: Straight-through (ST) estimator gained popularity due to its simplicity and efficiency.
Several techniques were proposed to improve over ST while keeping the same low computational complexity.
We conduct a theoretical analysis of Bias and Variance of these methods in order to understand tradeoffs and verify originally claimed properties.
- Score: 100.58924375509659
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Discrete and especially binary random variables occur in many machine
learning models, notably in variational autoencoders with binary latent states
and in stochastic binary networks. When learning such models, a key tool is an
estimator of the gradient of the expected loss with respect to the
probabilities of binary variables. The straight-through (ST) estimator gained
popularity due to its simplicity and efficiency, in particular in deep networks
where unbiased estimators are impractical. Several techniques were proposed to
improve over ST while keeping the same low computational complexity:
Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, FouST. We conduct a theoretical
analysis of Bias and Variance of these methods in order to understand tradeoffs
and verify the originally claimed properties. The presented theoretical results
are mainly negative, showing limitations of these methods and in some cases
revealing serious issues.
Related papers
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption.
We analyze how magnitude-based models affect generalization while improving adaption.
We conclude that proper magnitude-based has a slight on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Bayesian Deep Learning for Remaining Useful Life Estimation via Stein
Variational Gradient Descent [14.784809634505903]
We show that Bayesian deep learning models trained via Stein variational gradient descent consistently outperform with respect to convergence speed and predictive performance.
We propose a method to enhance performance based on the uncertainty information provided by the Bayesian models.
arXiv Detail & Related papers (2024-02-02T02:21:06Z) - Learning a Gaussian Mixture for Sparsity Regularization in Inverse
Problems [2.375943263571389]
In inverse problems, the incorporation of a sparsity prior yields a regularization effect on the solution.
We propose a probabilistic sparsity prior formulated as a mixture of Gaussians, capable of modeling sparsity with respect to a generic basis.
We put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network.
arXiv Detail & Related papers (2024-01-29T22:52:57Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Single Model Uncertainty Estimation via Stochastic Data Centering [39.71621297447397]
We are interested in estimating the uncertainties of deep neural networks.
We present a striking new finding that an ensemble of neural networks with the same weight initialization, trained on datasets that are shifted by a constant bias gives rise to slightly inconsistent trained models.
We show that $Delta-$UQ's uncertainty estimates are superior to many of the current methods on a variety of benchmarks.
arXiv Detail & Related papers (2022-07-14T23:54:54Z) - Training Discrete Deep Generative Models via Gapped Straight-Through
Estimator [72.71398034617607]
We propose a Gapped Straight-Through ( GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Expectation propagation on the diluted Bayesian classifier [0.0]
We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification.
A computational scheme known as expectation propagation (EP) is used to train a continuous-weights perceptron learning a classification rule.
EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity.
arXiv Detail & Related papers (2020-09-20T23:59:44Z) - Reintroducing Straight-Through Estimators as Principled Methods for
Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.
Many successful experimental results have been achieved with empirical straight-through (ST) approaches.
At the same time, ST methods can be truly derived as estimators in the binary network (SBN) model with Bernoulli weights.
arXiv Detail & Related papers (2020-06-11T23:58:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.