Related papers: Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators

Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators

URL: http://arxiv.org/abs/2110.03549v1
Date: Thu, 7 Oct 2021 15:16:07 GMT
Title: Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators
Authors: Alexander Shekhovtsov
Abstract summary: Straight-through (ST) estimator gained popularity due to its simplicity and efficiency. Several techniques were proposed to improve over ST while keeping the same low computational complexity. We conduct a theoretical analysis of Bias and Variance of these methods in order to understand tradeoffs and verify originally claimed properties.
Score: 100.58924375509659
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of the gradient of the expected loss with respect to the probabilities of binary variables. The straight-through (ST) estimator gained popularity due to its simplicity and efficiency, in particular in deep networks where unbiased estimators are impractical. Several techniques were proposed to improve over ST while keeping the same low computational complexity: Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, FouST. We conduct a theoretical analysis of Bias and Variance of these methods in order to understand tradeoffs and verify the originally claimed properties. The presented theoretical results are mainly negative, showing limitations of these methods and in some cases revealing serious issues.

Related papers

Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments [5.5855749614100825]
This paper addresses the hypothesis that leveraging multiple pre-trained models can mitigate this recall reduction.<n>We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem.<n>Our results validate the use of consistency-based abduction as an effective mechanism to robustly integrate knowledge from multiple imperfect models in challenging, novel scenarios.
arXiv Detail & Related papers (2025-05-25T23:17:47Z)
Preconditioned Inexact Stochastic ADMM for Deep Model [35.37705488695026]
This paper develops an algorithm, PISA, which enables scalable parallel computing and supports various second-moment schemes. Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz of the gradient. Comprehensive experimental evaluations for or fine-tuning diverse FMs, including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art Directions.
arXiv Detail & Related papers (2025-02-15T12:28:51Z)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption. We analyze how magnitude-based models affect generalization while improving adaption. We conclude that proper magnitude-based has a slight on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
Bayesian Deep Learning for Remaining Useful Life Estimation via Stein Variational Gradient Descent [14.784809634505903]
We show that Bayesian deep learning models trained via Stein variational gradient descent consistently outperform with respect to convergence speed and predictive performance. We propose a method to enhance performance based on the uncertainty information provided by the Bayesian models.
arXiv Detail & Related papers (2024-02-02T02:21:06Z)
Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems [2.375943263571389]
In inverse problems, the incorporation of a sparsity prior yields a regularization effect on the solution. We propose a probabilistic sparsity prior formulated as a mixture of Gaussians, capable of modeling sparsity with respect to a generic basis. We put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network.
arXiv Detail & Related papers (2024-01-29T22:52:57Z)
Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
Single Model Uncertainty Estimation via Stochastic Data Centering [39.71621297447397]
We are interested in estimating the uncertainties of deep neural networks. We present a striking new finding that an ensemble of neural networks with the same weight initialization, trained on datasets that are shifted by a constant bias gives rise to slightly inconsistent trained models. We show that $Delta-$UQ's uncertainty estimates are superior to many of the current methods on a variety of benchmarks.
arXiv Detail & Related papers (2022-07-14T23:54:54Z)
Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through ( GST) estimator to reduce the variance without incurring resampling overhead. This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax. Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption. They can suffer from ill-posedness and convergence instability. This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
Expectation propagation on the diluted Bayesian classifier [0.0]
We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification. A computational scheme known as expectation propagation (EP) is used to train a continuous-weights perceptron learning a classification rule. EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity.
arXiv Detail & Related papers (2020-09-20T23:59:44Z)
Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights. Many successful experimental results have been achieved with empirical straight-through (ST) approaches. At the same time, ST methods can be truly derived as estimators in the binary network (SBN) model with Bernoulli weights.
arXiv Detail & Related papers (2020-06-11T23:58:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.