Storchastic: A Framework for General Stochastic Automatic
Differentiation
- URL: http://arxiv.org/abs/2104.00428v1
- Date: Thu, 1 Apr 2021 12:19:54 GMT
- Title: Storchastic: A Framework for General Stochastic Automatic
Differentiation
- Authors: Emile van Krieken, Jakub M. Tomczak, Annette ten Teije
- Abstract summary: We introduce Storchastic, a new framework for automatic differentiation of stochastic computation graphs.
Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step.
Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates.
- Score: 9.34612743192798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modelers use automatic differentiation of computation graphs to implement
complex Deep Learning models without defining gradient computations. However,
modelers often use sampling methods to estimate intractable expectations such
as in Reinforcement Learning and Variational Inference. Current methods for
estimating gradients through these sampling steps are limited: They are either
only applicable to continuous random variables and differentiable functions, or
can only use simple but high variance score-function estimators.
To overcome these limitations, we introduce Storchastic, a new framework for
automatic differentiation of stochastic computation graphs. Storchastic allows
the modeler to choose from a wide variety of gradient estimation methods at
each sampling step, to optimally reduce the variance of the gradient estimates.
Furthermore, Storchastic is provably unbiased for estimation of any-order
gradients, and generalizes variance reduction techniques to higher-order
gradient estimates. Finally, we implement Storchastic as a PyTorch library.
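As a point of reference for the "simple but high variance score-function estimators" mentioned in the abstract, the sketch below implements that baseline for a single Bernoulli sampling step in plain PyTorch. This is not the Storchastic API; the cost function `f` and the sample count are hypothetical choices made only for illustration.

```python
import torch

def f(z):
    # Hypothetical downstream cost; it may be non-differentiable in z,
    # which is exactly the case the score-function estimator handles.
    return (z - 0.7) ** 2

logits = torch.zeros(1, requires_grad=True)         # parameter of the sampling step
dist = torch.distributions.Bernoulli(logits=logits)

z = dist.sample((1000,))                            # sampling breaks the autodiff chain
cost = f(z)

# Surrogate loss whose gradient is an unbiased Monte Carlo estimate of
# d/d(logits) E_{z ~ Bernoulli}[f(z)], i.e. E[f(z) * d log p(z)/d(logits)].
surrogate = (cost.detach() * dist.log_prob(z)).mean()
surrogate.backward()

print(logits.grad)                                  # score-function gradient estimate
```

Per the abstract, Storchastic's contribution is letting the modeler swap this kind of estimator for lower-variance alternatives at each sampling step while keeping any-order gradient estimates provably unbiased.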
Related papers
- One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm.
We show theoretically and by simulations that it is an interesting alternative to the usual gradient descent with averaging or the adaptive gradient descent.
arXiv Detail & Related papers (2023-06-09T13:43:07Z) - Adaptive Perturbation-Based Gradient Estimation for Discrete Latent
Variable Models [28.011868604717726]
We present Adaptive IMLE, the first adaptive gradient estimator for complex discrete distributions.
We show that our estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
arXiv Detail & Related papers (2022-09-11T13:32:39Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRithm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z) - Generalized Gumbel-Softmax Gradient Estimator for Various Discrete
Random Variables [16.643346012854156]
Estimating the gradients of stochastic nodes is one of the crucial research questions in the deep generative modeling community.
This paper proposes a general version of the Gumbel-Softmax estimator with continuous relaxation (a sketch of the standard Gumbel-Softmax relaxation follows this list).
arXiv Detail & Related papers (2020-03-04T01:13:15Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles are appealing in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulations.
arXiv Detail & Related papers (2020-02-26T12:53:04Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
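The last entry introduces a random-top-k sparsity operator. The paper's exact definition is not reproduced here; the sketch below is only an assumed illustration of the general idea of mixing top-k and (unbiasedly rescaled) random-k gradient sparsification, and the parameters `k1`/`k2` are made up for this example.

```python
import torch

def random_top_k(grad, k1, k2):
    # Illustrative sketch only; not the paper's operator.
    flat = grad.flatten()
    out = torch.zeros_like(flat)

    # Keep the k1 largest-magnitude coordinates exactly.
    top_idx = flat.abs().topk(k1).indices
    out[top_idx] = flat[top_idx]

    # From the remaining coordinates, keep a uniformly random k2, rescaled by
    # (n_rest / k2) so this part of the sparse gradient stays unbiased.
    mask = torch.ones_like(flat, dtype=torch.bool)
    mask[top_idx] = False
    rest_idx = mask.nonzero(as_tuple=True)[0]
    chosen = rest_idx[torch.randperm(rest_idx.numel())[:k2]]
    out[chosen] = flat[chosen] * (rest_idx.numel() / k2)

    return out.view_as(grad)

g = torch.randn(10, 10)
sparse_g = random_top_k(g, k1=5, k2=10)   # 85 of the 100 entries are zeroed
```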
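For the Gumbel-Softmax entry above, here is a minimal sketch of the standard Gumbel-Softmax relaxation for a single categorical variable in PyTorch; the paper's generalization to other discrete distributions is not shown, and the temperature value is an arbitrary illustrative choice. PyTorch also provides `torch.nn.functional.gumbel_softmax`, which implements this relaxation (with an optional straight-through mode).

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=0.5):
    # Add Gumbel(0, 1) noise to the logits, then soften the resulting argmax
    # with a temperature-controlled softmax so the sample stays differentiable
    # with respect to the logits.
    gumbels = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + gumbels) / temperature, dim=-1)

logits = torch.randn(4, requires_grad=True)       # unnormalized class scores
relaxed_sample = gumbel_softmax_sample(logits)    # differentiable "soft" one-hot
loss = relaxed_sample[0]                          # any downstream differentiable loss
loss.backward()                                   # gradients flow back to the logits
print(logits.grad)
```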