Smoothness Analysis for Probabilistic Programs with Application to
Optimised Variational Inference
- URL: http://arxiv.org/abs/2208.10530v1
- Date: Mon, 22 Aug 2022 18:18:32 GMT
- Title: Smoothness Analysis for Probabilistic Programs with Application to
Optimised Variational Inference
- Authors: Wonyeol Lee, Xavier Rival, Hongseok Yang
- Abstract summary: We present a static analysis for discovering differentiable or more generally smooth parts of a given probabilistic program.
We show how the analysis can be used to improve the pathwise gradient estimator.
- Score: 13.836565669337057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a static analysis for discovering differentiable or more generally
smooth parts of a given probabilistic program, and show how the analysis can be
used to improve the pathwise gradient estimator, one of the most popular
methods for posterior inference and model learning. Our improvement increases
the scope of the estimator from differentiable models to non-differentiable
ones without requiring manual intervention of the user; the improved estimator
automatically identifies differentiable parts of a given probabilistic program
using our static analysis, and applies the pathwise gradient estimator to the
identified parts while using a more general but less efficient estimator,
called score estimator, for the rest of the program. Our analysis has a
surprisingly subtle soundness argument, partly due to the misbehaviours of some
target smoothness properties when viewed from the perspective of program
analysis designers. For instance, some smoothness properties are not preserved
by function composition, and this makes it difficult to analyse sequential
composition soundly without heavily sacrificing precision. We formulate five
assumptions on a target smoothness property, prove the soundness of our
analysis under those assumptions, and show that our leading examples satisfy
these assumptions. We also show that by using information from our analysis,
our improved gradient estimator satisfies an important differentiability
requirement and thus, under a mild regularity condition, computes the correct
estimate on average, i.e., it returns an unbiased estimate. Our experiments
with representative probabilistic programs in the Pyro language show that our
static analysis is capable of identifying smooth parts of those programs
accurately, and making our improved pathwise gradient estimator exploit all the
opportunities for high performance in those programs.
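To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of combining the two estimators for a single Gaussian variational distribution. The manual split of the objective into `f_smooth` and `f_nonsmooth` stands in for the information the paper's static smoothness analysis would discover automatically; this is an illustration of the general technique, not the authors' implementation.
```python
import torch

torch.manual_seed(0)

# Variational parameters of q(z) = Normal(mu, exp(log_sigma)).
mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)

def f_smooth(z):
    # differentiable part of the objective
    return -0.5 * z ** 2

def f_nonsmooth(z):
    # non-differentiable part: a step function, invalid for the pathwise estimator
    return (z > 0.0).float()

def hybrid_grad(num_samples=1000):
    eps = torch.randn(num_samples)
    sigma = log_sigma.exp()

    # Pathwise estimator on the smooth part: differentiate through z = mu + sigma * eps.
    z = mu + sigma * eps
    pathwise_term = f_smooth(z).mean()

    # Score-function estimator on the non-smooth part:
    # E[f(z) * grad_theta log q(z)], treating the sample z as a constant.
    z_const = z.detach()
    log_q = torch.distributions.Normal(mu, sigma).log_prob(z_const)
    score_term = (f_nonsmooth(z_const) * log_q).mean()

    # Differentiating this surrogate gives an unbiased gradient estimate of
    # E_q[f_smooth(z) + f_nonsmooth(z)] with respect to (mu, log_sigma).
    surrogate = pathwise_term + score_term
    return torch.autograd.grad(surrogate, (mu, log_sigma))

print(hybrid_grad())
```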
Related papers
- Beyond Single-Model Views for Deep Learning: Optimization versus
Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants demonstrate performance on par with flat-minima optimizers such as SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z)
- Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [0.8192907805418583]
We show that adaptive stochastic approximation with biased gradients converges to critical points for smooth non-convex functions.
We show how the effect of bias can be reduced by appropriate tuning.
arXiv Detail & Related papers (2024-02-05T10:17:36Z)
- Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling [0.6906005491572401]
The paper provides theoretical insights into why SGD with large batch sizes falls into sharp local minima, why decaying learning rates and increasing batch sizes are superior to fixed learning rates, and what the optimal learning rate is.
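As a point of reference for the graduated-optimization view, the sketch below shows explicit graduated optimization on a toy nonconvex function: a Gaussian-smoothed surrogate is optimized while the smoothing level is decayed. The test function, noise schedule, and step size are illustrative assumptions, not the implicit scheme analysed in the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # a simple one-dimensional nonconvex test function
    return x ** 2 + 2.0 * np.sin(5.0 * x)

def smoothed_grad(x, delta, num_samples=64):
    # Monte Carlo estimate of the gradient of the Gaussian-smoothed surrogate
    # f_delta(x) = E_u[f(x + delta * u)], u ~ N(0, 1)
    u = rng.standard_normal(num_samples)
    return np.mean((f(x + delta * u) - f(x)) * u) / delta

x, lr = 3.0, 0.05
for delta in (2.0, 1.0, 0.5, 0.1):   # decaying smoothing (noise) schedule
    for _ in range(200):
        x -= lr * smoothed_grad(x, delta)
print(x, f(x))
```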
arXiv Detail & Related papers (2023-11-15T07:27:40Z)
- A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
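For flavour only, the toy sketch below propagates a tail annotation through one program operation, using the standard rule that the sum of independent regularly-varying variables inherits the heavier tail (the smaller index). The `Tail` and `add` names are hypothetical, and the paper's algebra covers far more cases.
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tail:
    # regular-variation (tail) index; None means lighter than any power law
    alpha: Optional[float]

def add(x: Tail, y: Tail) -> Tail:
    # sum of independent variables: the heavier tail (smaller index) dominates
    if x.alpha is None:
        return y
    if y.alpha is None:
        return x
    return Tail(min(x.alpha, y.alpha))

print(add(Tail(2.0), Tail(None)))   # Tail(alpha=2.0)
print(add(Tail(2.0), Tail(1.5)))    # Tail(alpha=1.5)
```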
arXiv Detail & Related papers (2023-06-15T16:37:36Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
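For reference, here is a minimal sketch of the NCE logistic-loss objective mentioned above, assuming equal numbers of data and noise points; it illustrates the baseline NCE formulation, not the paper's compositional optimization method, and all names are illustrative.
```python
import torch

def nce_loss(model_logp, noise_logp, data, noise):
    # log-odds that a point was drawn from the model rather than the noise distribution
    logit_data = model_logp(data) - noise_logp(data)
    logit_noise = model_logp(noise) - noise_logp(noise)
    # logistic loss: real data labelled 1, artificial noise labelled 0
    return (torch.nn.functional.softplus(-logit_data).mean()
            + torch.nn.functional.softplus(logit_noise).mean())

# Example: fit an unnormalised Gaussian with a learnable log-normaliser c.
c = torch.zeros(1, requires_grad=True)
mean = torch.zeros(1, requires_grad=True)
model_logp = lambda x: -0.5 * (x - mean) ** 2 + c
noise_dist = torch.distributions.Normal(0.0, 2.0)
data = torch.randn(512) * 0.5 + 1.0
loss = nce_loss(model_logp, noise_dist.log_prob, data, noise_dist.sample((512,)))
loss.backward()
```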
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Variance Reduction for Score Functions Using Optimal Baselines [0.0]
This paper studies baselines, a variance reduction technique for score functions.
Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline.
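A minimal sketch of the underlying idea, using a simple constant baseline rather than the optimal state-dependent baseline derived in the paper: subtracting a baseline from the score-function estimator leaves it unbiased because the expected score is zero, while potentially reducing variance.
```python
import torch

torch.manual_seed(0)
mu = torch.tensor(1.0, requires_grad=True)

def score_grad(f, baseline=0.0, num_samples=2000):
    q = torch.distributions.Normal(mu, 1.0)
    z = q.sample((num_samples,))
    # (f(z) - b) * d/dmu log q(z); subtracting a constant b keeps the estimator
    # unbiased (the expected score is zero) but can reduce its variance.
    weights = (f(z) - baseline).detach()
    surrogate = (weights * q.log_prob(z)).mean()
    return torch.autograd.grad(surrogate, mu)[0]

f = lambda z: (z - 3.0) ** 2
print(score_grad(f, baseline=0.0))           # plain score-function estimate
print(score_grad(f, baseline=f(mu).item()))  # baseline-corrected estimate
```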
arXiv Detail & Related papers (2022-12-27T19:17:28Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
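A minimal sketch of the finite-differencing idea for the simplest possible functional, the mean of a distribution; it illustrates the Gateaux-derivative approximation, not the paper's constructive algorithm for causal functionals.
```python
import numpy as np

def T(sample, weights):
    # the mean, as a stand-in statistical functional T(P)
    return np.average(sample, weights=weights)

def gateaux_fd(sample, x, eps=1e-4):
    # finite-difference approximation of the Gateaux derivative of T at the
    # empirical distribution P_n, in the direction of a point mass at x:
    # (T((1 - eps) * P_n + eps * delta_x) - T(P_n)) / eps
    n = len(sample)
    base_w = np.full(n, 1.0 / n)
    pert_sample = np.append(sample, x)
    pert_w = np.append((1.0 - eps) * base_w, eps)
    return (T(pert_sample, pert_w) - T(sample, base_w)) / eps

data = np.random.default_rng(0).normal(size=1000)
x0 = 2.0
# For the mean, the influence function is x0 - mean(P_n); both values agree.
print(gateaux_fd(data, x0), x0 - data.mean())
```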
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
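A minimal sketch of clipped stochastic gradient descent for streaming mean estimation under heavy-tailed samples; the clipping threshold and step-size schedule are illustrative choices, not the schedule analysed in the paper.
```python
import numpy as np

rng = np.random.default_rng(0)
theta, clip = 0.0, 5.0

for t in range(1, 10001):
    x = rng.standard_t(df=1.5)                # heavy-tailed stream (infinite variance, mean 0)
    grad = theta - x                          # gradient of the loss 0.5 * (theta - x)**2
    grad = float(np.clip(grad, -clip, clip))  # clipping controls heavy-tailed noise
    theta -= grad / t                         # decaying step size
print(theta)                                  # a robust streaming estimate of the mean
```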
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy guarantees based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models.
We provide a unifying view of these estimators under the framework of regularized nonparametric regression.
We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
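For comparison, a minimal sketch of one simple nonparametric score estimator, the gradient of the log of a Gaussian kernel density estimate; this is only a naive baseline, not the regularized, curl-free-kernel estimators proposed in the paper.
```python
import numpy as np

def kde_score(query, samples, bandwidth=0.3):
    # score(x) = d/dx log p_hat(x), where p_hat is a Gaussian kernel density estimate
    diffs = query[:, None] - samples[None, :]            # shape (num_query, num_samples)
    kernels = np.exp(-0.5 * (diffs / bandwidth) ** 2)    # unnormalised Gaussian kernels
    numer = (-diffs / bandwidth ** 2 * kernels).sum(axis=1)
    return numer / kernels.sum(axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(size=2000)
query = np.linspace(-2.0, 2.0, 5)
print(kde_score(query, samples))   # for N(0, 1) samples the true score is -x
```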
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.