Smoothness Analysis for Probabilistic Programs with Application to
Optimised Variational Inference
- URL: http://arxiv.org/abs/2208.10530v1
- Date: Mon, 22 Aug 2022 18:18:32 GMT
- Title: Smoothness Analysis for Probabilistic Programs with Application to
Optimised Variational Inference
- Authors: Wonyeol Lee, Xavier Rival, Hongseok Yang
- Abstract summary: We present a static analysis for discovering differentiable or more generally smooth parts of a given probabilistic program.
We show how the analysis can be used to improve the pathwise gradient estimator.
- Score: 13.836565669337057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a static analysis for discovering differentiable or more generally
smooth parts of a given probabilistic program, and show how the analysis can be
used to improve the pathwise gradient estimator, one of the most popular
methods for posterior inference and model learning. Our improvement increases
the scope of the estimator from differentiable models to non-differentiable
ones without requiring manual intervention of the user; the improved estimator
automatically identifies differentiable parts of a given probabilistic program
using our static analysis, and applies the pathwise gradient estimator to the
identified parts while using a more general but less efficient estimator,
called score estimator, for the rest of the program. Our analysis has a
surprisingly subtle soundness argument, partly due to the misbehaviours of some
target smoothness properties when viewed from the perspective of program
analysis designers. For instance, some smoothness properties are not preserved
by function composition, and this makes it difficult to analyse sequential
composition soundly without heavily sacrificing precision. We formulate five
assumptions on a target smoothness property, prove the soundness of our
analysis under those assumptions, and show that our leading examples satisfy
these assumptions. We also show that by using information from our analysis,
our improved gradient estimator satisfies an important differentiability
requirement and thus, under a mild regularity condition, computes the correct
estimate on average, i.e., it returns an unbiased estimate. Our experiments
with representative probabilistic programs in the Pyro language show that our
static analysis is capable of identifying smooth parts of those programs
accurately, and making our improved pathwise gradient estimator exploit all the
opportunities for high performance in those programs.
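To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of combining the two estimators for a single Gaussian variational distribution. The manual split of the objective into `f_smooth` and `f_nonsmooth` stands in for the information the paper's static smoothness analysis would discover automatically; this is an illustration of the general technique, not the authors' implementation.
```python
import torch

torch.manual_seed(0)

# Variational parameters of q(z) = Normal(mu, exp(log_sigma)).
mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)

def f_smooth(z):
    # differentiable part of the objective
    return -0.5 * z ** 2

def f_nonsmooth(z):
    # non-differentiable part: a step function, invalid for the pathwise estimator
    return (z > 0.0).float()

def hybrid_grad(num_samples=1000):
    eps = torch.randn(num_samples)
    sigma = log_sigma.exp()

    # Pathwise estimator on the smooth part: differentiate through z = mu + sigma * eps.
    z = mu + sigma * eps
    pathwise_term = f_smooth(z).mean()

    # Score-function estimator on the non-smooth part:
    # E[f(z) * grad_theta log q(z)], treating the sample z as a constant.
    z_const = z.detach()
    log_q = torch.distributions.Normal(mu, sigma).log_prob(z_const)
    score_term = (f_nonsmooth(z_const) * log_q).mean()

    # Differentiating this surrogate gives an unbiased gradient estimate of
    # E_q[f_smooth(z) + f_nonsmooth(z)] with respect to (mu, log_sigma).
    surrogate = pathwise_term + score_term
    return torch.autograd.grad(surrogate, (mu, log_sigma))

print(hybrid_grad())
```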
Related papers
- Beyond Single-Model Views for Deep Learning: Optimization versus
Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants demonstrate performance on par with flat-minima optimizers such as SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z)
- Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [0.8192907805418583]
We show that adaptive stochastic approximation with biased gradients converges to critical points for smooth non-convex functions.
We show how the effect of bias can be reduced by appropriate tuning.
arXiv Detail & Related papers (2024-02-05T10:17:36Z)
- Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling [0.6906005491572401]
The paper provides theoretical insights into why SGD with large batch sizes falls into sharp local minima, why decaying learning rates and increasing batch sizes are superior to fixed learning rates, and what the optimal learning rate is.
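As a point of reference for the graduated-optimization view, the sketch below shows explicit graduated optimization on a toy nonconvex function: a Gaussian-smoothed surrogate is optimized while the smoothing level is decayed. The test function, noise schedule, and step size are illustrative assumptions, not the implicit scheme analysed in the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # a simple one-dimensional nonconvex test function
    return x ** 2 + 2.0 * np.sin(5.0 * x)

def smoothed_grad(x, delta, num_samples=64):
    # Monte Carlo estimate of the gradient of the Gaussian-smoothed surrogate
    # f_delta(x) = E_u[f(x + delta * u)], u ~ N(0, 1)
    u = rng.standard_normal(num_samples)
    return np.mean((f(x + delta * u) - f(x)) * u) / delta

x, lr = 3.0, 0.05
for delta in (2.0, 1.0, 0.5, 0.1):   # decaying smoothing (noise) schedule
    for _ in range(200):
        x -= lr * smoothed_grad(x, delta)
print(x, f(x))
```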
arXiv Detail & Related papers (2023-11-15T07:27:40Z)
- A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
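For flavour only, the toy sketch below propagates a tail annotation through one program operation, using the standard rule that the sum of independent regularly-varying variables inherits the heavier tail (the smaller index). The `Tail` and `add` names are hypothetical, and the paper's algebra covers far more cases.
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tail:
    # regular-variation (tail) index; None means lighter than any power law
    alpha: Optional[float]

def add(x: Tail, y: Tail) -> Tail:
    # sum of independent variables: the heavier tail (smaller index) dominates
    if x.alpha is None:
        return y
    if y.alpha is None:
        return x
    return Tail(min(x.alpha, y.alpha))

print(add(Tail(2.0), Tail(None)))   # Tail(alpha=2.0)
print(add(Tail(2.0), Tail(1.5)))    # Tail(alpha=1.5)
```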
arXiv Detail & Related papers (2023-06-15T16:37:36Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
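For reference, here is a minimal sketch of the NCE logistic-loss objective mentioned above, assuming equal numbers of data and noise points; it illustrates the baseline NCE formulation, not the paper's compositional optimization method, and all names are illustrative.
```python
import torch

def nce_loss(model_logp, noise_logp, data, noise):
    # log-odds that a point was drawn from the model rather than the noise distribution
    logit_data = model_logp(data) - noise_logp(data)
    logit_noise = model_logp(noise) - noise_logp(noise)
    # logistic loss: real data labelled 1, artificial noise labelled 0
    return (torch.nn.functional.softplus(-logit_data).mean()
            + torch.nn.functional.softplus(logit_noise).mean())

# Example: fit an unnormalised Gaussian with a learnable log-normaliser c.
c = torch.zeros(1, requires_grad=True)
mean = torch.zeros(1, requires_grad=True)
model_logp = lambda x: -0.5 * (x - mean) ** 2 + c
noise_dist = torch.distributions.Normal(0.0, 2.0)
data = torch.randn(512) * 0.5 + 1.0
loss = nce_loss(model_logp, noise_dist.log_prob, data, noise_dist.sample((512,)))
loss.backward()
```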
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Variance Reduction for Score Functions Using Optimal Baselines [0.0]
This paper studies baselines, a variance reduction technique for score functions.
Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline.
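A minimal sketch of the underlying idea, using a simple constant baseline rather than the optimal state-dependent baseline derived in the paper: subtracting a baseline from the score-function estimator leaves it unbiased because the expected score is zero, while potentially reducing variance.
```python
import torch

torch.manual_seed(0)
mu = torch.tensor(1.0, requires_grad=True)

def score_grad(f, baseline=0.0, num_samples=2000):
    q = torch.distributions.Normal(mu, 1.0)
    z = q.sample((num_samples,))
    # (f(z) - b) * d/dmu log q(z); subtracting a constant b keeps the estimator
    # unbiased (the expected score is zero) but can reduce its variance.
    weights = (f(z) - baseline).detach()
    surrogate = (weights * q.log_prob(z)).mean()
    return torch.autograd.grad(surrogate, mu)[0]

f = lambda z: (z - 3.0) ** 2
print(score_grad(f, baseline=0.0))           # plain score-function estimate
print(score_grad(f, baseline=f(mu).item()))  # baseline-corrected estimate
```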
arXiv Detail & Related papers (2022-12-27T19:17:28Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
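A minimal sketch of the finite-differencing idea for the simplest possible functional, the mean of a distribution; it illustrates the Gateaux-derivative approximation, not the paper's constructive algorithm for causal functionals.
```python
import numpy as np

def T(sample, weights):
    # the mean, as a stand-in statistical functional T(P)
    return np.average(sample, weights=weights)

def gateaux_fd(sample, x, eps=1e-4):
    # finite-difference approximation of the Gateaux derivative of T at the
    # empirical distribution P_n, in the direction of a point mass at x:
    # (T((1 - eps) * P_n + eps * delta_x) - T(P_n)) / eps
    n = len(sample)
    base_w = np.full(n, 1.0 / n)
    pert_sample = np.append(sample, x)
    pert_w = np.append((1.0 - eps) * base_w, eps)
    return (T(pert_sample, pert_w) - T(sample, base_w)) / eps

data = np.random.default_rng(0).normal(size=1000)
x0 = 2.0
# For the mean, the influence function is x0 - mean(P_n); both values agree.
print(gateaux_fd(data, x0), x0 - data.mean())
```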
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
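A minimal sketch of clipped stochastic gradient descent for streaming mean estimation under heavy-tailed samples; the clipping threshold and step-size schedule are illustrative choices, not the schedule analysed in the paper.
```python
import numpy as np

rng = np.random.default_rng(0)
theta, clip = 0.0, 5.0

for t in range(1, 10001):
    x = rng.standard_t(df=1.5)                # heavy-tailed stream (infinite variance, mean 0)
    grad = theta - x                          # gradient of the loss 0.5 * (theta - x)**2
    grad = float(np.clip(grad, -clip, clip))  # clipping controls heavy-tailed noise
    theta -= grad / t                         # decaying step size
print(theta)                                  # a robust streaming estimate of the mean
```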
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy guarantees based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models.
We provide a unifying view of these estimators under the framework of regularized nonparametric regression.
We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
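For comparison, a minimal sketch of one simple nonparametric score estimator, the gradient of the log of a Gaussian kernel density estimate; this is only a naive baseline, not the regularized, curl-free-kernel estimators proposed in the paper.
```python
import numpy as np

def kde_score(query, samples, bandwidth=0.3):
    # score(x) = d/dx log p_hat(x), where p_hat is a Gaussian kernel density estimate
    diffs = query[:, None] - samples[None, :]            # shape (num_query, num_samples)
    kernels = np.exp(-0.5 * (diffs / bandwidth) ** 2)    # unnormalised Gaussian kernels
    numer = (-diffs / bandwidth ** 2 * kernels).sum(axis=1)
    return numer / kernels.sum(axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(size=2000)
query = np.linspace(-2.0, 2.0, 5)
print(kde_score(query, samples))   # for N(0, 1) samples the true score is -x
```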
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.