Path-Gradient Estimators for Continuous Normalizing Flows
- URL: http://arxiv.org/abs/2206.09016v1
- Date: Fri, 17 Jun 2022 21:25:06 GMT
- Title: Path-Gradient Estimators for Continuous Normalizing Flows
- Authors: Lorenz Vaitl, Kim A. Nicoli, Shinichi Nakajima, Pan Kessel
- Abstract summary: Recent work has established a path-gradient estimator for simple variational Gaussian distributions.
We propose a path-gradient estimator for the considerably more expressive variational family of continuous normalizing flows.
- Score: 4.830811539001643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has established a path-gradient estimator for simple variational
Gaussian distributions and has argued that the path-gradient is particularly
beneficial in the regime in which the variational distribution approaches the
exact target distribution. In many applications, however, this regime cannot be
reached by a simple Gaussian variational distribution. In this work, we
overcome this crucial limitation by proposing a path-gradient estimator for the
considerably more expressive variational family of continuous normalizing
flows. We outline an efficient algorithm to calculate this estimator and
establish its superior performance empirically.
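As a rough orientation for the idea (not a reproduction of the paper's algorithm): with a reparameterizable variational density $q_\theta$ and samples $z = T_\theta(\epsilon)$, the reverse-KL gradient splits into a path term and a score term,
$\nabla_\theta \mathrm{KL}(q_\theta\,\|\,p) = \mathbb{E}_{\epsilon}\big[(\nabla_z \log q_\theta(z) - \nabla_z \log p(z))^\top \partial_\theta T_\theta(\epsilon)\big]\big|_{z=T_\theta(\epsilon)} + \mathbb{E}_{q_\theta}\big[\nabla_\theta \log q_\theta(z)\big],$
where the second (score) term has zero expectation. The path-gradient estimator keeps only the first term, which removes a noise source that dominates once $q_\theta$ is close to the target. The sketch below shows the standard stop-gradient implementation of this idea on a toy discrete coupling flow; the paper's contribution is an efficient estimator of this type for continuous normalizing flows, which this sketch does not reproduce. All class and function names are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a path-gradient estimator for a
# flow-based variational family. A toy discrete coupling flow stands in for a
# continuous normalizing flow to keep the example short.
import copy
import math

import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """Tiny RealNVP-style coupling layer acting on 2-dimensional variables."""

    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 2))

    def forward(self, eps):  # base sample -> z, plus log|det J|
        e1, e2 = eps[:, :1], eps[:, 1:]
        s, t = self.net(e1).chunk(2, dim=1)
        z2 = e2 * torch.exp(s) + t
        return torch.cat([e1, z2], dim=1), s.squeeze(1)

    def inverse(self, z):  # z -> base sample, plus log|det J^{-1}|
        z1, z2 = z[:, :1], z[:, 1:]
        s, t = self.net(z1).chunk(2, dim=1)
        e2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, e2], dim=1), -s.squeeze(1)


def log_q(flow, z):
    """Log-density of the flow's pushforward of a standard-normal base."""
    eps, log_det_inv = flow.inverse(z)
    d = eps.shape[1]
    return -0.5 * (eps ** 2).sum(1) - 0.5 * d * math.log(2 * math.pi) + log_det_inv


def path_gradient_surrogate(flow, frozen_flow, log_p, n=256):
    """Reverse-KL surrogate whose gradient contains only the path term.

    The score term is removed by evaluating log q with a parameter-frozen copy
    of the flow, while the sample z = T_theta(eps) keeps its dependence on theta.
    """
    eps = torch.randn(n, 2)
    z, _ = flow(eps)  # z carries gradients w.r.t. the flow parameters
    return (log_q(frozen_flow, z) - log_p(z)).mean()


# Usage sketch with an unnormalized toy target.
flow = AffineCoupling()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
log_p = lambda z: -0.5 * ((z - 1.0) ** 2).sum(1)
for _ in range(100):
    frozen_flow = copy.deepcopy(flow).requires_grad_(False)
    opt.zero_grad()
    path_gradient_surrogate(flow, frozen_flow, log_p).backward()
    opt.step()
```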
Related papers
- Pathwise Gradient Variance Reduction with Control Variates in Variational Inference [2.1638817206926855]
Variational inference in Bayesian deep learning often involves computing the gradient of an expectation that lacks a closed-form solution.
In these cases, pathwise and score-function gradient estimators are the most common approaches.
Recent research suggests that even pathwise gradient estimators could benefit from variance reduction (a minimal comparison of the two estimators is sketched after this list).
arXiv Detail & Related papers (2024-10-08T07:28:46Z)
- Fast and Unified Path Gradient Estimators for Normalizing Flows [5.64979077798699]
Path gradient estimators for normalizing flows have lower variance than standard estimators for variational inference.
We propose a fast path gradient estimator which improves computational efficiency significantly.
We empirically establish its superior performance and reduced variance for several natural sciences applications.
arXiv Detail & Related papers (2024-03-23T16:21:22Z)
- Uncertainty Quantification via Stable Distribution Propagation [60.065272548502]
We propose a new approach for propagating stable probability distributions through neural networks.
Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity.
arXiv Detail & Related papers (2024-02-13T09:40:19Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Gradients should stay on Path: Better Estimators of the Reverse- and Forward KL Divergence for Normalizing Flows [4.830811539001643]
We propose an algorithm to estimate the path-gradient of both the reverse and forward Kullback-Leibler divergence for an arbitrary manifestly invertible normalizing flow.
The resulting path-gradient estimators are straightforward to implement, have lower variance, and lead not only to faster convergence of training but also to better overall approximation results.
arXiv Detail & Related papers (2022-07-17T16:27:41Z)
- A Dimensionality Reduction Method for Finding Least Favorable Priors with a Focus on Bregman Divergence [108.28566246421742]
This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension.
In order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences (their standard definition is recalled in a short note after this list).
arXiv Detail & Related papers (2022-02-23T16:22:28Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Unbiased Gradient Estimation for Distributionally Robust Learning [2.1777837784979277]
We consider a new approach based on distributionally robust learning (DRL) that applies gradient descent to the inner problem.
Our algorithm efficiently estimates the gradient through multi-level Monte Carlo randomization.
arXiv Detail & Related papers (2020-12-22T21:35:03Z)
- Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors (a minimal illustration of the pathwise update appears after this list).
arXiv Detail & Related papers (2020-11-08T17:09:37Z)
- Variance Regularization for Accelerating Stochastic Optimization [14.545770519120898]
We propose a universal principle which reduces the random error accumulation by exploiting statistic information hidden in mini-batch gradients.
This is achieved by regularizing the learning-rate according to mini-batch variances.
arXiv Detail & Related papers (2020-08-13T15:34:01Z)
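The first related entry above contrasts pathwise and score-function gradient estimators. As a self-contained illustration of why the choice matters for variance (a generic textbook comparison, not that paper's method), the following compares the two estimators on a toy Gaussian expectation whose exact gradient is known:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 100_000
f = lambda z: z ** 2  # integrand with known gradient: d/dmu E[z^2] = 2*mu

# Score-function (REINFORCE) estimator: f(z) * d/dmu log N(z; mu, sigma^2).
z = rng.normal(mu, sigma, size=n)
score_grads = f(z) * (z - mu) / sigma ** 2

# Pathwise (reparameterization) estimator: z = mu + sigma * eps, so d f / d mu = 2 * z.
path_grads = 2.0 * (mu + sigma * rng.standard_normal(n))

print("exact gradient :", 2.0 * mu)
print("score-function :", score_grads.mean(), "variance:", score_grads.var())
print("pathwise       :", path_grads.mean(), "variance:", path_grads.var())
```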
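For the entry on least favorable priors, the restricted loss class is the Bregman divergences; for completeness, the standard definition for a strictly convex, differentiable generator $\phi$ is

```latex
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle .
```

Squared Euclidean distance ($\phi(x) = \lVert x \rVert^2$) and the KL divergence (negative-entropy generator on the probability simplex) are the familiar special cases.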
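Finally, for the entry on pathwise conditioning of Gaussian processes: the core identity (Matheron's rule) updates a joint prior sample with a kernel-weighted residual at the observed inputs. A minimal numpy sketch of one exact posterior draw, not the paper's scalable approximations, might look as follows; the kernel, noise level, and data are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

rng = np.random.default_rng(0)
X = np.array([-1.0, 0.0, 1.0])       # observed inputs (toy data)
y = np.array([0.5, -0.3, 0.8])       # noisy observations
Xs = np.linspace(-2.0, 2.0, 100)     # test inputs
noise = 1e-2                         # observation noise variance

# One joint prior draw over observed and test locations.
Xall = np.concatenate([X, Xs])
Kall = rbf(Xall, Xall) + 1e-6 * np.eye(len(Xall))
f_prior = np.linalg.cholesky(Kall) @ rng.standard_normal(len(Xall))
fX, fXs = f_prior[: len(X)], f_prior[len(X):]

# Matheron's rule: posterior draw = prior draw + kernel-weighted residual.
eps = rng.normal(0.0, np.sqrt(noise), size=len(X))
K = rbf(X, X) + noise * np.eye(len(X))
f_post = fXs + rbf(Xs, X) @ np.linalg.solve(K, y - (fX + eps))
# f_post is one exact sample from the GP posterior evaluated at Xs.
```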