Stable Training of Normalizing Flows for High-dimensional Variational
Inference
- URL: http://arxiv.org/abs/2402.16408v1
- Date: Mon, 26 Feb 2024 09:04:07 GMT
- Title: Stable Training of Normalizing Flows for High-dimensional Variational
Inference
- Authors: Daniel Andrade
- Abstract summary: Variational inference with normalizing flows (NFs) is an increasingly popular alternative to MCMC methods.
In practice, training deep normalizing flows for approximating high-dimensional distributions is often infeasible due to the high variance of the gradients.
We show that previous methods for stabilizing the variance of gradient descent can be insufficient to achieve stable training of Real NVPs.
- Score: 2.139348034155473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variational inference with normalizing flows (NFs) is an increasingly popular
alternative to MCMC methods. In particular, NFs based on coupling layers (Real
NVPs) are frequently used due to their good empirical performance. In theory,
increasing the depth of normalizing flows should lead to more accurate
posterior approximations. However, in practice, training deep normalizing flows
for approximating high-dimensional posterior distributions is often infeasible
due to the high variance of the stochastic gradients. In this work, we show
that previous methods for stabilizing the variance of stochastic gradient
descent can be insufficient to achieve stable training of Real NVPs. As the
source of the problem, we identify that, during training, samples often exhibit
unusually high values. As a remedy, we propose a combination of two methods: (1)
soft-thresholding of the scale in Real NVPs, and (2) a bijective soft log
transformation of the samples. We evaluate these and other previously proposed
modifications on several challenging target distributions, including a
high-dimensional horseshoe logistic regression model. Our experiments show that
with our modifications, stable training of Real NVPs for posteriors with
several thousand dimensions is possible, allowing for more accurate marginal
likelihood estimation via importance sampling. Moreover, we evaluate several
common training techniques and architecture choices and provide practical
advice for training NFs for high-dimensional variational inference.
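The two proposed modifications lend themselves to a compact illustration. The following is a minimal PyTorch sketch, not the paper's reference implementation: the tanh-based soft clamp of the coupling scale and the sign-preserving log1p form of the soft log transformation are plausible assumptions, and the exact functional forms in the paper may differ.

```python
# Illustrative sketch (assumed forms, not the paper's reference code) of
# (1) soft-thresholding the scale in a Real NVP coupling layer and
# (2) a bijective soft log transformation of the samples.
import torch
import torch.nn as nn


def soft_threshold_scale(raw_scale: torch.Tensor, s_max: float = 2.0) -> torch.Tensor:
    """Softly bound the log-scale to (-s_max, s_max) so that exp(scale)
    cannot explode during early training (assumed tanh-based clamp)."""
    return s_max * torch.tanh(raw_scale / s_max)


class SoftLogTransform(nn.Module):
    """Bijective soft log transformation y = sign(x) * log(1 + |x|).
    It compresses unusually large sample values while staying invertible,
    with inverse x = sign(y) * (exp(|y|) - 1)."""

    def forward(self, x: torch.Tensor):
        y = torch.sign(x) * torch.log1p(torch.abs(x))
        # log |det J| of the element-wise map: sum over dims of -log(1 + |x|)
        log_det = -torch.log1p(torch.abs(x)).sum(dim=-1)
        return y, log_det

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        return torch.sign(y) * torch.expm1(torch.abs(y))


class AffineCoupling(nn.Module):
    """One Real NVP coupling layer using the soft-thresholded scale."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x: torch.Tensor):
        x1, x2 = x[..., :self.d], x[..., self.d:]
        raw_scale, shift = self.net(x1).chunk(2, dim=-1)
        scale = soft_threshold_scale(raw_scale)  # bounded log-scale
        y2 = x2 * torch.exp(scale) + shift
        log_det = scale.sum(dim=-1)
        return torch.cat([x1, y2], dim=-1), log_det
```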
Related papers
- In-Context Parametric Inference: Point or Distribution Estimators? [66.22308335324239]
Our experiments indicate that amortized point estimators generally outperform posterior inference, though the latter remains competitive in some low-dimensional problems.
arXiv Detail & Related papers (2025-02-17T10:00:24Z) - Importance Corrected Neural JKO Sampling [0.0]
We combine continuous normalizing flows (CNFs) with rejection-resampling steps based on importance weights.
The resulting model can be trained iteratively, reduces the reverse Kullback-Leibler (KL) loss function in each step, and allows i.i.d. samples to be generated.
Numerical examples show that our method yields accurate results on various test distributions including high-dimensional multimodal targets.
arXiv Detail & Related papers (2024-07-29T22:49:59Z) - SoftCVI: Contrastive variational inference with self-generated soft labels [2.5398014196797614]
Variational inference and Markov chain Monte Carlo methods are the predominant tools for posterior inference.
We introduce Soft Contrastive Variational Inference (SoftCVI), which allows a family of variational objectives to be derived through a contrastive estimation framework.
We find that SoftCVI can be used to form objectives which are stable to train and mass-covering, frequently outperforming inference with other variational approaches.
arXiv Detail & Related papers (2024-07-22T14:54:12Z) - Differentiable and Stable Long-Range Tracking of Multiple Posterior Modes [1.534667887016089]
We leverage training data to discriminatively learn particle-based representations of uncertainty in latent object states.
Our approach achieves dramatic improvements in accuracy, while also showing much greater stability across multiple training runs.
arXiv Detail & Related papers (2024-04-12T19:33:52Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Preconditioned training of normalizing flows for variational inference
in inverse problems [1.5749416770494706]
We propose a conditional normalizing flow (NF) capable of sampling from a low-fidelity posterior distribution directly.
This conditional NF is used to speed up training with the high-fidelity objective, which involves minimizing the Kullback-Leibler divergence.
Our numerical experiments, including a 2D toy example and a seismic compressed sensing example, demonstrate that, thanks to the preconditioning, considerable speed-ups are achievable.
arXiv Detail & Related papers (2021-01-11T05:35:36Z) - On Signal-to-Noise Ratio Issues in Variational Inference for Deep
Gaussian Processes [55.62520135103578]
We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues.
We show that our fix can lead to consistent improvements in the predictive performance of DGP models.
arXiv Detail & Related papers (2020-11-01T14:38:02Z) - Improving Maximum Likelihood Training for Text Generation with Density
Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models that is effective and stable when operating in the large sample spaces encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs in which the mapping from the base density to the output space is conditioned on an input x, in order to model conditional densities p(y|x).
arXiv Detail & Related papers (2019-11-29T19:17:58Z)
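The final entry above describes conditioning the base-density-to-output-space mapping on an input x to model p(y|x). Below is a minimal sketch of one common way to do this with an affine coupling layer; the conditioner architecture and the tanh clamp are illustrative assumptions, not the cited paper's implementation.

```python
# Hypothetical conditional coupling layer: the conditioner network also
# receives the context x, so the flow defines a conditional density p(y|x).
import torch
import torch.nn as nn


class ConditionalAffineCoupling(nn.Module):
    def __init__(self, y_dim: int, x_dim: int, hidden: int = 64):
        super().__init__()
        self.d = y_dim // 2
        # The conditioner sees both the untouched half of y and the context x.
        self.net = nn.Sequential(
            nn.Linear(self.d + x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (y_dim - self.d)),
        )

    def forward(self, y: torch.Tensor, x: torch.Tensor):
        y1, y2 = y[..., :self.d], y[..., self.d:]
        raw_scale, shift = self.net(torch.cat([y1, x], dim=-1)).chunk(2, dim=-1)
        scale = torch.tanh(raw_scale)  # bounded log-scale for stability
        z2 = y2 * torch.exp(scale) + shift
        log_det = scale.sum(dim=-1)
        return torch.cat([y1, z2], dim=-1), log_det
```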