Training normalizing flows with computationally intensive target
probability distributions
- URL: http://arxiv.org/abs/2308.13294v2
- Date: Wed, 28 Feb 2024 15:48:09 GMT
- Title: Training normalizing flows with computationally intensive target
probability distributions
- Authors: Piotr Bialas, Piotr Korcyl, Tomasz Stebel
- Abstract summary: We propose an estimator for normalizing flows based on the REINFORCE algorithm.
It is up to ten times faster in terms of wall-clock time and requires up to $30\%$ less memory.
- Score: 0.018416014644193065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning techniques, in particular the so-called normalizing flows,
are becoming increasingly popular in the context of Monte Carlo simulations as
they can effectively approximate target probability distributions. In the case
of lattice field theories (LFT) the target distribution is given by the
exponential of the action. The common loss function's gradient estimator based
on the "reparametrization trick" requires the calculation of the derivative of
the action with respect to the fields. This can present a significant
computational cost for complicated, non-local actions such as the fermionic
action in QCD. In this contribution, we propose an estimator for normalizing
flows based on the REINFORCE algorithm that avoids this issue. We apply it to
the two-dimensional Schwinger model with Wilson fermions at criticality and show
that it is up to ten times faster in terms of the wall-clock time as well as
requiring up to $30\%$ less memory than the reparametrization trick estimator.
It is also more numerically stable, allowing for single-precision calculations
and the use of half-precision tensor cores. We present an in-depth analysis of
the origins of these improvements. We believe that these benefits will also
appear outside the realm of LFT, wherever the target probability distribution
is computationally intensive to evaluate.
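The contrast between the two estimators can be illustrated with a toy example. This is a hypothetical minimal sketch, not the authors' implementation: the "action" here is just $f(x) = x^2$ and the "flow" is a single Gaussian $N(\mu, 1)$, but it shows the key structural difference the abstract describes, namely that REINFORCE needs only values of $f$, while the reparametrization trick needs its derivative $f'$.

```python
# Compare two Monte Carlo estimators of d/dmu E_{x~N(mu,1)}[f(x)]:
# the reparametrization trick (needs f') and REINFORCE (needs only f).
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5          # the "flow parameter" we differentiate with respect to
n = 200_000       # number of Monte Carlo samples

eps = rng.standard_normal(n)
x = mu + eps      # reparametrized samples: x ~ N(mu, 1)

def f(x):         # stands in for the (expensive) action evaluation
    return x ** 2

# Reparametrization trick: requires the derivative f'(x) = 2x.
grad_rep = np.mean(2 * x)

# REINFORCE: needs only f(x) and the score d/dmu log N(x | mu, 1) = (x - mu).
# Subtracting a baseline (the mean of f) reduces variance without adding bias.
baseline = f(x).mean()
grad_reinforce = np.mean((f(x) - baseline) * (x - mu))

true_grad = 2 * mu  # since E[x^2] = mu^2 + 1, the exact gradient is 2*mu
print(grad_rep, grad_reinforce, true_grad)
```

Both estimates agree with the exact gradient $2\mu$ up to Monte Carlo noise; in the paper's setting, skipping the derivative of the action is what yields the speed and memory gains.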
Related papers
- Learning Optimal Flows for Non-Equilibrium Importance Sampling [13.469239537683299]
We develop a method to perform calculations based on generating samples from a simple base distribution, transporting them along the flow generated by a velocity field, and performing averages along these flowlines.
On the theory side we discuss how to tailor the velocity field to the target and establish general conditions under which the proposed estimator is a perfect estimator.
On the computational side we show how to use deep learning to represent the velocity field by a neural network and train it towards the zero variance optimum.
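The sampling scheme this summary describes can be sketched in a few lines. The velocity field below is a hypothetical closed-form choice (not the paper's learned network) that carries the base distribution $N(0,1)$ onto $N(3, 4)$ along the straight interpolation path between them.

```python
# Draw samples from a simple base distribution and transport them along
# the flowlines of a velocity field v(x, t), integrating from t=0 to t=1.
import numpy as np

rng = np.random.default_rng(1)
m, s = 3.0, 2.0           # target mean and standard deviation

def v(x, t):
    # Velocity of the affine path x(t) = (1 + t*(s-1)) * x0 + t*m,
    # rewritten as a function of the current position x.
    return (s - 1.0) * (x - t * m) / (1.0 + t * (s - 1.0)) + m

x = rng.standard_normal(100_000)   # samples from the base N(0, 1)
dt = 1e-3
for i in range(1000):              # explicit Euler integration of dx/dt = v
    x = x + dt * v(x, i * dt)

print(x.mean(), x.std())           # close to the target (3, 2)
```

Averages of observables over the transported samples then estimate expectations under the target; the paper's contribution is learning $v$ with a neural network and driving the estimator towards zero variance.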
arXiv Detail & Related papers (2022-06-20T17:25:26Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
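The equilibrium idea can be shown with a scalar toy problem. The update rule below is a hypothetical contraction, not the paper's flow model; the point is that instead of unrolling a recurrent update for a fixed number of steps, one solves directly for the fixed point of the implicit layer.

```python
# Solve for the "infinite-depth" equilibrium z* = f(z*, x) of an update
# rule f by fixed-point iteration, instead of a fixed-step recurrent unroll.
import numpy as np

def f(z, x):
    # A contraction in z (|df/dz| <= 0.5), so the fixed point is unique.
    return np.tanh(0.5 * z + x)

x = 0.3
z = 0.0
for _ in range(100):          # simple fixed-point iteration to convergence
    z = f(z, x)

residual = abs(z - f(z, x))   # ~0 at the equilibrium
print(z, residual)
```

In the DEQ setting, the fixed point is found with a root solver rather than naive iteration, and gradients are obtained through the implicit function theorem, so memory does not grow with the number of iterations.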
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Gradient estimators for normalising flows [0.05156484100374058]
A machine learning approach to Monte Carlo simulations called Neural Markov Chain Monte Carlo is gaining traction.
We present another gradient estimator that avoids calculating the derivative of the action, thus potentially speeding up training for models with more complicated actions.
We also study the statistical properties of several gradient estimators and show that our formulation leads to better training results.
arXiv Detail & Related papers (2022-02-02T22:37:58Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss–Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z)
- SWIFT: Scalable Wasserstein Factorization for Sparse Nonnegative Tensors [42.154795547748165]
We introduce SWIFT, which minimizes the Wasserstein distance that measures the distance between the input tensor and that of the reconstruction.
SWIFT achieves up to 9.65% and 11.31% relative improvement over baselines for downstream prediction tasks.
arXiv Detail & Related papers (2020-10-08T16:05:59Z)
- Resource-efficient adaptive Bayesian tracking of magnetic fields with a quantum sensor [0.0]
Single-spin quantum sensors provide nanoscale mapping of magnetic fields.
In applications where the magnetic field may be changing rapidly, total sensing time must be minimised.
This article addresses the issue of computational speed by implementing an approximate Bayesian estimation technique.
arXiv Detail & Related papers (2020-08-20T11:04:09Z)
- Likelihood-Free Inference with Deep Gaussian Processes [70.74203794847344]
Surrogate models have been successfully used in likelihood-free inference to decrease the number of simulator evaluations.
We propose a Deep Gaussian Process (DGP) surrogate model that can handle more irregularly behaved target distributions.
Our experiments show how DGPs can outperform GPs on objective functions with multimodal distributions and maintain a comparable performance in unimodal cases.
arXiv Detail & Related papers (2020-06-18T14:24:05Z)
- Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.