Gradient estimators for normalising flows
- URL: http://arxiv.org/abs/2202.01314v1
- Date: Wed, 2 Feb 2022 22:37:58 GMT
- Title: Gradient estimators for normalising flows
- Authors: Piotr Białas and Piotr Korcyl and Tomasz Stebel
- Abstract summary: A machine learning approach to Monte-Carlo simulations called Neural Markov Chain Monte-Carlo is gaining traction.
We present another gradient estimator that avoids the calculation of the action gradient, thus potentially speeding up training for models with more complicated actions.
We also study the statistical properties of several gradient estimators and show that our formulation leads to better training results.
- Score: 0.05156484100374058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently a machine learning approach to Monte-Carlo simulations called Neural
Markov Chain Monte-Carlo (NMCMC) has been gaining traction. In its most popular
form it uses neural networks to construct normalizing flows which are then
trained to approximate the desired target distribution. As this distribution is
usually defined via a Hamiltonian or action, the standard learning algorithm
requires estimation of the action gradient with respect to the fields. In this
contribution we present another gradient estimator (and the corresponding
PyTorch implementation) that avoids this calculation, thus potentially
speeding up training for models with more complicated actions. We also study
the statistical properties of several gradient estimators and show that our
formulation leads to better training results.
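To make the distinction concrete, here is a minimal sketch contrasting the standard pathwise (reparameterization) estimator, which must backpropagate through the action, with a score-function (REINFORCE-style) estimator that only evaluates it. The `flow` object (returning field configurations and their log-densities) and the `action` function are hypothetical stand-ins, not the paper's implementation.

```python
import torch

def pathwise_loss(flow, action, batch_size):
    # Reparameterized estimator of KL(q || p): gradients flow through
    # phi into S(phi), so autograd must differentiate the action.
    z = flow.sample_base(batch_size)   # hypothetical API
    phi, logq = flow(z)                # phi = f(z), logq = log q(phi)
    return (logq + action(phi)).mean()

def reinforce_loss(flow, action, batch_size):
    # Score-function estimator: the action is only *evaluated*, never
    # differentiated, so no graph through S(phi) is required.
    z = flow.sample_base(batch_size)
    phi, logq = flow(z)
    with torch.no_grad():
        signal = logq + action(phi)      # detached REINFORCE signal
        signal = signal - signal.mean()  # baseline for variance reduction
    return (signal * logq).mean()
```

Because the signal is detached, the second estimator can evaluate the action entirely under `torch.no_grad()`, which is where the potential speed-up for models with complicated actions comes from.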
Related papers
- Training normalizing flows with computationally intensive target probability distributions [0.018416014644193065]
We propose an estimator for normalizing flows based on the REINFORCE algorithm.
It is up to ten times faster in terms of wall-clock time and requires up to 30% less memory.
arXiv Detail & Related papers (2023-08-25T10:40:46Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
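As a toy illustration of the semiring view (a sketch of the idea, not the paper's algorithm): the chain rule composes local Jacobians in the (sum, product) semiring, and swapping in (max, product) singles out the most influential intermediate unit instead of summing over all of them.

```python
import torch

A = torch.randn(4, 5)  # local Jacobian of layer 1
B = torch.randn(5, 3)  # local Jacobian of layer 2

# Ordinary backprop: the (sum, *) semiring is just a matrix product.
grad = A @ B

# All path contributions i -> j -> k, shape (4, 5, 3).
paths = A.unsqueeze(2) * B.unsqueeze(0)
assert torch.allclose(grad, paths.sum(dim=1))

# (max, *) semiring: keep only the strongest path per (input, output) pair.
top_grad, argmax_j = paths.max(dim=1)
```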
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Classified as unknown: A novel Bayesian neural network [0.0]
We develop a new efficient Bayesian learning algorithm for fully connected neural networks.
We generalize an algorithm previously formulated for a single perceptron performing binary classification to multi-layer perceptrons for multi-class classification.
arXiv Detail & Related papers (2023-01-31T04:27:09Z)
- Learning Optimal Flows for Non-Equilibrium Importance Sampling [13.469239537683299]
We develop a method to perform calculations based on generating samples from a simple base distribution, transporting them along the flow generated by a velocity field, and performing averages along these flowlines.
On the theory side we discuss how to tailor the velocity field to the target and establish general conditions under which the proposed estimator is a perfect estimator.
On the computational side we show how to use deep learning to represent the velocity field by a neural network and train it towards the zero variance optimum.
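A minimal 1-D sketch of this transport idea, assuming a toy stand-in for the learned velocity field: samples from a Gaussian base are moved by Euler steps of dx/dt = v(x, t), while the continuity equation d(log rho)/dt = -dv/dx tracks their density along the flowline.

```python
import torch

def velocity(x, t):
    # Hypothetical placeholder for the learned velocity network.
    return torch.tanh(x) * (1.0 - t)

def transport(x0, n_steps=50):
    x, dlogrho = x0.clone(), torch.zeros_like(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x.detach().requires_grad_(True)
        v = velocity(x, t)
        div = torch.autograd.grad(v.sum(), x)[0]  # dv/dx in 1-D
        with torch.no_grad():
            x = x + dt * v                 # Euler step along the flowline
            dlogrho = dlogrho - dt * div   # continuity equation
    return x, dlogrho

base = torch.distributions.Normal(0.0, 1.0)
x0 = base.sample((1024,))
x1, dlog = transport(x0)
logq = base.log_prob(x0) + dlog  # log-density of the transported samples
# Importance weights toward an (assumed) target log-density log_p:
# w = torch.exp(log_p(x1) - logq)
```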
arXiv Detail & Related papers (2022-06-20T17:25:26Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
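For background, the snippet below shows the plain straight-through Gumbel-Softmax primitive that GST is designed to improve upon; this is the stock PyTorch function, not the GST estimator itself.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32, 10, requires_grad=True)

# hard=True returns one-hot samples in the forward pass while the backward
# pass uses the smooth softmax relaxation (the straight-through trick).
one_hot = F.gumbel_softmax(logits, tau=0.5, hard=True)

loss = one_hot.pow(2).sum()  # any downstream differentiable loss
loss.backward()              # gradients reach `logits` despite discrete samples
```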
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- Stochastic normalizing flows as non-equilibrium transformations [62.997667081978825]
We show that normalizing flows provide a route to sample lattice field theories more efficiently than conventional Monte Carlo simulations.
We lay out a strategy to optimize the efficiency of this extended class of generative models and present examples of applications.
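Exactness of such flow-based samplers typically rests on an independence Metropolis-Hastings step with weights w(phi) = exp(-S(phi)) / q(phi); a minimal sketch, with `flow` and `action` again hypothetical stand-ins:

```python
import torch

def nmcmc_step(flow, action, phi_old, logw_old):
    # Propose an independent configuration from the trained flow.
    z = flow.sample_base(1)
    phi, logq = flow(z)
    logw = -action(phi) - logq  # log importance weight
    # Accept with probability min(1, w_new / w_old).
    if torch.rand(()) < torch.exp(logw - logw_old).clamp(max=1.0):
        return phi.detach(), logw.detach()  # accept proposal
    return phi_old, logw_old                # reject: keep old configuration
```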
arXiv Detail & Related papers (2022-01-21T19:00:18Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
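One way to read this result is through the prequential decomposition log p(D) = sum_i log p(d_i | d_<i): a model that quickly learns to predict each next observation accumulates a large marginal likelihood. A toy sketch with a conjugate Gaussian-mean model (an illustration of the identity, not the paper's experiments):

```python
import torch

torch.manual_seed(0)
data = 0.5 + torch.randn(100)  # observations with unit noise

mu, var = 0.0, 1.0  # N(0, 1) prior over the unknown mean
log_ml = 0.0
for y in data:
    # One-step-ahead posterior predictive: N(mu, var + noise variance).
    pred = torch.distributions.Normal(mu, (var + 1.0) ** 0.5)
    log_ml += pred.log_prob(y).item()  # "training speed" term
    # Conjugate posterior update with observation noise 1.
    var, mu = 1.0 / (1.0 / var + 1.0), (mu / var + y.item()) / (1.0 / var + 1.0)

# log_ml now equals the log marginal likelihood of the data.
```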
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
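The score-function identity behind such goal-directed generation, grad E[R] = E[R * grad log p], can be sketched with a toy factorized policy over token sequences; the reward below is a hypothetical stand-in for a user-defined property check.

```python
import torch
import torch.nn.functional as F

vocab, seq_len, batch = 16, 8, 64
logits = torch.zeros(seq_len, vocab, requires_grad=True)  # factorized policy
opt = torch.optim.Adam([logits], lr=0.1)

def reward(seqs):
    # Stand-in reward: fraction of even tokens (replace with a property check).
    return (seqs % 2 == 0).float().mean(dim=1)

for step in range(200):
    dist = torch.distributions.Categorical(probs=F.softmax(logits, dim=-1))
    seqs = dist.sample((batch,))           # (batch, seq_len)
    logp = dist.log_prob(seqs).sum(dim=1)  # log-prob of each sequence
    r = reward(seqs)
    # Minimizing this loss ascends the expected reward (REINFORCE with baseline).
    loss = -((r - r.mean()).detach() * logp).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```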
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.