Gradient estimators for normalising flows
- URL: http://arxiv.org/abs/2202.01314v1
- Date: Wed, 2 Feb 2022 22:37:58 GMT
- Title: Gradient estimators for normalising flows
- Authors: Piotr Białas and Piotr Korcyl and Tomasz Stebel
- Abstract summary: A machine learning approach to Monte-Carlo simulations called Neural Markov Chain Monte-Carlo is gaining traction.
We present another gradient estimator that avoids the calculation of the action gradient, thus potentially speeding up training for models with more complicated actions.
We also study the statistical properties of several gradient estimators and show that our formulation leads to better training results.
- Score: 0.05156484100374058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently a machine learning approach to Monte-Carlo simulations called Neural
Markov Chain Monte-Carlo (NMCMC) has been gaining traction. In its most popular
form it uses neural networks to construct normalizing flows which are then
trained to approximate the desired target distribution. As this distribution is
usually defined via a Hamiltonian or action, the standard learning algorithm
requires estimation of the action gradient with respect to the fields. In this
contribution we present another gradient estimator (and the corresponding
PyTorch implementation) that avoids this calculation, thus potentially
speeding up training for models with more complicated actions. We also study
the statistical properties of several gradient estimators and show that our
formulation leads to better training results.
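To make the distinction concrete, here is a minimal sketch contrasting the standard pathwise (reparameterization) estimator, which must backpropagate through the action, with a score-function (REINFORCE-style) estimator that only evaluates it. The `flow` object (returning field configurations and their log-densities) and the `action` function are hypothetical stand-ins, not the paper's implementation.

```python
import torch

def pathwise_loss(flow, action, batch_size):
    # Reparameterized estimator of KL(q || p): gradients flow through
    # phi into S(phi), so autograd must differentiate the action.
    z = flow.sample_base(batch_size)   # hypothetical API
    phi, logq = flow(z)                # phi = f(z), logq = log q(phi)
    return (logq + action(phi)).mean()

def reinforce_loss(flow, action, batch_size):
    # Score-function estimator: the action is only *evaluated*, never
    # differentiated, so no graph through S(phi) is required.
    z = flow.sample_base(batch_size)
    phi, logq = flow(z)
    with torch.no_grad():
        signal = logq + action(phi)      # detached REINFORCE signal
        signal = signal - signal.mean()  # baseline for variance reduction
    return (signal * logq).mean()
```

Because the signal is detached, the second estimator can evaluate the action entirely under `torch.no_grad()`, which is where the potential speed-up for models with complicated actions comes from.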
Related papers
- Training normalizing flows with computationally intensive target probability distributions [0.018416014644193065]
We propose an estimator for normalizing flows based on the REINFORCE algorithm.
It is up to ten times faster in terms of wall-clock time and requires up to 30% less memory.
arXiv Detail & Related papers (2023-08-25T10:40:46Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
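As a toy illustration of the semiring view (a sketch of the idea, not the paper's algorithm): the chain rule composes local Jacobians in the (sum, product) semiring, and swapping in (max, product) singles out the most influential intermediate unit instead of summing over all of them.

```python
import torch

A = torch.randn(4, 5)  # local Jacobian of layer 1
B = torch.randn(5, 3)  # local Jacobian of layer 2

# Ordinary backprop: the (sum, *) semiring is just a matrix product.
grad = A @ B

# All path contributions i -> j -> k, shape (4, 5, 3).
paths = A.unsqueeze(2) * B.unsqueeze(0)
assert torch.allclose(grad, paths.sum(dim=1))

# (max, *) semiring: keep only the strongest path per (input, output) pair.
top_grad, argmax_j = paths.max(dim=1)
```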
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Classified as unknown: A novel Bayesian neural network [0.0]
We develop a new efficient Bayesian learning algorithm for fully connected neural networks.
We generalize an algorithm previously formulated for a single perceptron performing binary classification to multi-layer perceptrons for multi-class classification.
arXiv Detail & Related papers (2023-01-31T04:27:09Z)
- Learning Optimal Flows for Non-Equilibrium Importance Sampling [13.469239537683299]
We develop a method to perform calculations based on generating samples from a simple base distribution, transporting them along the flow generated by a velocity field, and performing averages along these flowlines.
On the theory side we discuss how to tailor the velocity field to the target and establish general conditions under which the proposed estimator is a perfect estimator.
On the computational side we show how to use deep learning to represent the velocity field by a neural network and train it towards the zero variance optimum.
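A minimal 1-D sketch of this transport idea, assuming a toy stand-in for the learned velocity field: samples from a Gaussian base are moved by Euler steps of dx/dt = v(x, t), while the continuity equation d(log rho)/dt = -dv/dx tracks their density along the flowline.

```python
import torch

def velocity(x, t):
    # Hypothetical placeholder for the learned velocity network.
    return torch.tanh(x) * (1.0 - t)

def transport(x0, n_steps=50):
    x, dlogrho = x0.clone(), torch.zeros_like(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x.detach().requires_grad_(True)
        v = velocity(x, t)
        div = torch.autograd.grad(v.sum(), x)[0]  # dv/dx in 1-D
        with torch.no_grad():
            x = x + dt * v                 # Euler step along the flowline
            dlogrho = dlogrho - dt * div   # continuity equation
    return x, dlogrho

base = torch.distributions.Normal(0.0, 1.0)
x0 = base.sample((1024,))
x1, dlog = transport(x0)
logq = base.log_prob(x0) + dlog  # log-density of the transported samples
# Importance weights toward an (assumed) target log-density log_p:
# w = torch.exp(log_p(x1) - logq)
```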
arXiv Detail & Related papers (2022-06-20T17:25:26Z)
- Training Discrete Deep Generative Models via Gapped Straight-Through Estimator [72.71398034617607]
We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead.
This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax.
Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks.
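For background, the snippet below shows the plain straight-through Gumbel-Softmax primitive that GST is designed to improve upon; this is the stock PyTorch function, not the GST estimator itself.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32, 10, requires_grad=True)

# hard=True returns one-hot samples in the forward pass while the backward
# pass uses the smooth softmax relaxation (the straight-through trick).
one_hot = F.gumbel_softmax(logits, tau=0.5, hard=True)

loss = one_hot.pow(2).sum()  # any downstream differentiable loss
loss.backward()              # gradients reach `logits` despite discrete samples
```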
arXiv Detail & Related papers (2022-06-15T01:46:05Z)
- Stochastic normalizing flows as non-equilibrium transformations [62.997667081978825]
We show that normalizing flows provide a route to sample lattice field theories more efficiently than conventional Monte Carlo simulations.
We lay out a strategy to optimize the efficiency of this extended class of generative models and present examples of applications.
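Exactness of such flow-based samplers typically rests on an independence Metropolis-Hastings step with weights w(phi) = exp(-S(phi)) / q(phi); a minimal sketch, with `flow` and `action` again hypothetical stand-ins:

```python
import torch

def nmcmc_step(flow, action, phi_old, logw_old):
    # Propose an independent configuration from the trained flow.
    z = flow.sample_base(1)
    phi, logq = flow(z)
    logw = -action(phi) - logq  # log importance weight
    # Accept with probability min(1, w_new / w_old).
    if torch.rand(()) < torch.exp(logw - logw_old).clamp(max=1.0):
        return phi.detach(), logw.detach()  # accept proposal
    return phi_old, logw_old                # reject: keep old configuration
```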
arXiv Detail & Related papers (2022-01-21T19:00:18Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
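One way to read this result is through the prequential decomposition log p(D) = sum_i log p(d_i | d_<i): a model that quickly learns to predict each next observation accumulates a large marginal likelihood. A toy sketch with a conjugate Gaussian-mean model (an illustration of the identity, not the paper's experiments):

```python
import torch

torch.manual_seed(0)
data = 0.5 + torch.randn(100)  # observations with unit noise

mu, var = 0.0, 1.0  # N(0, 1) prior over the unknown mean
log_ml = 0.0
for y in data:
    # One-step-ahead posterior predictive: N(mu, var + noise variance).
    pred = torch.distributions.Normal(mu, (var + 1.0) ** 0.5)
    log_ml += pred.log_prob(y).item()  # "training speed" term
    # Conjugate posterior update with observation noise 1.
    var, mu = 1.0 / (1.0 / var + 1.0), (mu / var + y.item()) / (1.0 / var + 1.0)

# log_ml now equals the log marginal likelihood of the data.
```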
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
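The score-function identity behind such goal-directed generation, grad E[R] = E[R * grad log p], can be sketched with a toy factorized policy over token sequences; the reward below is a hypothetical stand-in for a user-defined property check.

```python
import torch
import torch.nn.functional as F

vocab, seq_len, batch = 16, 8, 64
logits = torch.zeros(seq_len, vocab, requires_grad=True)  # factorized policy
opt = torch.optim.Adam([logits], lr=0.1)

def reward(seqs):
    # Stand-in reward: fraction of even tokens (replace with a property check).
    return (seqs % 2 == 0).float().mean(dim=1)

for step in range(200):
    dist = torch.distributions.Categorical(probs=F.softmax(logits, dim=-1))
    seqs = dist.sample((batch,))           # (batch, seq_len)
    logp = dist.log_prob(seqs).sum(dim=1)  # log-prob of each sequence
    r = reward(seqs)
    # Minimizing this loss ascends the expected reward (REINFORCE with baseline).
    loss = -((r - r.mean()).detach() * logp).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```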
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.