Efficient Training of Energy-Based Models Using Jarzynski Equality
- URL: http://arxiv.org/abs/2305.19414v2
- Date: Mon, 11 Dec 2023 16:39:25 GMT
- Title: Efficient Training of Energy-Based Models Using Jarzynski Equality
- Authors: Davide Carbone, Mengjian Hua, Simon Coste, Eric Vanden-Eijnden
- Abstract summary: Energy-based models (EBMs) are generative models inspired by statistical physics.
Training them on the cross-entropy is challenging because computing its gradient with respect to the model parameters requires sampling the model distribution.
Here we show how results for nonequilibrium thermodynamics based on Jarzynski equality can be used to perform this computation efficiently.
- Score: 13.636994997309307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models (EBMs) are generative models inspired by statistical
physics with a wide range of applications in unsupervised learning. Their
performance is best measured by the cross-entropy (CE) of the model
distribution relative to the data distribution. Using the CE as the objective
for training is however challenging because the computation of its gradient
with respect to the model parameters requires sampling the model distribution.
Here we show how results for nonequilibrium thermodynamics based on Jarzynski
equality together with tools from sequential Monte-Carlo sampling can be used
to perform this computation efficiently and avoid the uncontrolled
approximations made using the standard contrastive divergence algorithm.
Specifically, we introduce a modification of the unadjusted Langevin algorithm
(ULA) in which each walker acquires a weight that enables the estimation of the
gradient of the cross-entropy at any step during gradient descent (GD), thereby bypassing sampling
biases induced by slow mixing of ULA. We illustrate these results with
numerical experiments on Gaussian mixture distributions as well as the MNIST
dataset. We show that the proposed approach outperforms methods based on the
contrastive divergence algorithm in all the considered situations.
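The weighted-walker idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1D quadratic energy, the linear schedule for the parameter `theta` (standing in for gradient-descent updates), and all function names are assumptions made for illustration, and the log-weight update is one standard sequential Monte Carlo choice (ratio of backward to forward ULA kernels) that may differ in detail from the paper's. Resampling is omitted.

```python
import numpy as np

def energy(x, theta):
    # Toy energy (assumption): U_theta(x) = (x - theta)^2 / 2,
    # so the model distribution is a unit Gaussian centred at theta.
    return 0.5 * (x - theta) ** 2

def grad_energy(x, theta):
    # Derivative of the toy energy with respect to x.
    return x - theta

def alpha(x, y, theta, h):
    # Log of the ULA kernel from x to y under theta, combined with the
    # energy at x, after dropping the |y - x|^2 term that cancels in the weight.
    g = grad_energy(x, theta)
    return energy(x, theta) + 0.5 * (y - x) * g + 0.25 * h * g ** 2

def run_weighted_ula(n_walkers=5000, n_steps=50, h=0.1, theta_end=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical parameter schedule standing in for gradient-descent updates.
    thetas = np.linspace(0.0, theta_end, n_steps + 1)
    # Walkers start as exact samples of the initial model, with zero log-weight.
    x = rng.normal(thetas[0], 1.0, size=n_walkers)
    log_w = np.zeros(n_walkers)
    for k in range(n_steps):
        # One ULA step under the current parameter theta_k.
        noise = rng.normal(size=n_walkers)
        x_new = x - h * grad_energy(x, thetas[k]) + np.sqrt(2.0 * h) * noise
        # Weight update: forward kernel under theta_k, backward under theta_{k+1}.
        log_w += alpha(x, x_new, thetas[k], h) - alpha(x_new, x, thetas[k + 1], h)
        x = x_new
    return x, log_w, thetas[-1]

def weighted_mean(f_values, log_w):
    # Self-normalised weighted average, stabilised by subtracting the max log-weight.
    w = np.exp(log_w - log_w.max())
    return np.sum(w * f_values) / np.sum(w)

x, log_w, theta = run_weighted_ula()
print("final theta:     ", theta)
print("unweighted mean: ", x.mean())
print("weighted mean:   ", weighted_mean(x, log_w))
```

In this toy setting the plain walker average lags behind the moving parameter because ULA has not equilibrated, while the weighted average tracks the current model mean. The same weighted average applied to the parameter-derivative of the energy would give the model term of the cross-entropy gradient.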
Related papers
- On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Noise in the reverse process improves the approximation capabilities of diffusion models [27.65800389807353]
In score-based generative modeling (SGM), the state of the art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts.
This paper delves into the heart of this phenomenon, comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes.
We analyze the ability of neural SDEs to approximate trajectories of the Fokker-Planck equation, revealing the advantages of stochasticity.
arXiv Detail & Related papers (2023-12-13T02:39:10Z)
- Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z)
- Balanced Training of Energy-Based Models with Adaptive Flow Sampling [13.951904929884618]
Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density.
We propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF).
Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times.
arXiv Detail & Related papers (2023-06-01T13:58:06Z)
- A Geometric Perspective on Diffusion Models [60.69328526215776]
We inspect the ODE-based sampling of a popular variance-exploding SDE and reveal several intriguing structures of its sampling dynamics.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected stochastic differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
- Estimation of sparse Gaussian graphical models with hidden clustering structure [8.258451067861932]
We propose a model to estimate the sparse Gaussian graphical models with hidden clustering structure.
We develop a symmetric Gauss-Seidel based alternating direction method of multipliers.
Numerical experiments on both synthetic data and real data demonstrate the good performance of our model.
arXiv Detail & Related papers (2020-04-17T08:43:31Z)
- Expected Information Maximization: Using the I-Projection for Mixture Density Estimation [22.096148237257644]
Modelling highly multi-modal data is a challenging problem in machine learning.
We present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection.
We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches.
arXiv Detail & Related papers (2020-01-23T17:24:50Z)