Bounds all around: training energy-based models with bidirectional bounds
- URL: http://arxiv.org/abs/2111.00929v2
- Date: Tue, 2 Nov 2021 12:37:53 GMT
- Title: Bounds all around: training energy-based models with bidirectional bounds
- Authors: Cong Geng, Jia Wang, Zhiyong Gao, Jes Frellsen, Søren Hauberg
- Abstract summary: Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train.
Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function.
We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game.
- Score: 26.507268387712145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Energy-based models (EBMs) provide an elegant framework for density
estimation, but they are notoriously difficult to train. Recent work has
established links to generative adversarial networks, where the EBM is trained
through a minimax game with a variational value function. We propose a
bidirectional bound on the EBM log-likelihood, such that we maximize a lower
bound and minimize an upper bound when solving the minimax game. We link one
bound to a gradient penalty that stabilizes training, thereby providing
grounding for best engineering practice. To evaluate the bounds we develop a
new and efficient estimator of the Jacobi-determinant of the EBM generator. We
demonstrate that these developments significantly stabilize training and yield
high-quality density estimation and sample generation.
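
For intuition, here is a minimal PyTorch-style sketch of the adversarial training structure the abstract describes: the energy network is pushed down on data and up on generator samples, a gradient penalty on the energy stabilizes the game (the paper grounds such a penalty in one side of the bidirectional bound), and the generator is trained toward low-energy regions. This is an illustrative sketch under generic assumptions, not the authors' implementation; `energy_net`, `generator`, `z_dim`, and `lambda_gp` are placeholder names, and the paper's exact bounds and Jacobi-determinant estimator are not reproduced here.

```python
# Illustrative sketch only: a generic adversarial (minimax) EBM training step
# with a gradient penalty on the energy network.
import torch

def training_step(energy_net, generator, x_real, z_dim=128, lambda_gp=10.0):
    """One minimax step: lower the energy on data, raise it on generator
    samples, and move the generator toward low-energy regions."""
    batch = x_real.size(0)
    z = torch.randn(batch, z_dim, device=x_real.device)
    x_fake = generator(z)

    # Energy-network side: contrastive term between data and samples.
    e_real = energy_net(x_real).mean()
    e_fake = energy_net(x_fake.detach()).mean()

    # Gradient penalty on the energy w.r.t. its input (a common stabilizer;
    # the paper links a penalty of this kind to one of its bounds).
    x_pen = x_real.detach().clone().requires_grad_(True)
    grad = torch.autograd.grad(energy_net(x_pen).sum(), x_pen, create_graph=True)[0]
    gp = grad.flatten(1).norm(dim=1).pow(2).mean()

    loss_energy = e_real - e_fake + lambda_gp * gp

    # Generator side (gradients flow through x_fake into the generator).
    loss_generator = energy_net(x_fake).mean()
    return loss_energy, loss_generator
```

In practice the two losses are minimized with separate optimizers in alternation, which is what makes the procedure a minimax game.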
Related papers
- Improving Adversarial Energy-Based Model via Diffusion Process [25.023967485839155]
Adversarial EBMs introduce a generator to form a minimax training game.
Inspired by diffusion-based models, we embed EBMs into each denoising step to split a long generation process into several smaller steps.
Our experiments show significant improvement in generation compared to existing adversarial EBMs.
arXiv Detail & Related papers (2024-03-04T01:33:53Z) - Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z) - Generative Modeling through the Semi-dual Formulation of Unbalanced
Optimal Transport [9.980822222343921]
We propose a novel generative model based on the semi-dual formulation of Unbalanced Optimal Transport (UOT).
Unlike OT, UOT relaxes the hard constraint on distribution matching. This approach provides better robustness against outliers, stability during training, and faster convergence.
Our model outperforms existing OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 6.36 on CelebA-HQ-256.
arXiv Detail & Related papers (2023-05-24T06:31:05Z) - Energy-guided Entropic Neural Optimal Transport [100.20553612296024]
Energy-based models (EBMs) have been known in the machine learning community for decades.
We bridge the gap between EBMs and Entropy-regularized OT.
In practice, we validate its applicability in toy 2D and image domains.
arXiv Detail & Related papers (2023-04-12T18:20:58Z) - Guiding Energy-based Models via Contrastive Latent Variables [81.68492940158436]
An energy-based model (EBM) is a popular generative framework that offers both explicit density and architectural flexibility.
There often exists a large gap between EBMs and other generative frameworks like GANs in terms of generation quality.
We propose a novel and effective framework for improving EBMs via contrastive representation learning.
arXiv Detail & Related papers (2023-03-06T10:50:25Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - EBMs Trained with Maximum Likelihood are Generator Models Trained with a
Self-adverserial Loss [6.445605125467574]
We replace Langevin dynamics with deterministic solutions of the associated gradient descent ODE (the two samplers are sketched after this list).
We show that reintroducing the noise in the dynamics does not lead to a qualitative change in behavior.
We thus show that EBM training is effectively a self-adversarial procedure rather than maximum likelihood estimation.
arXiv Detail & Related papers (2021-02-23T15:34:12Z) - Imitation with Neural Density Models [98.34503611309256]
We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Imitation Occupancy Entropy Reinforcement Learning (RL) using the density as a reward.
Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator.
arXiv Detail & Related papers (2020-10-19T19:38:36Z) - No MCMC for me: Amortized sampling for fast and stable training of
energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z) - Mode-Assisted Unsupervised Learning of Restricted Boltzmann Machines [7.960229223744695]
We show that properly combining standard gradient updates with an off-gradient direction improves their training dramatically over traditional gradient methods.
This approach, which we call mode training, promotes faster training and stability, in addition to a lower converged relative entropy (KL divergence).
The mode training we suggest is quite versatile, as it can be applied in conjunction with any given gradient method, and is easily extended to more general energy-based neural network structures.
arXiv Detail & Related papers (2020-01-15T21:12:44Z)
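
As referenced in the "EBMs Trained with Maximum Likelihood..." entry above, here is a minimal sketch contrasting Langevin dynamics with its noise-free counterpart, an Euler discretization of the gradient descent ODE. It is an illustration under generic assumptions (a user-supplied `energy` function returning per-sample energies), not code from any of the listed papers.

```python
# Illustrative sketch: Langevin sampling vs. its deterministic (noise-free)
# variant for a generic energy function E(x).
import torch

def langevin_sample(energy, x, steps=100, step_size=0.01):
    """Unadjusted Langevin dynamics: a gradient step on the energy plus
    Gaussian noise with variance 2 * step_size."""
    x = x.detach().clone()
    for _ in range(steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        noise = (2.0 * step_size) ** 0.5 * torch.randn_like(x)
        x = (x - step_size * grad + noise).detach()
    return x

def deterministic_sample(energy, x, steps=100, step_size=0.01):
    """The same update with the noise removed: an Euler discretization of
    the gradient descent ODE dx/dt = -grad E(x)."""
    x = x.detach().clone()
    for _ in range(steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = (x - step_size * grad).detach()
    return x
```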