Mitigating Out-of-Distribution Data Density Overestimation in
Energy-Based Models
- URL: http://arxiv.org/abs/2205.14817v1
- Date: Mon, 30 May 2022 02:49:17 GMT
- Title: Mitigating Out-of-Distribution Data Density Overestimation in
Energy-Based Models
- Authors: Beomsu Kim, Jong Chul Ye
- Abstract summary: Deep energy-based models (EBMs) are receiving increasing attention due to their ability to learn complex distributions.
To train deep EBMs, the maximum likelihood estimation (MLE) with short-run Langevin Monte Carlo (LMC) is often used.
We investigate why the MLE with short-run LMC can converge to EBMs with wrong density estimates.
- Score: 54.06799491319278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep energy-based models (EBMs), which use deep neural networks (DNNs) as
energy functions, are receiving increasing attention due to their ability to
learn complex distributions. To train deep EBMs, the maximum likelihood
estimation (MLE) with short-run Langevin Monte Carlo (LMC) is often used. While
the MLE with short-run LMC is computationally efficient compared to an MLE with
full Markov Chain Monte Carlo (MCMC), it often assigns high density to
out-of-distribution (OOD) data. To address this issue, here we systematically
investigate why the MLE with short-run LMC can converge to EBMs with wrong
density estimates, and reveal that the heuristic modifications to LMC
introduced by previous works were the main problem. We then propose a Uniform
Support Partitioning (USP) scheme that optimizes a set of points to evenly
partition the support of the EBM and then uses the resulting points to
approximate the EBM-MLE loss gradient. We empirically demonstrate that USP
avoids the pitfalls of short-run LMC, leading to significantly improved OOD
data detection performance on Fashion-MNIST.
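To make the training setup concrete, below is a minimal PyTorch-style sketch of the recipe the abstract describes: MLE where the negative samples come from a fixed, small number of Langevin steps, and the loss gradient lowers the energy of data while raising it on those samples. The network, step size, step count, and initialization are illustrative assumptions, not the paper's actual configuration; the sketch keeps the plain, unmodified Langevin update, whereas the paper attributes OOD density overestimation to heuristic modifications of it.
```python
# Minimal sketch (not the authors' code) of EBM training by MLE with short-run LMC.
# The EBM defines an unnormalized density: p_theta(x) ∝ exp(-E_theta(x)).
import torch

def short_run_lmc(energy_net, x_init, n_steps=20, step_size=0.01):
    """Draw approximate model samples with a fixed, short Langevin chain from x_init."""
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(energy_net(x).sum(), x)[0]
        # Unmodified Langevin step: descend the energy and inject Gaussian noise.
        x = x - 0.5 * step_size**2 * grad + step_size * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()

def ebm_mle_surrogate_loss(energy_net, x_data, x_neg):
    # Surrogate whose parameter gradient approximates the MLE gradient:
    # lower the energy of data, raise the energy of the approximate model samples.
    return energy_net(x_data).mean() - energy_net(x_neg).mean()
```
In practice `x_init` is drawn from noise or a persistent buffer; the proposed USP scheme would instead approximate the MLE loss gradient with a set of points optimized to evenly partition the EBM's support, avoiding the short-run chain altogether.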
Related papers
- Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs [16.006893624836554]
We propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE).
VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step.
In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct.
arXiv Detail & Related papers (2023-10-17T18:27:27Z)
- Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
The common practice of learning latent-space EBMs with non-convergent short-run MCMC for prior and posterior sampling hinders further progress of the model.
We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- How to Train Your Energy-Based Models [19.65375049263317]
Energy-Based Models (EBMs) specify probability density or mass functions up to an unknown normalizing constant.
This tutorial is targeted at an audience with a basic understanding of generative models who want to apply EBMs or start a research project in this direction.
arXiv Detail & Related papers (2021-01-09T04:51:31Z)
- Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler [35.80109055748496]
Training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo sampling.
We learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics derived from the energy function.
With these amortized MCMC samples, the EBM can be trained by maximum likelihood, which follows an "analysis by synthesis" scheme.
We call this joint training algorithm variational MCMC teaching, in which the VAE chases the EBM toward the data distribution.
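As a rough sketch of the scheme this summary describes, written under assumed interfaces: the VAE decoder proposes initial states and a few Langevin steps on the EBM energy refine them. Names such as `vae_decoder` and `latent_dim`, and the step settings, are placeholders rather than the paper's actual interfaces.
```python
# Sketch only (assumed interfaces): amortized negative sampling for EBM training.
# A VAE decoder initializes the chain; finite-step Langevin dynamics refines it.
import torch

def amortized_negatives(energy_net, vae_decoder, batch_size, latent_dim,
                        n_steps=15, step_size=0.01):
    z = torch.randn(batch_size, latent_dim)            # draw latents from the VAE prior
    x = vae_decoder(z).detach().requires_grad_(True)   # amortized initialization
    for _ in range(n_steps):                           # short Langevin refinement
        grad = torch.autograd.grad(energy_net(x).sum(), x)[0]
        x = x - 0.5 * step_size**2 * grad + step_size * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()
```
Per the summary, the EBM is then updated by maximum likelihood on these samples while the VAE is trained to track them, so the generator chases the EBM toward the data distribution.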
arXiv Detail & Related papers (2020-12-29T20:46:40Z)
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models [62.1234885852552]
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
We present a simple method for training EBMs at scale using an entropy-regularized generator to amortize the MCMC sampling.
Next, we apply our estimator to the recently proposed Joint Energy Model (JEM), where we match the original performance with faster and more stable training.
arXiv Detail & Related papers (2020-10-08T19:17:20Z)
- Learning Latent Space Energy-Based Prior Model [118.86447805707094]
We learn an energy-based model (EBM) in the latent space of a generator model.
We show that the learned model exhibits strong performance in image and text generation and anomaly detection.
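For concreteness, one common way to write the latent-space EBM prior this summary refers to (symbols and the exact parameterization are illustrative, not taken verbatim from the paper): an energy term tilts a simple reference prior over the generator's latent code, and the generator maps latents to data space.
```latex
% Illustrative formulation of a latent-space EBM prior (not verbatim from the paper).
% E_alpha tilts a reference prior p_0(z); a generator p_beta(x|z) maps latents to data.
\begin{align}
  p_\alpha(z) &= \frac{1}{Z_\alpha}\,\exp\bigl(-E_\alpha(z)\bigr)\,p_0(z),
  \qquad Z_\alpha = \int \exp\bigl(-E_\alpha(z)\bigr)\,p_0(z)\,dz,\\
  p_{\alpha,\beta}(x) &= \int p_\beta(x \mid z)\,p_\alpha(z)\,dz .
\end{align}
```
Because the latent code is low-dimensional, MCMC sampling of such a prior is comparatively cheap, which is the usual motivation for moving the EBM into the latent space.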
arXiv Detail & Related papers (2020-06-15T08:11:58Z)
- MCMC Should Mix: Learning Energy-Based Model with Neural Transport Latent Space MCMC [110.02001052791353]
Learning an energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm.
We show that the model has a particularly simple form in the space of the latent variables of the backbone model.
arXiv Detail & Related papers (2020-06-12T01:25:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.