Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood
- URL: http://arxiv.org/abs/2309.05153v5
- Date: Sun, 10 Nov 2024 06:06:52 GMT
- Title: Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood
- Authors: Yaxuan Zhu, Jianwen Xie, Yingnian Wu, Ruiqi Gao
- Abstract summary: Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
- Score: 64.95663299945171
- Abstract: Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming, and there exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the two models are jointly estimated within a cooperative training framework: samples from the initializer serve as starting points that are refined by a few MCMC sampling steps from the EBM. The EBM is then optimized by maximizing recovery likelihood, while the initializer model is optimized by learning from the difference between the refined samples and the initial samples. In addition, we make several practical design choices for EBM training to further improve the sample quality. Combining these advances, our approach significantly boosts the generation performance compared to existing EBM methods on the CIFAR-10 and ImageNet datasets. We also demonstrate the effectiveness of our models on several downstream tasks, including classifier-free guided generation, compositional generation, image inpainting, and out-of-distribution detection.
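The abstract describes one cooperative update per noise level: the initializer proposes samples, a few MCMC (Langevin) steps under the EBM refine them, the EBM is updated toward higher recovery likelihood, and the initializer learns from the gap between the refined and initial samples. Below is a minimal PyTorch sketch of such an update under simplifying assumptions (toy MLP networks, a single noise level, plain Langevin dynamics, and stand-in losses); the module names and hyperparameters are illustrative, not the authors' implementation.

```python
# Minimal sketch of one CDRL-style cooperative update at a single noise level.
# Assumptions (not from the paper): toy MLP energy/initializer networks, a fixed
# noise level sigma, plain Langevin dynamics, and simplified stand-in losses.
import torch
import torch.nn as nn

dim, sigma, langevin_steps, step_size = 32, 0.1, 5, 0.01

energy = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(), nn.Linear(128, 1))          # EBM f_theta(x)
initializer = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(), nn.Linear(128, dim))   # proposes x from noisy x_tilde
opt_e = torch.optim.Adam(energy.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(initializer.parameters(), lr=1e-4)

def langevin_refine(x, x_tilde):
    """A few MCMC steps on the recovery posterior p(x | x_tilde) ∝ exp(f(x) - ||x_tilde - x||^2 / (2 sigma^2))."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(langevin_steps):
        log_p = energy(x).sum() - ((x_tilde - x) ** 2).sum() / (2 * sigma ** 2)
        grad, = torch.autograd.grad(log_p, x)
        x = (x + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()

def cooperative_step(x_clean):
    x_tilde = x_clean + sigma * torch.randn_like(x_clean)   # noisy version of the data
    x_init = initializer(x_tilde)                           # initializer proposal (starting point)
    x_refined = langevin_refine(x_init, x_tilde)            # refined by a few MCMC steps under the EBM

    # EBM update: recovery-likelihood-style contrast between data and refined samples.
    loss_e = energy(x_refined).mean() - energy(x_clean).mean()
    opt_e.zero_grad()
    loss_e.backward()
    opt_e.step()

    # Initializer update: learn from the difference between refined and initial samples.
    loss_g = ((initializer(x_tilde) - x_refined) ** 2).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

cooperative_step(torch.randn(64, dim))  # toy usage on random "data"
```

In the full method this step would be repeated across a schedule of noise levels, with one EBM and one initializer per level.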
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a $1.3\times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z) - EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of quality.
We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilizes the distillation process.
arXiv Detail & Related papers (2024-05-27T05:55:22Z) - Learning Latent Space Hierarchical EBM Diffusion Models [4.4996462447311725]
We study the learning problem of the energy-based prior model and the multi-layer generator model.
Recent works have explored learning the energy-based model (EBM) prior as a second-stage, complementary model to bridge this gap.
We propose to leverage the diffusion probabilistic scheme to mitigate the burden of EBM sampling and thus facilitate EBM learning.
arXiv Detail & Related papers (2024-05-22T18:34:25Z) - Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
The common practice of learning latent-space EBMs with non-convergent short-run MCMC for prior and posterior sampling hinders the models from making further progress.
We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z) - Balanced Training of Energy-Based Models with Adaptive Flow Sampling [13.951904929884618]
Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density.
We propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF).
Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times.
arXiv Detail & Related papers (2023-06-01T13:58:06Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z) - Persistently Trained, Diffusion-assisted Energy-based Models [18.135784288023928]
We introduce diffusion data and learn a joint EBM, called diffusion-assisted EBMs, through persistent training.
We show that persistently trained EBMs can simultaneously achieve long-run stability, post-training image generation, and superior out-of-distribution detection.
arXiv Detail & Related papers (2023-04-21T02:29:18Z) - Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler [35.80109055748496]
Training energy-based models (EBMs) by maximum likelihood requires Markov chain Monte Carlo sampling.
We learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics derived from the energy function.
With these amortized MCMC samples, the EBM can be trained by maximum likelihood, which follows an "analysis by synthesis" scheme.
We call this joint training algorithm variational MCMC teaching, in which the VAE chases the EBM toward the data distribution.
arXiv Detail & Related papers (2020-12-29T20:46:40Z) - Learning Energy-Based Models by Diffusion Recovery Likelihood [61.069760183331745]
We present a diffusion recovery likelihood method to tractably learn and sample from a sequence of energy-based models.
After training, synthesized images can be generated by a sampling process initialized from a Gaussian white noise distribution.
On unconditional CIFAR-10, our method achieves an FID of 9.58 and an inception score of 8.30, superior to the majority of GANs.
arXiv Detail & Related papers (2020-12-15T07:09:02Z)
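For reference, the recovery likelihood that the entry above (and CDRL) builds on can be written as follows; the notation here is a common formulation rather than a quote from either paper, assuming the Gaussian noising $\tilde{x} = x + \sigma \epsilon$:

```latex
% Recovery likelihood: conditional density of a clean sample x given its noisy version \tilde{x},
% assuming an EBM p_\theta(x) \propto \exp(f_\theta(x)) and Gaussian noising \tilde{x} = x + \sigma\epsilon.
p_\theta(x \mid \tilde{x}) \propto \exp\!\Big( f_\theta(x) - \frac{1}{2\sigma^2} \lVert \tilde{x} - x \rVert^2 \Big)
```

Sampling from this conditional is easier than sampling from $p_\theta(x)$ directly because the quadratic term keeps the chain near $\tilde{x}$, which is why a few MCMC steps per noise level can suffice.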
This list is automatically generated from the titles and abstracts of the papers in this site.