Variational Diffusion Models
- URL: http://arxiv.org/abs/2107.00630v6
- Date: Fri, 14 Apr 2023 00:20:30 GMT
- Title: Variational Diffusion Models
- Authors: Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
- Abstract summary: We introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on image density estimation benchmarks.
We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data.
- Score: 33.0719137062396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based generative models have demonstrated a capacity for
perceptually impressive synthesis, but can they also be great likelihood-based
models? We answer this in the affirmative, and introduce a family of
diffusion-based generative models that obtain state-of-the-art likelihoods on
standard image density estimation benchmarks. Unlike other diffusion-based
models, our method allows for efficient optimization of the noise schedule
jointly with the rest of the model. We show that the variational lower bound
(VLB) simplifies to a remarkably short expression in terms of the
signal-to-noise ratio of the diffused data, thereby improving our theoretical
understanding of this model class. Using this insight, we prove an equivalence
between several models proposed in the literature. In addition, we show that
the continuous-time VLB is invariant to the noise schedule, except for the
signal-to-noise ratio at its endpoints. This enables us to learn a noise
schedule that minimizes the variance of the resulting VLB estimator, leading to
faster optimization. Combining these advances with architectural improvements,
we obtain state-of-the-art likelihoods on image density estimation benchmarks,
outperforming autoregressive models that have dominated these benchmarks for
many years, with often significantly faster optimization. In addition, we show
how to use the model as part of a bits-back compression scheme, and demonstrate
lossless compression rates close to the theoretical optimum. Code is available
at https://github.com/google-research/vdm .
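The central identity in the abstract — the VLB collapsing to a short expression in the signal-to-noise ratio of the diffused data — can be made concrete in a few lines of JAX (the language of the official repository; this sketch is unofficial and not the paper's code). It evaluates the continuous-time loss (1/2) E[γ'(t) ||ε − ε̂||²] with γ(t) = −log SNR(t) under the variance-preserving parameterization; the linear schedule, the endpoint values, and the one-parameter denoiser are illustrative stand-ins, not the paper's architecture.
```python
# Minimal, unofficial sketch of the VDM continuous-time diffusion loss
# expressed through the log signal-to-noise ratio gamma(t).
import jax
import jax.numpy as jnp

def gamma(t, gamma_min=-13.3, gamma_max=5.0):
    # Toy linear log-SNR schedule: gamma(t) = -log SNR(t).
    # Only the endpoints gamma(0), gamma(1) affect the continuous-time VLB;
    # the shape in between only changes the variance of the estimator.
    return gamma_min + (gamma_max - gamma_min) * t

def alpha_sigma(g):
    # Variance-preserving parameterization: alpha^2 = sigmoid(-gamma),
    # sigma^2 = sigmoid(gamma), so SNR = alpha^2 / sigma^2 = exp(-gamma).
    return jnp.sqrt(jax.nn.sigmoid(-g)), jnp.sqrt(jax.nn.sigmoid(g))

def eps_hat(params, z, g):
    # Stand-in noise-prediction network; a U-Net in the actual model.
    return params * z

def diffusion_loss(params, x, t, eps):
    g = gamma(t)
    a, s = alpha_sigma(g)
    z = a * x + s * eps                      # diffused data z_t
    g_prime = jax.grad(gamma)(t)             # d gamma / dt
    # Continuous-time loss: 0.5 * E[ gamma'(t) * ||eps - eps_hat||^2 ]
    return 0.5 * g_prime * jnp.sum((eps - eps_hat(params, z, g)) ** 2)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16,))            # toy stand-in for an image
eps = jax.random.normal(jax.random.fold_in(key, 1), x.shape)
t = jax.random.uniform(jax.random.fold_in(key, 2))
print(diffusion_loss(0.5, x, t, eps))
```
Because the continuous-time VLB depends on the schedule only through its endpoints, the paper can learn the shape of γ(t) in between purely to reduce the variance of this Monte Carlo estimator, which is what yields the faster optimization reported above.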
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3$\times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization [97.35427957922714]
We present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model.
PSO introduces additional reference images sampled from the current time-step distilled model, and increases the relative likelihood margin between the training images and reference images.
We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data.
arXiv Detail & Related papers (2024-10-04T07:05:16Z)
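As a rough illustration of the PSO objective summarized above — raising the relative likelihood margin of training images over reference images sampled from the distilled model — here is a schematic, DPO-style pairwise loss in JAX. `model_score` is a hypothetical stand-in; how the actual method scores samples from a timestep-distilled model differs.
```python
# Schematic pairwise margin objective of the kind the PSO summary describes.
import jax
import jax.numpy as jnp

def model_score(params, x):
    # Hypothetical per-sample score (higher = more likely under the model).
    return -jnp.sum((x - params) ** 2)

def pairwise_margin_loss(params, x_train, x_ref, beta=1.0):
    # Increase the relative margin between training and reference samples,
    # squashed through a logistic as in DPO-style preference objectives.
    margin = model_score(params, x_train) - model_score(params, x_ref)
    return -jax.nn.log_sigmoid(beta * margin)

key = jax.random.PRNGKey(0)
x_train = jax.random.normal(key, (8,))               # preferred sample
x_ref = jax.random.normal(jax.random.fold_in(key, 1), (8,))  # reference sample
loss, grads = jax.value_and_grad(pairwise_margin_loss)(jnp.zeros(8), x_train, x_ref)
print(loss)
```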
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think [53.2706196341054]
We show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed.
We perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models.
arXiv Detail & Related papers (2024-09-17T16:58:52Z)
- The Missing U for Efficient Diffusion Models [3.712196074875643]
Diffusion Probabilistic Models yield record-breaking performance in tasks such as image synthesis, video generation, and molecule design.
Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs.
We introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models.
arXiv Detail & Related papers (2023-10-31T00:12:14Z)
- Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models [76.46246743508651]
We show that current diffusion models actually have an expressive bottleneck in backward denoising.
We introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising.
arXiv Detail & Related papers (2023-09-25T12:03:32Z)
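The following is one informal reading of "soft mixture denoising" as summarized above: replace the single-Gaussian reverse kernel with a Gaussian mixture, so a single backward step can represent a multimodal posterior. The component count, placeholder networks, and sampling scheme are assumptions for illustration, not the paper's construction.
```python
# Informal sketch: a mixture-of-Gaussians reverse step for a diffusion model.
import jax
import jax.numpy as jnp

def mixture_params(z_t, K=4):
    # Stand-in for a network that outputs K mixture logits and K means.
    logits = jnp.arange(K, dtype=jnp.float32)          # placeholder logits
    means = jnp.stack([z_t * (0.9 + 0.02 * k) for k in range(K)])
    return logits, means

def reverse_step(key, z_t, sigma=0.1):
    logits, means = mixture_params(z_t)
    k_key, g_key = jax.random.split(key)
    k = jax.random.categorical(k_key, logits)          # pick a mixture component
    return means[k] + sigma * jax.random.normal(g_key, z_t.shape)

key = jax.random.PRNGKey(0)
z = jax.random.normal(key, (8,))
print(reverse_step(jax.random.fold_in(key, 1), z))
```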
- Fast Diffusion EM: a diffusion model for blind inverse problems with application to deconvolution [0.0]
Current methods assume the degradation to be known and provide impressive results in terms of restoration and diversity.
In this work, we leverage the efficiency of those models to jointly estimate the restored image and unknown parameters of the kernel model.
Our method alternates between approximating the expected log-likelihood of the problem using samples drawn from a diffusion model and a step to estimate unknown model parameters.
arXiv Detail & Related papers (2023-09-01T06:47:13Z)
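A toy sketch of the alternation the Fast Diffusion EM summary describes, for 1-D blind deconvolution y = conv(k, x) + noise: an E-step that approximates the expected log-likelihood with posterior samples (here a placeholder for the diffusion sampler) and an M-step that updates the unknown kernel parameters. The real method's E/M steps differ in the details.
```python
# Toy EM alternation for blind deconvolution with a stand-in diffusion sampler.
import jax
import jax.numpy as jnp

def diffusion_posterior_sample(key, y, k):
    # Placeholder: the actual method draws x ~ p(x | y, k) via a diffusion model.
    return y + 0.1 * jax.random.normal(key, y.shape)

def neg_expected_loglik(k, xs, y, sigma=0.1):
    # Monte Carlo estimate of -E_x[ log p(y | x, k) ] over posterior samples.
    preds = jax.vmap(lambda x: jnp.convolve(x, k, mode='same'))(xs)
    return jnp.mean(jnp.sum((y - preds) ** 2, axis=-1)) / (2 * sigma ** 2)

def em_step(key, y, k, n_samples=8, lr=1e-2):
    keys = jax.random.split(key, n_samples)
    xs = jax.vmap(lambda kk: diffusion_posterior_sample(kk, y, k))(keys)  # E-step
    grads = jax.grad(neg_expected_loglik)(k, xs, y)                       # M-step
    return k - lr * grads

key = jax.random.PRNGKey(0)
y = jax.random.normal(key, (32,))            # observed blurry signal
k = jnp.ones(5) / 5.0                        # initial blur-kernel guess
for i in range(3):
    k = em_step(jax.random.fold_in(key, i), y, k)
print(k)
```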
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents [26.17940552906923]
We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework.
We show that the proposed model can generate high-resolution samples and exhibits quality comparable to state-of-the-art models on standard benchmarks.
arXiv Detail & Related papers (2022-01-02T06:44:23Z)
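Read as a pipeline, the DiffuseVAE summary above suggests a two-stage generator: decode a low-dimensional latent with a VAE, then refine with a diffusion model conditioned on the coarse output. The sketch below follows that reading with stand-in networks and a deliberately simplified refiner; it is not the paper's exact formulation.
```python
# Two-stage sketch: VAE decode, then a conditional diffusion-style refinement.
import jax
import jax.numpy as jnp

def vae_decode(z):
    # Stand-in decoder: low-dimensional latent -> coarse "image".
    return jnp.tanh(jnp.outer(z, jnp.ones(8)).reshape(-1))

def eps_predict(x, x_coarse, t):
    # Placeholder conditional noise predictor; a learned U-Net in practice.
    return x - x_coarse

def diffusion_refine(key, x_coarse, n_steps=10):
    # Stand-in conditional reverse process: start from noise and repeatedly
    # remove the predicted noise, conditioning the predictor on x_coarse.
    x = jax.random.normal(key, x_coarse.shape)
    for t in range(n_steps, 0, -1):
        x = x - (1.0 / n_steps) * eps_predict(x, x_coarse, t / n_steps)
    return x

key = jax.random.PRNGKey(0)
z = jax.random.normal(key, (4,))             # low-dimensional latent
sample = diffusion_refine(jax.random.fold_in(key, 1), vae_decode(z))
print(sample.shape)
```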
- Denoising Diffusion Probabilistic Models [91.94962645056896]
We present high quality image synthesis results using diffusion probabilistic models.
Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics.
arXiv Detail & Related papers (2020-06-19T17:24:44Z)
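For reference, the weighted variational bound this paper is best known for reduces, with the simple weighting used in the paper, to noise prediction: sample a timestep, diffuse the data with the closed-form marginal, and regress the added noise. The sketch below uses the paper's linear beta schedule; the denoiser is a stand-in for the time-conditioned U-Net.
```python
# Minimal sketch of the DDPM "simple" training objective.
import jax
import jax.numpy as jnp

T = 1000
betas = jnp.linspace(1e-4, 0.02, T)                 # DDPM's linear schedule
alpha_bar = jnp.cumprod(1.0 - betas)                # cumulative product \bar{alpha}_t

def eps_theta(params, x_t, t):
    # Stand-in noise predictor; a time-conditioned U-Net in the paper.
    return params * x_t

def ddpm_simple_loss(params, key, x0):
    t_key, e_key = jax.random.split(key)
    t = jax.random.randint(t_key, (), 0, T)          # random timestep
    eps = jax.random.normal(e_key, x0.shape)         # noise to regress
    # Closed-form marginal: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
    x_t = jnp.sqrt(alpha_bar[t]) * x0 + jnp.sqrt(1.0 - alpha_bar[t]) * eps
    return jnp.mean((eps - eps_theta(params, x_t, t)) ** 2)

key = jax.random.PRNGKey(0)
x0 = jax.random.normal(key, (16,))                   # toy stand-in for an image
print(ddpm_simple_loss(0.1, jax.random.fold_in(key, 1), x0))
```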
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.