Latent Consistency Models: Synthesizing High-Resolution Images with
Few-Step Inference
- URL: http://arxiv.org/abs/2310.04378v1
- Date: Fri, 6 Oct 2023 17:11:58 GMT
- Title: Latent Consistency Models: Synthesizing High-Resolution Images with
Few-Step Inference
- Authors: Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao
- Abstract summary: We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours to train.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
- Score: 60.32804641276217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent Diffusion models (LDMs) have achieved remarkable results in
synthesizing high-resolution images. However, the iterative sampling process is
computationally intensive and leads to slow generation. Inspired by Consistency
Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling
swift inference with minimal steps on any pre-trained LDMs, including Stable
Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as
solving an augmented probability flow ODE (PF-ODE), LCMs are designed to
directly predict the solution of such ODE in latent space, mitigating the need
for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently
distilled from pre-trained classifier-free guided diffusion models, a
high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training.
Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method
that is tailored for fine-tuning LCMs on customized image datasets. Evaluation
on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve
state-of-the-art text-to-image generation performance with few-step inference.
Project Page: https://latent-consistency-models.github.io/
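The abstract's construction can be stated compactly. As a paraphrase in notation borrowed from the consistency-models literature (symbols are mine, not verbatim from the paper): classifier-free guidance with scale $\omega$ and condition $c$ is folded into an augmented probability-flow ODE over latents $z_t$, and the LCM $f_\theta$ is trained so that every point on a trajectory maps to the same solution:

$\tilde{\epsilon}_\theta(z_t, \omega, c, t) = (1+\omega)\,\epsilon_\theta(z_t, c, t) - \omega\,\epsilon_\theta(z_t, \varnothing, t)$

$\dfrac{\mathrm{d}z_t}{\mathrm{d}t} = f(t)\,z_t + \dfrac{g(t)^2}{2\sigma_t}\,\tilde{\epsilon}_\theta(z_t, \omega, c, t)$

$f_\theta(z_t, \omega, c, t) = f_\theta(z_{t'}, \omega, c, t')$ for all $t, t' \in [\epsilon, T]$

Distillation enforces this self-consistency between adjacent trajectory points, with the earlier point estimated by one step of an off-the-shelf ODE solver applied to the teacher; at inference the network jumps directly toward the ODE solution, which is why 2~4 steps suffice.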
Related papers
- The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models [9.392691963008385]
Langevin Monte Carlo (LMC) is the simplest and most studied algorithm for discretizing Langevin dynamics.
We propose the Poisson Midpoint Method, which approximates small-step-size LMC with large step sizes.
We show that it matches the quality of DDPM with 1000 neural network calls using only 50-80 calls, and that it outperforms ODE-based methods at similar compute.
arXiv Detail & Related papers (2024-05-27T11:40:42Z)
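For reference, the small-step baseline this entry accelerates is the Euler-Maruyama discretization of Langevin dynamics (notation mine; step size $\eta$, target density proportional to $e^{-U}$):

$x_{k+1} = x_k - \eta\,\nabla U(x_k) + \sqrt{2\eta}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I)$

The Poisson Midpoint Method replaces many such small steps with far fewer large ones while provably preserving the small-step chain's accuracy.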
- EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of quality.
We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process.
arXiv Detail & Related papers (2024-05-27T05:55:22Z)
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
As an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- Accelerating Image Generation with Sub-path Linear Approximation Model [31.86029397069562]
Diffusion models have advanced the state of the art in image, audio, and video generation tasks.
We propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining high-quality image generation.
arXiv Detail & Related papers (2024-04-22T06:25:17Z)
- Accelerating Parallel Sampling of Diffusion Models [25.347710690711562]
We propose a novel approach that accelerates the sampling of diffusion models by parallelizing the autoregressive process.
Applying these techniques, we introduce ParaTAA, a universal and training-free parallel sampling algorithm.
Our experiments demonstrate that ParaTAA can reduce the inference steps required by common sequential sampling algorithms by a factor of 4~14.
arXiv Detail & Related papers (2024-02-15T14:27:58Z)
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced to image deblurring and exhibit promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models.
Our approach is based on the idea of using the Maximum Mean Discrepancy (MMD) to finetune the learned distribution with a given budget of timesteps.
Our findings show that the proposed method is able to produce high-quality samples in a fraction of the time required by widely used diffusion models.
arXiv Detail & Related papers (2023-01-19T09:48:07Z)
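For context, the Maximum Mean Discrepancy used above is the standard kernel distance between distributions; for a kernel $k$ its squared population form is (the formula is standard, while the paper's exact estimator and feature space are not reproduced here):

$\mathrm{MMD}^2(P, Q) = \mathbb{E}_{x,x' \sim P}[k(x, x')] + \mathbb{E}_{y,y' \sim Q}[k(y, y')] - 2\,\mathbb{E}_{x \sim P,\, y \sim Q}[k(x, y)]$

MMD-DDM finetunes the model so that samples drawn within the fixed timestep budget minimize an empirical estimate of this quantity against real data.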
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach generates images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach generates high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
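To make few-step inference concrete, here is a minimal Python sketch using the diffusers library with the publicly released LCM checkpoint. The model ID and call signature are assumptions based on that community release rather than anything stated in the abstracts above; treat it as a sketch and check the project page for the current API.

import torch
from diffusers import DiffusionPipeline

# Load a latent consistency model distilled from a Stable-Diffusion-style LDM.
# "SimianLuo/LCM_Dreamshaper_v7" is the community checkpoint (an assumption here).
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
)
pipe.to("cuda")

# LCMs trade the usual 25-50 denoising steps for 2-8; 4 is a common setting.
image = pipe(
    prompt="a photo of an astronaut riding a horse on the moon",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
image.save("lcm_sample.png")

Note the absence of a long sampler loop: the pipeline's scheduler performs only the requested handful of consistency steps, which is where the speedup over 25-50-step DDIM/DPM-Solver sampling comes from.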