Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
- URL: http://arxiv.org/abs/2403.12015v1
- Date: Mon, 18 Mar 2024 17:51:43 GMT
- Title: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
- Authors: Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach
- Abstract summary: Distillation methods aim to shift the model from many-shot to single-step inference.
We introduce Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach overcoming the limitations of ADD.
In contrast to pixel-based ADD, LADD utilizes generative features from pretrained latent diffusion models.
- Score: 24.236841051249243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD) aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to its reliance on a fixed pretrained DINOv2 discriminator. We introduce Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach overcoming the limitations of ADD. In contrast to pixel-based ADD, LADD utilizes generative features from pretrained latent diffusion models. This approach simplifies training and enhances performance, enabling high-resolution multi-aspect ratio image synthesis. We apply LADD to Stable Diffusion 3 (8B) to obtain SD3-Turbo, a fast model that matches the performance of state-of-the-art text-to-image generators using only four unguided sampling steps. Moreover, we systematically investigate its scaling behavior and demonstrate LADD's effectiveness in various applications such as image editing and inpainting.
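The central idea of LADD, as the abstract describes it, is to run the adversarial game on latents rather than pixels: the discriminator judges generative features of a pretrained latent diffusion model instead of a fixed pixel-space DINOv2 network. The toy below sketches only that structure; the linear "critic", the hinge losses, and all shapes are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(z, w):
    """Toy critic: a linear score on flattened latent features.
    Stands in for a discriminator built on a latent model's features."""
    return float(z.ravel() @ w)

def hinge_d_loss(real_score, fake_score):
    # Standard hinge loss for the discriminator.
    return max(0.0, 1.0 - real_score) + max(0.0, 1.0 + fake_score)

def hinge_g_loss(fake_score):
    # The one-step student wants the critic to score its latents highly.
    return -fake_score

# Latents, not pixels: the discriminator sees small latent tensors
# (the 4x8x8 shape is made up) instead of decoded images.
z_teacher = rng.normal(size=(4, 8, 8))   # latent from the teacher/data side
z_student = rng.normal(size=(4, 8, 8))   # one-step student sample
w = rng.normal(size=z_teacher.size)

d_loss = hinge_d_loss(discriminator(z_teacher, w), discriminator(z_student, w))
g_loss = hinge_g_loss(discriminator(z_student, w))
```

Working in latent space is what removes the expensive pixel-space discriminator the abstract attributes to ADD; everything else in this sketch is generic adversarial training.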
Related papers
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
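The E-LatentLPIPS idea above, a perceptual regression loss computed directly on latents rather than decoded pixels, can be caricatured as follows. The "feature extractor" here is a made-up scalar-layer toy, not the actual ensembled LPIPS-style network the paper trains.

```python
import numpy as np

def latent_features(z, kernels):
    """Toy stand-in for a perceptual feature extractor that runs on
    latents instead of decoded pixels (the E-LatentLPIPS idea)."""
    return [np.tanh(z * k) for k in kernels]  # hypothetical "layers"

def latent_perceptual_loss(z_a, z_b, kernels):
    # Mean squared distance accumulated over feature "layers".
    return sum(float(np.mean((fa - fb) ** 2))
               for fa, fb in zip(latent_features(z_a, kernels),
                                 latent_features(z_b, kernels)))

rng = np.random.default_rng(1)
z1 = rng.normal(size=(4, 8, 8))
kernels = [0.5, 1.0, 2.0]  # illustrative layer scalings

loss_same = latent_perceptual_loss(z1, z1, kernels)        # identical latents
loss_diff = latent_perceptual_loss(z1, z1 + 0.1, kernels)  # perturbed latents
```

Skipping the decoder on every regression step is what makes such a loss cheap enough for distillation training.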
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis [20.2271205957037]
Hyper-SD is a novel framework that amalgamates the advantages of ODE Trajectory Preservation and Reformulation.
We introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments.
We incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process.
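The "pre-defined time-step segments" of Trajectory Segmented Consistency Distillation amount to partitioning the diffusion timestep range and enforcing consistency within each segment, progressively coarsening toward a single segment. The helper below only illustrates the partitioning; the segment counts and step total are made-up numbers, not the paper's schedule.

```python
import numpy as np

def segment_timesteps(num_steps, num_segments):
    """Split the timestep range [0, num_steps] into contiguous segments,
    as in trajectory-segmented consistency distillation (boundaries are
    illustrative; the actual schedule may differ)."""
    edges = np.linspace(0, num_steps, num_segments + 1, dtype=int)
    return [(int(edges[i]), int(edges[i + 1])) for i in range(num_segments)]

# A hypothetical progressive schedule: distill within 4 segments,
# then 2, then finally treat the whole trajectory as one segment.
schedule = [segment_timesteps(1000, k) for k in (4, 2, 1)]
```

Consistency is easier to enforce over a short sub-trajectory than over the full one, which is the motivation for starting with many segments.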
arXiv Detail & Related papers (2024-04-21T15:16:05Z)
- AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation [43.62480338471837]
Blind super-resolution methods based on Stable Diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs.
Their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps.
Inspired by the efficient adversarial diffusion distillation (ADD), we design AddSR to address this issue by incorporating the ideas of both distillation and ControlNet.
arXiv Detail & Related papers (2024-04-02T08:07:38Z)
- Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration.
Our method avoids the need for post-acceleration during inference, thereby avoiding the associated performance deterioration.
Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z)
- LoRA-Enhanced Distillation on Guided Diffusion Models [0.0]
This research explores a novel approach that combines Low-Rank Adaptation (LoRA) with model distillation to efficiently compress diffusion models.
Results are remarkable, featuring a significant reduction in inference time due to the distillation process and a substantial 50% reduction in memory consumption.
arXiv Detail & Related papers (2023-12-12T00:01:47Z)
- Adversarial Diffusion Distillation [18.87099764514747]
Adversarial Diffusion Distillation (ADD) is a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps.
We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal.
Our model clearly outperforms existing few-step methods in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps.
arXiv Detail & Related papers (2023-11-28T18:53:24Z)
- ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model [68.16230122583634]
ToddlerDiffusion is an interpretable 2D diffusion image-synthesis framework inspired by the human generation system.
Our approach decomposes the generation process into simpler, interpretable stages: generating contours, a palette, and a detailed colored image.
arXiv Detail & Related papers (2023-11-24T15:20:01Z)
- SinSR: Diffusion-Based Image Super-Resolution in a Single Step [119.18813219518042]
Super-resolution (SR) methods based on diffusion models exhibit promising results.
But their practical application is hindered by the substantial number of required inference steps.
We propose a simple yet effective method for achieving single-step SR generation, named SinSR.
arXiv Detail & Related papers (2023-11-23T16:21:29Z)
- ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting [70.83632337581034]
Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed.
We propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps.
Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual.
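The residual-shifting chain described above can be sketched by its marginals: each state interpolates from the high-resolution image toward the low-resolution one by shifting the residual e = y_lr - x_hr, with optional noise. The schedule and noise scaling below are simplified illustrations, not ResShift's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def shift_residual(x_hr, y_lr, eta_t, kappa=0.0, rng=rng):
    """One marginal of a residual-shifting chain: move from the HR image
    toward the LR image by shifting the residual e = y_lr - x_hr.
    eta_t grows monotonically from 0 to 1 over the chain; kappa scales
    an optional noise term (schedule details here are illustrative)."""
    e = y_lr - x_hr
    noise = kappa * np.sqrt(eta_t) * rng.normal(size=x_hr.shape)
    return x_hr + eta_t * e + noise

x_hr = np.ones((8, 8))    # pretend high-resolution image
y_lr = np.zeros((8, 8))   # pretend (upsampled) low-resolution image

start = shift_residual(x_hr, y_lr, 0.0)  # chain starts at the HR image
end = shift_residual(x_hr, y_lr, 1.0)    # and ends at the LR image
```

Because the chain ends at the LR input rather than at pure Gaussian noise, far fewer diffusion steps are needed to traverse it, which is the efficiency claim of the abstract.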
arXiv Detail & Related papers (2023-07-23T15:10:02Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from slow inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.