CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
- URL: http://arxiv.org/abs/2310.01407v2
- Date: Sat, 17 Feb 2024 14:17:36 GMT
- Title: CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
- Authors: Kangfu Mei and Mauricio Delbracio and Hossein Talebi and Zhengzhong Tu
and Vishal M. Patel and Peyman Milanfar
- Abstract summary: Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method, dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
- Score: 49.3016007471979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large generative diffusion models have revolutionized text-to-image
generation and offer immense potential for conditional generation tasks such as
image enhancement, restoration, editing, and compositing. However, their
widespread adoption is hindered by the high computational cost, which limits
their real-time application. To address this challenge, we introduce a novel
method, dubbed CoDi, that adapts a pre-trained latent diffusion model to accept
additional image conditioning inputs while significantly reducing the sampling
steps required to achieve high-quality results. Our method can leverage
architectures such as ControlNet to incorporate conditioning inputs without
compromising the model's prior knowledge gained during large-scale
pre-training. Additionally, a conditional consistency loss enforces consistent
predictions across diffusion steps, effectively compelling the model to
generate high-quality images with conditions in a few steps. Our
conditional-task learning and distillation approach outperforms previous
distillation methods, achieving a new state-of-the-art in producing
high-quality images with very few steps (e.g., 1-4) across multiple tasks,
including super-resolution, text-guided image editing, and depth-to-image
generation.
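
To make the distillation objective concrete, here is a minimal sketch of a conditional consistency loss in the spirit of the abstract: the student is trained so that its clean-latent predictions agree across adjacent diffusion steps, with the earlier step produced by one deterministic teacher update. This follows a generic consistency-distillation recipe, not CoDi's exact formulation; all names (student, ema_student, teacher, alphas_cumprod) are assumptions.

```python
# Hypothetical sketch of a conditional consistency loss (PyTorch).
# `student`, `ema_student`, and `teacher` are assumed ControlNet-style
# denoisers f(x_t, t, cond) -> predicted clean latent; `alphas_cumprod`
# is the noise schedule, a 1-D tensor on the same device as `x0`.
import torch
import torch.nn.functional as F

def conditional_consistency_loss(student, ema_student, teacher,
                                 x0, cond, alphas_cumprod):
    b = x0.shape[0]
    t = torch.randint(1, len(alphas_cumprod), (b,), device=x0.device)
    noise = torch.randn_like(x0)

    # Forward-diffuse the clean latent to a random step t.
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise

    # One deterministic (DDIM-style) teacher step from t to t-1.
    with torch.no_grad():
        x0_teacher = teacher(x_t, t, cond)
        eps = (x_t - a_t.sqrt() * x0_teacher) / (1 - a_t).sqrt()
        a_prev = alphas_cumprod[t - 1].view(b, 1, 1, 1)
        x_prev = a_prev.sqrt() * x0_teacher + (1 - a_prev).sqrt() * eps

        # Target: the EMA student's prediction one step earlier.
        target = ema_student(x_prev, t - 1, cond)

    # Consistency: predictions at adjacent steps should coincide.
    return F.mse_loss(student(x_t, t, cond), target)
```

In practice the target network ema_student would be an exponential moving average of the student, and this loss would be combined with the conditional-task objective so the adapter weights learn the new conditioning while distillation keeps the required step count low.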
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) for text-to-image synthesis to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z) - DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance [11.44012694656102]
Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains.
Existing large-scale diffusion models are confined to generating images of up to 1K resolution.
We propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images.
arXiv Detail & Related papers (2024-06-26T16:10:31Z) - DiffHarmony: Latent Diffusion Model Meets Image Harmonization [11.500358677234939]
Diffusion models have driven rapid progress in image-to-image translation tasks.
Training latent diffusion models from scratch is computationally intensive.
In this paper, we adapt a pre-trained latent diffusion model to the image harmonization task to generate harmonious but potentially blurry initial images.
arXiv Detail & Related papers (2024-04-09T09:05:23Z) - TCIG: Two-Stage Controlled Image Generation with Quality Enhancement
through Diffusion [0.0]
A two-stage method is proposed that combines controllability and high quality in image generation.
By separating controllability from quality, this method achieves outstanding results.
arXiv Detail & Related papers (2024-03-02T13:59:02Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution (a generic sketch of this plug-and-play recipe appears after this list).
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Conditional Generation from Unconditional Diffusion Models using
Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
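
Several entries above, notably Steered Diffusion and the fast constrained sampling paper, share a plug-and-play recipe: steer a frozen diffusion sampler by adding the gradient of a constraint loss, evaluated on the denoiser's clean-image estimate, to each update. Below is a hypothetical, generic sketch of that recipe (unet, constraint_loss, and alphas_cumprod are assumed names), not a reproduction of any single paper's algorithm.

```python
# Hypothetical sketch of plug-and-play guided sampling (PyTorch).
# `unet(x_t, t)` is a frozen noise-prediction denoiser; `constraint_loss`
# scores how well a clean estimate matches the condition; `alphas_cumprod`
# is the noise schedule, a 1-D tensor on the sampling device.
import torch

@torch.no_grad()
def steered_sample(unet, constraint_loss, shape, alphas_cumprod,
                   steps=50, guidance_scale=1.0, device="cuda"):
    x = torch.randn(shape, device=device)
    ts = torch.linspace(len(alphas_cumprod) - 1, 1, steps).long()
    for i, t in enumerate(ts):
        a_t = alphas_cumprod[int(t)]
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            eps = unet(x_in, t)  # predicted noise at step t
            x0_hat = (x_in - (1 - a_t).sqrt() * eps) / a_t.sqrt()
            grad = torch.autograd.grad(constraint_loss(x0_hat).sum(), x_in)[0]
        # Shift the noise prediction so the clean estimate moves downhill
        # on the constraint loss (classifier-guidance-style steering).
        eps = eps + guidance_scale * (1 - a_t).sqrt() * grad
        a_prev = (alphas_cumprod[int(ts[i + 1])]
                  if i + 1 < len(ts) else x.new_tensor(1.0))
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps  # DDIM step
    return x
```

For super-resolution, for example, constraint_loss could be the L2 distance between a downsampled x0_hat and a given low-resolution image; swapping the loss changes the task without retraining the model.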
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.