Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis
- URL: http://arxiv.org/abs/2310.00224v1
- Date: Sat, 30 Sep 2023 02:03:22 GMT
- Title: Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis
- Authors: Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang,
Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks
- Abstract summary: Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
- Score: 62.07413805483241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional generative models typically demand large annotated training sets
to achieve high-quality synthesis. As a result, there has been significant
interest in designing models that perform plug-and-play generation, i.e., to
use a predefined or pretrained model, which is not explicitly trained on the
generative task, to guide the generative process (e.g., using language).
However, such guidance is typically useful only towards synthesizing high-level
semantics rather than editing fine-grained details as in image-to-image
translation tasks. To this end, and capitalizing on the powerful fine-grained
generative control offered by the recent diffusion-based generative models, we
introduce Steered Diffusion, a generalized framework for photorealistic
zero-shot conditional image generation using a diffusion model trained for
unconditional generation. The key idea is to steer the image generation of the
diffusion model at inference time via designing a loss using a pre-trained
inverse model that characterizes the conditional task. This loss modulates the
sampling trajectory of the diffusion process. Our framework allows for easy
incorporation of multiple conditions during inference. We present experiments
using steered diffusion on several tasks including inpainting, colorization,
text-guided semantic editing, and image super-resolution. Our results
demonstrate clear qualitative and quantitative improvements over
state-of-the-art diffusion-based plug-and-play models while adding negligible
additional computational cost.
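To make the steering idea concrete, the sketch below shows one plausible way such inference-time guidance could look for a pixel-space DDPM: at each reverse step, the clean image is estimated from the current sample, passed through a pretrained inverse model C that characterizes the conditional task (e.g., a downsampler for super-resolution, a masking operator for inpainting, a grayscale projection for colorization), and the sampling mean is shifted along the gradient of the resulting loss. This is a minimal illustration based only on the abstract, not the authors' implementation; `eps_model`, `inverse_model`, and the noise schedule are assumed placeholders.
```python
# Minimal sketch (not the paper's code) of loss-guided "steered" sampling
# with a pretrained unconditional DDPM. Placeholders/assumptions:
#   eps_model(x_t, t)  -> predicted noise (pretrained unconditional denoiser)
#   inverse_model(x0)  -> pretrained inverse model C defining the task
#   y                  -> the condition (e.g., low-res image, masked image)
import torch

@torch.no_grad()
def steered_ddpm_sample(eps_model, inverse_model, y, shape, betas,
                        guidance_scale=1.0, device="cpu"):
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x_t = torch.randn(shape, device=device)

    for t in reversed(range(len(betas))):
        a_t, ab_t = alphas[t], alpha_bars[t]

        # Predict noise and the corresponding estimate of the clean image x0.
        eps = eps_model(x_t, t)

        # Steering term: gradient of ||C(x0_hat) - y||^2 w.r.t. x_t,
        # computed with autograd enabled locally.
        with torch.enable_grad():
            x_in = x_t.detach().requires_grad_(True)
            eps_g = eps_model(x_in, t)
            x0_g = (x_in - torch.sqrt(1.0 - ab_t) * eps_g) / torch.sqrt(ab_t)
            loss = ((inverse_model(x0_g) - y) ** 2).mean()
            grad = torch.autograd.grad(loss, x_in)[0]

        # Standard DDPM posterior mean, shifted against the loss gradient
        # to modulate the sampling trajectory toward the condition.
        mean = (x_t - (1.0 - a_t) / torch.sqrt(1.0 - ab_t) * eps) / torch.sqrt(a_t)
        mean = mean - guidance_scale * grad

        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + torch.sqrt(betas[t]) * noise

    return x_t
```
In this sketch, multiple conditions could be incorporated by summing several such losses (one per inverse model) before taking the gradient, consistent with the abstract's claim that the framework allows easy combination of conditions at inference time.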
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- DiffHarmony: Latent Diffusion Model Meets Image Harmonization [11.500358677234939]
Diffusion models have promoted the rapid development of image-to-image translation tasks.
Fine-tuning pre-trained latent diffusion models for this task is computationally intensive.
In this paper, we adapt a pre-trained latent diffusion model to the image harmonization task to generate harmonious but potentially blurry initial images.
arXiv Detail & Related papers (2024-04-09T09:05:23Z)
- JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement [69.6035373784027]
Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.
Previous methods may neglect the importance of sufficiently formulating a task-specific conditioning strategy.
We propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition.
arXiv Detail & Related papers (2023-12-20T08:05:57Z)
- CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce CoDi, a novel method that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
- Generating images of rare concepts using pre-trained diffusion models [32.5337654536764]
Text-to-image diffusion models can synthesize high-quality images, but they have various limitations.
We show that these limitations are partly due to the long-tail nature of their training data.
We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space.
arXiv Detail & Related papers (2023-04-27T20:55:38Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- High-Resolution Image Synthesis with Latent Diffusion Models [14.786952412297808]
Training diffusion models in the latent space of pretrained autoencoders reaches, for the first time, a near-optimal point between complexity reduction and detail preservation.
Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks.
arXiv Detail & Related papers (2021-12-20T18:55:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.