A Simple Approach to Unifying Diffusion-based Conditional Generation
- URL: http://arxiv.org/abs/2410.11439v1
- Date: Tue, 15 Oct 2024 09:41:43 GMT
- Title: A Simple Approach to Unifying Diffusion-based Conditional Generation
- Authors: Xirui Li, Charles Herrmann, Kelvin C. K. Chan, Yinxiao Li, Deqing Sun, Chao Ma, Ming-Hsuan Yang
- Abstract summary: We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
- Abstract: Recent progress in image generation has sparked research into controlling these models through condition signals, with various methods addressing specific challenges in conditional generation. Instead of proposing another specialized technique, we introduce a simple, unified framework to handle diverse conditional generation tasks involving a specific image-condition correlation. By learning a joint distribution over a correlated image pair (e.g., image and depth) with a diffusion model, our approach enables versatile capabilities via different inference-time sampling schemes, including controllable image generation (e.g., depth to image), estimation (e.g., image to depth), signal guidance, joint generation (image & depth), and coarse control. Previous attempts at unification often introduce significant complexity through multi-stage training, architectural modification, or increased parameter counts. In contrast, our simple formulation requires a single, computationally efficient training stage, maintains the standard model input, and adds minimal learned parameters (15% of the base model). Moreover, our model supports additional capabilities such as non-spatially aligned and coarse conditioning. Extensive results show that our single model produces results comparable to specialized methods and better results than prior unified methods. We also demonstrate that multiple models can be effectively combined for multi-signal conditional generation.
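The mechanism above lends itself to a compact illustration. Below is a minimal, hypothetical sketch of how one diffusion model over a concatenated (image, depth) pair could support several sampling schemes; the `JointDenoiser` stub, the DDPM-style schedule, and the replacement-style conditioning are all illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): one diffusion model over the
# concatenated (image, depth) channels, reused for several sampling schemes
# by fixing the observed half of the pair during sampling.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class JointDenoiser(nn.Module):
    """Stand-in for a U-Net predicting noise for 3 image + 1 depth channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 4, 3, padding=1)

    def forward(self, x, t):
        return self.net(x)

@torch.no_grad()
def sample(model, known=None, known_mask=None, shape=(1, 4, 64, 64)):
    """Joint generation when `known` is None; depth-to-image or
    image-to-depth when one half is provided via `known_mask`."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        if known is not None:
            # Replace observed channels with a freshly noised copy of the
            # observation so the model only has to generate the rest.
            noised = alpha_bars[t].sqrt() * known + (1 - alpha_bars[t]).sqrt() * torch.randn_like(known)
            x = torch.where(known_mask, noised, x)
        eps = model(x, t)
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

model = JointDenoiser()
pair = sample(model)                                 # joint generation (image & depth)
mask = torch.zeros(1, 4, 64, 64, dtype=torch.bool)
mask[:, :3] = True                                   # mark image channels as observed
depth = sample(model, known=pair, known_mask=mask)   # estimation: image -> depth
```

With a single trained model, leaving `known` unset gives joint generation, observing the depth channel gives depth-to-image control, and observing the image channels (as above) gives image-to-depth estimation.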
Related papers
- Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation
We propose a plug-in, projected gradient descent (PGD)-like method for diverse and controllable image generation.
The key idea is to attack a pre-trained deterministic generative model by adding a micro perturbation to the input condition.
Our work opens the door to applying adversarial attacks to low-level vision tasks.
arXiv Detail & Related papers (2024-03-13T06:57:23Z)
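A minimal sketch of the PGD-style idea as described in the summary; the frozen generator stub, the diversity objective (distance from the unperturbed output), and all hyperparameters are my assumptions, not the paper's.

```python
# Illustrative PGD-style sketch: perturb the input condition of a frozen
# deterministic generator to obtain a different, yet nearby, output.
import torch
import torch.nn as nn

generator = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for a frozen deterministic model
for p in generator.parameters():
    p.requires_grad_(False)

def diverse_output(cond, steps=10, eps=2 / 255, lr=0.5 / 255):
    """Perturb `cond` within an L-infinity ball so the generator's output
    moves away from its unperturbed prediction."""
    base = generator(cond).detach()
    delta = torch.zeros_like(cond, requires_grad=True)
    for _ in range(steps):
        out = generator(cond + delta)
        loss = -(out - base).pow(2).mean()   # negated distance: descend to diversify
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # PGD step on the negated loss
            delta.clamp_(-eps, eps)          # project back into the micro-perturbation ball
            delta.grad.zero_()
    return generator(cond + delta).detach()

cond = torch.rand(1, 3, 64, 64)
variant = diverse_output(cond)
```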
- CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by their high computational cost, which limits real-time application.
We introduce CoDi, a novel method that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z)
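The summary's adaptation step can be illustrated with one common pattern for grafting extra conditioning channels onto a pre-trained denoiser: widen its first convolution and zero-initialize the new weights so behavior is unchanged when fine-tuning starts. This is a generic sketch under that assumption, not CoDi's actual architecture or its distillation procedure.

```python
# Generic sketch: accept extra image-conditioning channels by widening the
# first convolution of a pre-trained denoiser with zero-initialized weights.
import torch
import torch.nn as nn

def widen_input_conv(conv: nn.Conv2d, extra_channels: int) -> nn.Conv2d:
    new = nn.Conv2d(conv.in_channels + extra_channels, conv.out_channels,
                    conv.kernel_size, conv.stride, conv.padding)
    with torch.no_grad():
        new.weight.zero_()
        new.weight[:, :conv.in_channels] = conv.weight  # keep pre-trained weights
        new.bias.copy_(conv.bias)                       # new channels start as a no-op
    return new

pretrained_stem = nn.Conv2d(4, 320, 3, padding=1)       # e.g. a latent-diffusion stem
adapted = widen_input_conv(pretrained_stem, extra_channels=4)
latents = torch.randn(1, 4, 32, 32)
cond = torch.randn(1, 4, 32, 32)                        # e.g. an encoded condition image
out = adapted(torch.cat([latents, cond], dim=1))
```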
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using Steered Diffusion on several tasks, including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
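A generic gradient-guidance sketch in the spirit of plug-and-play steering: at each step the unconditional model's prediction of the clean image is nudged by the gradient of a task loss (here, matching observed pixels, as in inpainting). The model stub, toy schedule, and guidance scale are assumptions, not the paper's specifics.

```python
# Generic steering sketch: guide an unconditional denoiser at sampling time
# with the gradient of a task loss on the predicted clean image.
import torch
import torch.nn as nn

T = 50
alpha_bars = torch.linspace(0.01, 0.999, T)   # toy schedule: noisy -> clean
model = nn.Conv2d(3, 3, 3, padding=1)         # stand-in unconditional denoiser

def steered_sample(observed, mask, scale=50.0, shape=(1, 3, 64, 64)):
    x = torch.randn(shape)
    for t in range(T):
        x = x.detach().requires_grad_(True)
        eps = model(x)
        # Predicted clean image from the current noisy state.
        x0 = (x - (1 - alpha_bars[t]).sqrt() * eps) / alpha_bars[t].sqrt()
        loss = ((x0 - observed) * mask).pow(2).mean()  # task loss on known pixels
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x = x0 - scale * grad                      # steer toward the condition
            if t < T - 1:                              # renoise for the next step
                x = alpha_bars[t + 1].sqrt() * x + (1 - alpha_bars[t + 1]).sqrt() * torch.randn_like(x)
    return x.detach()

obs = torch.rand(1, 3, 64, 64)
m = (torch.rand(1, 3, 64, 64) > 0.5).float()
result = steered_sample(obs, m)
```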
- Auto-regressive Image Synthesis with Integrated Quantization
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and the powerful sequence modeling of auto-regression.
Our method achieves superior diversity in image generation compared with the state of the art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
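A hypothetical sketch of the general quantize-then-autoregress recipe the summary alludes to: features are vector-quantized against a learned codebook, and the resulting discrete tokens are modeled autoregressively; sampling the token distribution is what yields diverse outputs. The codebook size, LSTM prior, and shapes are illustrative, not the paper's design.

```python
# Generic quantize-then-autoregress sketch (not the paper's architecture).
import torch
import torch.nn as nn

codebook = nn.Embedding(512, 64)                     # 512 codes, 64-dim each

def quantize(features):
    """Map each feature vector to its nearest codebook index."""
    dists = torch.cdist(features, codebook.weight)   # (N, 512) distances
    return dists.argmin(dim=-1)                      # discrete tokens

prior = nn.LSTM(64, 128, batch_first=True)           # stand-in autoregressive prior
head = nn.Linear(128, 512)

@torch.no_grad()
def sample_tokens(length=16):
    tokens = [torch.randint(512, (1,))]
    hidden = None
    for _ in range(length - 1):
        emb = codebook(tokens[-1]).unsqueeze(1)      # (1, 1, 64)
        out, hidden = prior(emb, hidden)
        probs = torch.softmax(head(out[:, -1]), dim=-1)
        tokens.append(torch.multinomial(probs, 1).squeeze(1))  # sampling gives diversity
    return torch.stack(tokens, dim=1)

toks = sample_tokens()  # a learned decoder would map these tokens back to an image
```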
- Meta Internal Learning
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
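A Reptile-style sketch of what meta-training over an image collection can look like; the interpolation rule, the reconstruction loss, and all hyperparameters are illustrative assumptions, and the paper's exact meta-learning procedure may differ.

```python
# Reptile-style sketch: meta-train a single-image generator over a collection
# so it adapts quickly to any one image's internal statistics.
import copy
import torch
import torch.nn as nn

generator = nn.Conv2d(3, 3, 3, padding=1)   # stand-in single-image generator
images = [torch.rand(1, 3, 64, 64) for _ in range(8)]
meta_lr, inner_lr, inner_steps = 0.1, 1e-3, 5

for image in images:
    # Inner loop: adapt a clone of the generator to one image.
    fast = copy.deepcopy(generator)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        loss = (fast(torch.randn_like(image)) - image).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Outer (Reptile) update: move meta-weights toward the adapted weights.
    with torch.no_grad():
        for p, q in zip(generator.parameters(), fast.parameters()):
            p += meta_lr * (q - p)
```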
- Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis
A persistent challenge has been to generate diverse versions of output images from the same input image.
We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks.
It is capable of generating convincing high-frequency details, achieving a reduction of the Fréchet Inception Distance (FID) of up to 45.3% compared to the baseline.
arXiv Detail & Related papers (2021-06-16T17:58:13Z)
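For reference, the FID cited above compares Gaussians fitted to Inception features of real and generated images: FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2(C1 C2)^{1/2}). A minimal NumPy/SciPy sketch (the random features below are placeholders for real Inception activations):

```python
# Minimal FID computation from two sets of feature vectors.
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(c1 + c2 - 2 * covmean))

print(fid(np.random.randn(200, 16), np.random.randn(200, 16)))
```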
- Diverse Semantic Image Synthesis via Probability Distribution Modeling
We propose a novel diverse semantic image synthesis framework.
Our method achieves superior diversity and comparable quality relative to state-of-the-art methods.
arXiv Detail & Related papers (2021-03-11T18:59:25Z)