Conditional Generation from Unconditional Diffusion Models using
Denoiser Representations
- URL: http://arxiv.org/abs/2306.01900v1
- Date: Fri, 2 Jun 2023 20:09:57 GMT
- Title: Conditional Generation from Unconditional Diffusion Models using
Denoiser Representations
- Authors: Alexandros Graikos, Srikar Yellapragada, Dimitris Samaras
- Abstract summary: We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
- Score: 94.04631421741986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Denoising diffusion models have gained popularity as a generative modeling
technique for producing high-quality and diverse images. Applying these models
to downstream tasks requires conditioning, which can take the form of text,
class labels, or other forms of guidance. However, providing conditioning
information to these models can be challenging, particularly when annotations
are scarce or imprecise. In this paper, we propose adapting pre-trained
unconditional diffusion models to new conditions using the learned internal
representations of the denoiser network. We demonstrate the effectiveness of
our approach on various conditional generation tasks, including
attribute-conditioned generation and mask-conditioned generation. Additionally,
we show that augmenting the Tiny ImageNet training set with synthetic images
generated by our approach improves the classification accuracy of ResNet
baselines by up to 8%. Our approach provides a powerful and flexible way to
adapt diffusion models to new conditions and generate high-quality augmented
data for various conditional generation tasks.
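To make the idea concrete, below is a minimal, hypothetical sketch of conditioning an unconditional diffusion model through its denoiser's internal features, in the spirit of classifier guidance. It is illustrative only: the toy denoiser, the `feature_head`, and the schedule constants are assumptions, not the paper's actual architecture or training procedure.

```python
# Hypothetical sketch: guidance that scores intermediate *denoiser features*
# instead of raw images. All modules and constants here are placeholders.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a pre-trained unconditional denoiser (e.g., a U-Net)."""
    def __init__(self, ch=3, hidden=64):
        super().__init__()
        self.enc = nn.Conv2d(ch, hidden, 3, padding=1)  # features tapped here
        self.dec = nn.Conv2d(hidden, ch, 3, padding=1)

    def forward(self, x_t, t):
        h = torch.relu(self.enc(x_t))  # internal denoiser representation
        return self.dec(h), h          # predicted noise and features

denoiser = TinyDenoiser()
num_classes = 10

# Small head that predicts the condition from denoiser features; in the
# paper's setting it would be trained on the scarce annotations, here it
# is randomly initialised purely for illustration.
feature_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
)

def guided_step(x_t, t, y, guidance_scale=2.0):
    """One reverse step, nudged toward condition y via denoiser features."""
    x_t = x_t.detach().requires_grad_(True)
    eps, feats = denoiser(x_t, t)
    log_probs = torch.log_softmax(feature_head(feats), dim=-1)
    cond_log_prob = log_probs[torch.arange(len(y)), y].sum()
    grad = torch.autograd.grad(cond_log_prob, x_t)[0]
    # Placeholder schedule constants; a real sampler would use the trained
    # model's alpha/sigma at step t (classifier-guidance-style update).
    alpha, sigma = 0.99, 0.05
    mean = (x_t - (1.0 - alpha) * eps) / alpha ** 0.5
    return (mean + guidance_scale * grad + sigma * torch.randn_like(x_t)).detach()

x = torch.randn(4, 3, 32, 32)        # start from pure noise
y = torch.tensor([0, 1, 2, 3])       # target classes/attributes
for step in reversed(range(50)):
    x = guided_step(x, torch.full((4,), step), y)
```

In practice the feature head would be trained on features extracted from the real pre-trained denoiser, and the update would use the model's true noise schedule rather than fixed constants.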
Related papers
- Boosting Generative Image Modeling via Joint Image-Feature Synthesis [10.32324138962724]
We introduce a novel generative image modeling framework that seamlessly bridges the gap between generative modeling and representation learning by leveraging a diffusion model to jointly model low-level image latents and high-level semantic features.
Our latent-semantic diffusion approach learns to generate coherent image-feature pairs from pure noise.
By eliminating the need for complex distillation objectives, our unified design simplifies training and unlocks a powerful new inference strategy: Representation Guidance.
arXiv Detail & Related papers (2025-04-22T17:41:42Z)
- Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression [7.859083902013309]
Diffusion models generate high-quality images through progressive denoising but are computationally intensive due to large model sizes and repeated sampling.
We propose Random Conditioning, a novel approach that pairs noised images with randomly selected text conditions to enable efficient, image-free knowledge distillation.
Our method allows the student to explore the condition space without generating condition-specific images, resulting in notable improvements in both generation quality and efficiency (a minimal distillation sketch appears after this list).
arXiv Detail & Related papers (2025-04-02T05:41:19Z)
- D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens [80.75893450536577]
We propose D2C, a novel two-stage method to enhance model generation capacity.
In the first stage, the discrete-valued tokens representing coarse-grained image features are sampled by employing a small discrete-valued generator.
In the second stage, the continuous-valued tokens representing fine-grained image features are learned conditioned on the discrete token sequence.
arXiv Detail & Related papers (2025-03-21T13:58:49Z)
- Understanding the Quality-Diversity Trade-off in Diffusion Language Models [0.0]
Diffusion models can be used to model continuous data across a range of domains such as vision and audio.
Recent work explores their application to text generation by working in the continuous embedding space.
However, these models lack a natural means to control the inherent trade-off between quality and diversity.
arXiv Detail & Related papers (2025-03-11T17:18:01Z)
- Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data.
This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns.
We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives: removing the undesired content and preserving generation quality.
arXiv Detail & Related papers (2024-12-09T21:36:10Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) for text-to-image synthesis to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [34.611309081801345]
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation.
In this paper, we propose a novel strategy to scale a generative model across new tasks with minimal compute.
arXiv Detail & Related papers (2024-04-15T17:55:56Z)
- Conditional Image Generation with Pretrained Generative Model [1.4685355149711303]
Diffusion models have gained popularity for their ability to generate higher-quality images in comparison to GAN models.
These models require huge amounts of data, computational resources, and meticulous tuning for successful training.
We propose methods to leverage pre-trained unconditional diffusion models with additional guidance for conditional image generation.
arXiv Detail & Related papers (2023-12-20T18:27:53Z)
- CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce CoDi, a novel method that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
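As referenced in the Random Conditioning entry above, here is a minimal, hypothetical sketch of image-free distillation with randomly sampled conditions. The toy `CondDenoiser`, the condition vocabulary size, and the fixed noising coefficients are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of "random conditioning" distillation: the student
# matches the teacher's prediction on noised images paired with *randomly
# drawn* conditions, so no condition-specific images are ever generated.
import torch
import torch.nn as nn

class CondDenoiser(nn.Module):
    """Stand-in for a conditional denoiser; the teacher is pre-trained/frozen."""
    def __init__(self, ch=3, hidden=32, num_conds=1000):
        super().__init__()
        self.embed = nn.Embedding(num_conds, hidden)
        self.net = nn.Sequential(
            nn.Conv2d(ch, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, ch, 3, padding=1),
        )

    def forward(self, x_t, t, c):
        # Toy conditioning: add a scalar summary of the condition embedding.
        bias = self.embed(c).mean(dim=1)[:, None, None, None]
        return self.net(x_t + bias)

teacher = CondDenoiser().eval()       # pretend this is the large trained model
student = CondDenoiser(hidden=16)     # smaller model to be distilled
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

for step in range(100):
    x0 = torch.randn(8, 3, 32, 32)            # placeholder image batch
    noise = torch.randn_like(x0)
    x_t = 0.9 * x0 + 0.1 * noise              # placeholder noising schedule
    t = torch.randint(0, 1000, (8,))
    c = torch.randint(0, 1000, (8,))          # conditions drawn at random,
                                              # independent of the images
    with torch.no_grad():
        target = teacher(x_t, t, c)           # teacher's conditional prediction
    loss = nn.functional.mse_loss(student(x_t, t, c), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point is that the condition `c` is sampled independently of the image batch, so the student can cover the condition space without any condition-specific image generation by the teacher.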
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.