Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
- URL: http://arxiv.org/abs/2503.20240v2
- Date: Sat, 29 Mar 2025 16:46:54 GMT
- Title: Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
- Authors: Prin Phunyaphibarn, Phillip Y. Lee, Jaihoon Kim, Minhyuk Sung
- Abstract summary: We show that replacing the unconditional noise in CFG with that predicted by the base model can significantly improve conditional generation. We experimentally verify our claim with a range of CFG-based conditional models for both image and video generation.
- Score: 10.542645300983878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classifier-Free Guidance (CFG) is a fundamental technique in training conditional diffusion models. The common practice for CFG-based training is to use a single network to learn both conditional and unconditional noise prediction, with a small dropout rate for conditioning. However, we observe that the joint learning of unconditional noise with limited bandwidth in training results in poor priors for the unconditional case. More importantly, these poor unconditional noise predictions become a serious reason for degrading the quality of conditional generation. Inspired by the fact that most CFG-based conditional models are trained by fine-tuning a base model with better unconditional generation, we first show that simply replacing the unconditional noise in CFG with that predicted by the base model can significantly improve conditional generation. Furthermore, we show that a diffusion model other than the one the fine-tuned model was trained on can be used for unconditional noise replacement. We experimentally verify our claim with a range of CFG-based conditional models for both image and video generation, including Zero-1-to-3, Versatile Diffusion, DiT, DynamiCrafter, and InstructPix2Pix.
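To make the core idea concrete, below is a minimal sketch of a CFG denoising step in which the unconditional branch is served by the base model rather than the fine-tuned model. The model interfaces (`fine_tuned`, `base`), their call signatures, and the guidance scale are illustrative assumptions, not the authors' implementation.

```python
import torch

def cfg_noise(fine_tuned, base, x_t, t, cond, guidance_scale=7.5):
    """One CFG noise prediction where the unconditional branch is taken
    from the base model instead of the fine-tuned conditional model.

    `fine_tuned`, `base`, and `cond` are placeholder interfaces for whatever
    noise-prediction networks and conditioning a given pipeline uses.
    """
    # Conditional prediction from the fine-tuned (conditional) model.
    eps_cond = fine_tuned(x_t, t, cond)

    # Standard CFG would use the fine-tuned model's own null-conditioned
    # prediction here; the paper observes that this unconditional prior is
    # degraded by fine-tuning with a small conditioning-dropout rate:
    # eps_uncond = fine_tuned(x_t, t, null_cond)

    # Replacement: take the unconditional prediction from the base model,
    # whose prior was not degraded by conditional fine-tuning.
    with torch.no_grad():
        eps_uncond = base(x_t, t)

    # Usual CFG extrapolation away from the unconditional prediction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```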
Related papers
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models [25.301443993960277]
We revisit the core principles of CFG and introduce a new method, independent condition guidance (ICG).
ICG provides the benefits of CFG without the need for any special training procedures.
Our approach streamlines the training process of conditional diffusion models and can also be applied during inference on any pre-trained conditional model.
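A rough sketch of how such guidance might look, assuming (as we read the abstract) that ICG replaces the learned null-condition branch with a prediction made under an independently sampled condition; the interface and names below are illustrative, not the reference implementation.

```python
def icg_noise(model, x_t, t, cond, random_cond, guidance_scale=7.5):
    """Independent condition guidance (sketch): instead of a trained null
    embedding, the 'unconditional' branch is conditioned on an embedding that
    is statistically independent of x_t (e.g. a condition drawn at random).

    `model`, `cond`, and `random_cond` are placeholders; this reflects our
    reading of the abstract, not the authors' code.
    """
    eps_cond = model(x_t, t, cond)          # prediction for the target condition
    eps_indep = model(x_t, t, random_cond)  # prediction for an independent condition
    return eps_indep + guidance_scale * (eps_cond - eps_indep)
```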
arXiv Detail & Related papers (2024-07-02T22:04:00Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
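For intuition, the sketch below shows a generic loss-guided sampling step of the kind such zero-shot steering builds on: the unconditional noise prediction is nudged by the gradient of a task loss evaluated on the predicted clean image. It is a textbook-style illustration under assumed interfaces, not the paper's exact update rule.

```python
import torch

def steered_step(eps_model, x_t, t, loss_fn, alpha_bar_t, step_size=1.0):
    """Loss-guided denoising step with an unconditional model (sketch).

    `eps_model`, `loss_fn`, and `step_size` are illustrative; the task loss
    encodes the condition (e.g. agreement with known pixels for inpainting).
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)  # unconditional noise prediction

    # Predicted clean image x0 from the current noisy sample (DDPM identity).
    x0_hat = (x_t - (1.0 - alpha_bar_t) ** 0.5 * eps) / (alpha_bar_t ** 0.5)

    # Gradient of the task loss with respect to the noisy sample.
    loss = loss_fn(x0_hat)
    grad = torch.autograd.grad(loss, x_t)[0]

    # Steer the noise prediction toward lower task loss.
    return (eps + step_size * (1.0 - alpha_bar_t) ** 0.5 * grad).detach()
```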
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Zero-Shot Conditioning of Score-Based Diffusion Models by Neuro-Symbolic Constraints [1.1826485120701153]
We propose a method that, given a pre-trained unconditional score-based generative model, samples from the conditional distribution under arbitrary logical constraints. We show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. We define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints.
arXiv Detail & Related papers (2023-08-31T08:25:47Z) - Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z) - DuDGAN: Improving Class-Conditional GANs via Dual-Diffusion [2.458437232470188]
Class-conditional image generation using generative adversarial networks (GANs) has been investigated through various techniques.
We propose a novel approach for class-conditional image generation using GANs called DuDGAN, which incorporates a dual diffusion-based noise injection process.
Our method outperforms state-of-the-art conditional GAN models for image generation.
arXiv Detail & Related papers (2023-05-24T07:59:44Z) - Visual Chain-of-Thought Diffusion Models [15.547439887203613]
We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure.
Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25-50% compared to standard unconditional generation.
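A minimal sketch of the two-stage idea, assuming the first stage samples an auxiliary conditioning variable (e.g., an image embedding) from an unconditional prior and the second stage runs a conditional diffusion model on it; `prior_model` and `cond_diffusion` are hypothetical interfaces used only for illustration.

```python
def two_stage_sample(prior_model, cond_diffusion, num_samples):
    """Two-stage unconditional sampling (sketch).

    Stage 1 draws an auxiliary conditioning variable from a lightweight
    unconditional prior; stage 2 lets a stronger conditional diffusion model
    generate images given that variable.
    """
    # Stage 1: sample the auxiliary variable z unconditionally.
    z = prior_model.sample(num_samples)

    # Stage 2: condition the diffusion model on z to produce the images.
    return cond_diffusion.sample(condition=z, num_samples=num_samples)
```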
arXiv Detail & Related papers (2023-03-28T17:53:06Z) - Training and Inference on Any-Order Autoregressive Models the Right Way [97.39464776373902]
A family of Any-Order Autoregressive Models (AO-ARMs) has shown breakthrough performance in arbitrary conditional tasks.
We identify significant improvements to be made to previous formulations of AO-ARMs.
Our method leads to improved performance with no compromises on tractability.
arXiv Detail & Related papers (2022-05-26T18:00:02Z) - Collapse by Conditioning: Training Class-conditional GANs with Limited Data [109.30895503994687]
We propose a training strategy for conditional GANs (cGANs) that effectively prevents the observed mode-collapse by leveraging unconditional learning.
Our training strategy starts with an unconditional GAN and gradually injects conditional information into the generator and the objective function.
The proposed method for training cGANs with limited data results not only in stable training but also in high-quality generated images.
arXiv Detail & Related papers (2022-01-17T18:59:23Z) - D2C: Diffusion-Denoising Models for Few-shot Conditional Generation [109.68228014811443]
This paper describes Diffusion-Decoding models with Contrastive representations (D2C).
D2C uses a learned diffusion-based prior over latent representations to improve generation and contrastive self-supervised learning to improve representation quality.
On conditional image manipulation, D2C generations are two orders of magnitude faster to produce than StyleGAN2 ones and are preferred by 50%-60% of the human evaluators in a double-blind study.
arXiv Detail & Related papers (2021-06-12T16:32:30Z)