Towards Practical Plug-and-Play Diffusion Models
- URL: http://arxiv.org/abs/2212.05973v2
- Date: Mon, 27 Mar 2023 13:17:48 GMT
- Title: Towards Practical Plug-and-Play Diffusion Models
- Authors: Hyojun Go, Yunsung Lee, Jin-Young Kim, Seunghyun Lee, Myeongho Jeong,
Hyun Seung Lee, and Seungtaek Choi
- Abstract summary: Diffusion-based generative models have achieved remarkable success in image generation.
Direct use of publicly available off-the-shelf models for guidance fails due to poor performance on noisy inputs.
The existing practice is to fine-tune the guidance models on labeled data corrupted with noise.
- Score: 19.846094740800254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based generative models have achieved remarkable success in image
generation. Their guidance formulation allows an external model to control the
generation process for various tasks in a plug-and-play manner, without
fine-tuning the diffusion model. However, the direct use of publicly available
off-the-shelf models for guidance fails due to their poor performance on noisy
inputs. To address this, the existing practice is to fine-tune the guidance
models on labeled data corrupted with noise. In this paper, we argue that this
practice has two limitations: (1) handling inputs with widely varying levels of
noise is too hard for a single guidance model; (2) collecting labeled datasets
hinders scaling up for various tasks. To tackle these limitations, we propose a
novel strategy that leverages multiple experts, where each expert specializes
in a particular noise range and guides the reverse diffusion process at its
corresponding timesteps. However, since it is infeasible to manage multiple
networks and utilize labeled data, we present a practical guidance framework
termed Practical Plug-And-Play (PPAP), which leverages parameter-efficient
fine-tuning and data-free knowledge transfer. We conduct extensive ImageNet
class-conditional generation experiments to show that our method can
successfully guide diffusion with only a small number of trainable parameters
and no labeled data. Finally, we show that image classifiers, depth
estimators, and semantic segmentation models can guide publicly available GLIDE
through our framework in a plug-and-play manner. Our code is available at
https://github.com/riiid/PPAP.
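A minimal sketch of the multi-expert guidance idea described in the abstract: each expert covers one contiguous noise range and supplies the classifier-guidance gradient at its timesteps. All names (`expert_for`, `guidance_grad`) and the stand-in linear "experts" are illustrative assumptions, not the PPAP code; PPAP additionally obtains its experts via parameter-efficient fine-tuning and data-free knowledge transfer.

```python
# Illustrative sketch only; not the authors' implementation.
import torch

N_TIMESTEPS = 1000
N_EXPERTS = 5  # each expert is specialized in one noise range
experts = [torch.nn.Linear(3 * 32 * 32, 10) for _ in range(N_EXPERTS)]

def expert_for(t: int) -> torch.nn.Module:
    """Pick the guidance expert whose noise range contains timestep t."""
    return experts[min(t * N_EXPERTS // N_TIMESTEPS, N_EXPERTS - 1)]

def guidance_grad(x_t: torch.Tensor, t: int, y: torch.Tensor, scale: float = 1.0):
    """scale * d log p(y | x_t) / d x_t, using the expert assigned to t."""
    x = x_t.detach().requires_grad_(True)
    logits = expert_for(t)(x.flatten(1))
    log_py = torch.log_softmax(logits, dim=-1)[torch.arange(len(y)), y].sum()
    (grad,) = torch.autograd.grad(log_py, x)
    return scale * grad

# In a reverse-diffusion step, the predicted mean for x_{t-1} would be shifted
# by the posterior variance times guidance_grad(x_t, t, y) before sampling.
```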
Related papers
- Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models.
An external lightweight guide model is trained while the original text-to-image model remains frozen.
We show that our method reduces the inference cost of classifier-free guided latent-space diffusion models by almost half.
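For context, classifier-free guidance evaluates the denoiser twice per sampling step (conditional and unconditional), which is the cost the distilled guide model above nearly halves. A minimal sketch of the standard formulation; the function name and signature are assumptions:

```python
# Standard classifier-free guidance (CFG) combination; illustrative names.
import torch

def cfg_epsilon(denoiser, x_t, t, cond, null_cond, w: float = 7.5):
    """eps_u + w * (eps_c - eps_u): two denoiser passes per step."""
    eps_u = denoiser(x_t, t, null_cond)  # unconditional pass
    eps_c = denoiser(x_t, t, cond)       # conditional pass (the extra cost)
    return eps_u + w * (eps_c - eps_u)
```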
arXiv Detail & Related papers (2024-06-04T04:22:47Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data [74.2507346810066]
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data.
We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data.
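For reference, the Tweedie identity the title alludes to relates the posterior mean of the clean image to the score of the noisy marginal; in a common variance-exploding notation (an assumption, since the paper's exact parameterization may differ):

```latex
% Tweedie's formula for x_t = x_0 + \sigma_t \epsilon, \epsilon \sim \mathcal{N}(0, I):
\mathbb{E}[x_0 \mid x_t] = x_t + \sigma_t^{2}\, \nabla_{x_t} \log p_t(x_t)
```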
arXiv Detail & Related papers (2024-03-20T14:22:12Z)
- One-Step Image Translation with Text-to-Image Models [35.0987002313882]
We introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives.
We consolidate the various modules of the vanilla latent diffusion model into a single end-to-end generator network with a small number of trainable weights.
Our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks.
arXiv Detail & Related papers (2024-03-18T17:59:40Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
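An illustrative guidance loss with box supervision: penalize attribution mass that falls outside the annotated bounding box. Plain input gradients and this particular penalty are simplifying assumptions; the paper compares several attribution methods and loss functions.

```python
# Illustrative "model guidance" loss; not the paper's exact formulation.
import torch
import torch.nn.functional as F

def guided_loss(model, x, y, box_mask, lam: float = 1.0):
    """box_mask is 1 inside the annotated box and 0 outside, same size as x."""
    x = x.detach().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    # Input-gradient attribution for the target class.
    score = logits[torch.arange(len(y)), y].sum()
    (attr,) = torch.autograd.grad(score, x, create_graph=True)
    outside = (attr.abs() * (1.0 - box_mask)).mean()
    return task_loss + lam * outside  # "right for the right reasons" penalty
```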
arXiv Detail & Related papers (2023-03-21T15:34:50Z)
- Self-Guided Diffusion Models [53.825634944114285]
We propose a framework for self-guided diffusion models.
Our method provides guidance signals at various image granularities.
Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance.
arXiv Detail & Related papers (2022-10-12T17:57:58Z)
- Label-Efficient Semantic Segmentation with Diffusion Models [27.01899943738203]
We demonstrate that diffusion models can also serve as an instrument for semantic segmentation.
In particular, for several pretrained diffusion models, we investigate the intermediate activations from the networks that perform the Markov step of the reverse diffusion process.
We show that these activations effectively capture the semantic information from an input image and appear to be excellent pixel-level representations for the segmentation problem.
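A minimal sketch of this recipe: cache an intermediate UNet activation during the reverse-process forward pass and treat it as a per-pixel feature for a small segmentation head. The toy `unet`, the tapped layer, and the head are assumptions; the paper studies which blocks and timesteps yield the best representations.

```python
# Illustrative sketch with a toy stand-in for a pretrained diffusion UNet.
import torch

unet = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 64, 3, padding=1),
)

feats = {}
unet[1].register_forward_hook(lambda m, i, o: feats.update(mid=o))

x_t = torch.randn(1, 3, 64, 64)  # noisy input at some timestep t
_ = unet(x_t)                    # the Markov-step forward pass

# Flatten spatial dims: one 64-d feature vector per pixel.
pixel_feats = feats["mid"].flatten(2).transpose(1, 2).squeeze(0)  # (H*W, 64)
seg_head = torch.nn.Linear(64, 21)       # e.g. 21 classes, trained on few labels
per_pixel_logits = seg_head(pixel_feats)  # (H*W, 21)
```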
arXiv Detail & Related papers (2021-12-06T15:55:30Z)