Self-Guided Diffusion Models
- URL: http://arxiv.org/abs/2210.06462v3
- Date: Mon, 27 Nov 2023 18:30:14 GMT
- Title: Self-Guided Diffusion Models
- Authors: Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts,
Cees G. M. Snoek
- Abstract summary: We propose a framework for self-guided diffusion models.
Our method provides guidance signals at various image granularities.
Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance.
- Score: 53.825634944114285
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Diffusion models have demonstrated remarkable progress in image generation
quality, especially when guidance is used to control the generative process.
However, guidance requires a large amount of image-annotation pairs for
training and is thus dependent on their availability, correctness and
unbiasedness. In this paper, we eliminate the need for such annotation by
instead leveraging the flexibility of self-supervision signals to design a
framework for self-guided diffusion models. By leveraging a feature extraction
function and a self-annotation function, our method provides guidance signals
at various image granularities: from the level of holistic images to object
boxes and even segmentation masks. Our experiments on single-label and
multi-label image datasets demonstrate that self-labeled guidance always
outperforms diffusion models without guidance and may even surpass guidance
based on ground-truth labels, especially on unbalanced data. When equipped with
self-supervised box or mask proposals, our method further generates visually
diverse yet semantically consistent images, without the need for any class,
box, or segment label annotation. Self-guided diffusion is simple, flexible and
expected to profit from deployment at scale. Source code will be at:
https://taohu.me/sgdm/
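The abstract describes the framework only at a high level. Below is a minimal sketch, assuming a standard classifier-free guidance setup, of the core idea: a self-supervised feature extractor plus a self-annotation function (here, k-means clustering) replace ground-truth labels, and sampling conditions on the resulting pseudo-labels. All names (self_annotate, guided_eps, ToyEps) are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of self-labeled guidance (illustrative, not the authors' code).
# Idea: cluster self-supervised features into pseudo-labels, then apply standard
# classifier-free guidance with those pseudo-labels as the condition.
import torch
from sklearn.cluster import KMeans

def self_annotate(features: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Self-annotation function: cluster features into k pseudo-labels."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(features.cpu().numpy())
    return torch.as_tensor(labels, dtype=torch.long)

def guided_eps(eps_model, x_t, t, pseudo_label, null_label, w: float = 2.0):
    """Classifier-free guidance using self-derived pseudo-labels."""
    eps_cond = eps_model(x_t, t, pseudo_label)    # conditioned on pseudo-label
    eps_uncond = eps_model(x_t, t, null_label)    # unconditional branch
    return eps_uncond + w * (eps_cond - eps_uncond)

class ToyEps(torch.nn.Module):
    """Stand-in noise predictor with a label embedding (placeholder only)."""
    def __init__(self, k: int, dim: int = 16):
        super().__init__()
        self.emb = torch.nn.Embedding(k + 1, dim)  # +1 slot for the null label
        self.net = torch.nn.Linear(dim, dim)
    def forward(self, x_t, t, y):
        return self.net(x_t + self.emb(y))

k = 10
feats = torch.randn(256, 32)                       # pretend self-supervised features
labels = self_annotate(feats, k)                   # image-level pseudo-labels
model = ToyEps(k)
x_t = torch.randn(4, 16)
eps = guided_eps(model, x_t, t=torch.zeros(4), pseudo_label=labels[:4],
                 null_label=torch.full((4,), k, dtype=torch.long))
```

For box- or mask-level guidance, the same recipe would replace the image-level pseudo-label with self-supervised box or mask proposals, as described in the abstract.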
Related papers
- Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models.
An external lightweight guide model is trained while the original text-to-image model remains frozen.
We show that our method reduces the inference cost of classifier-free guided latent-space diffusion models by almost half.
arXiv Detail & Related papers (2024-06-04T04:22:47Z)
- RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance [40.69996772681004]
We exploit a training-free technique that steers diffusion models using an existing classifier for personalized image generation.
Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance can be resolved with a simple fixed-point solution.
The derived method is implemented on rectified flow with different off-the-shelf image discriminators, delivering advantageous personalization results for human faces, live subjects, and certain objects.
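The abstract does not spell out the anchored fixed-point solution, so the sketch below only illustrates the ingredient it builds on: steering a rectified-flow velocity field with the gradient of an off-the-shelf classifier evaluated at the predicted endpoint. The flow and classifier here are toy stand-ins, not the paper's method.

```python
# Generic classifier guidance on a rectified flow (illustrative sketch only;
# RectifID's anchored fixed-point solution is not reproduced here).
import torch

def guided_velocity(flow_model, classifier_logp, x_t, t, scale: float = 1.0):
    """Add the classifier gradient, taken at the predicted endpoint, to the velocity."""
    x_t = x_t.detach().requires_grad_(True)
    v = flow_model(x_t, t)
    x1_hat = x_t + (1.0 - t) * v                  # straight-line endpoint estimate
    logp = classifier_logp(x1_hat).sum()          # log p(target | x1_hat)
    grad = torch.autograd.grad(logp, x_t)[0]
    return (v + scale * grad).detach()

# Toy components: a linear "flow" and a quadratic log-likelihood pulling x1 to a target.
flow = lambda x, t: -x                             # stand-in velocity field
target = torch.tensor([2.0, 0.0])
logp = lambda x: -((x - target) ** 2).sum(dim=-1)  # stand-in classifier score

x = torch.randn(8, 2)
for i in range(10):                                # simple Euler integration from t=0 to 1
    t = torch.tensor(i / 10.0)
    x = x + 0.1 * guided_velocity(flow, logp, x, t, scale=0.5)
```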
arXiv Detail & Related papers (2024-05-23T15:12:15Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several relatively small, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Semantic Guidance Tuning for Text-To-Image Diffusion Models [3.3881449308956726]
We propose a training-free approach that modulates the guidance direction of diffusion models during inference.
We first decompose the prompt semantics into a set of concepts, and monitor the guidance trajectory in relation to each concept.
Based on this observation, we devise a technique to steer the guidance direction towards any concept from which the model diverges.
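As a rough illustration of the idea described above (not the paper's exact procedure), the sketch below compares the full-prompt guidance direction with each concept's guidance direction and adds a correction for concepts the trajectory diverges from; the noise predictor and embeddings are toy stand-ins.

```python
# Sketch of concept-wise guidance steering (illustrative placeholders only).
import torch
import torch.nn.functional as F

def steered_guidance(eps_model, x_t, t, prompt_emb, concept_embs, null_emb,
                     w: float = 7.5, corr: float = 1.0, thresh: float = 0.5):
    eps_u = eps_model(x_t, t, null_emb)
    delta_full = eps_model(x_t, t, prompt_emb) - eps_u
    guided = eps_u + w * delta_full
    for c in concept_embs:                         # monitor each concept separately
        delta_c = eps_model(x_t, t, c) - eps_u
        sim = F.cosine_similarity(delta_full.flatten(1), delta_c.flatten(1), dim=1)
        mask = (sim < thresh).float().view(-1, *[1] * (x_t.dim() - 1))
        guided = guided + corr * mask * delta_c    # steer towards diverging concepts
    return guided

# Toy usage with a stand-in conditional noise predictor.
eps_model = lambda x, t, c: x * 0.1 + c
x_t = torch.randn(2, 4)
prompt, null = torch.randn(4), torch.zeros(4)
concepts = [torch.randn(4) for _ in range(3)]
out = steered_guidance(eps_model, x_t, t=0.5, prompt_emb=prompt,
                       concept_embs=concepts, null_emb=null)
```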
arXiv Detail & Related papers (2023-12-26T09:02:17Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Readout Guidance: Learning Control from Diffusion Features [96.22155562120231]
We present Readout Guidance, a method for controlling text-to-image diffusion models with learned signals.
Readout Guidance uses readout heads, lightweight networks trained to extract signals from the features of a pre-trained, frozen diffusion model at every timestep.
These readouts can encode single-image properties, such as pose, depth, and edges; or higher-order properties that relate multiple images, such as correspondence and appearance similarity.
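A minimal sketch of what such a readout head could look like, assuming frozen UNet features of shape (B, C, H, W) and a scalar timestep; the architecture and names here are assumptions, not the paper's implementation.

```python
# Sketch of a lightweight readout head on frozen diffusion features.
import torch
import torch.nn as nn

class ReadoutHead(nn.Module):
    def __init__(self, feat_dim: int, time_dim: int = 32):
        super().__init__()
        self.time_mlp = nn.Sequential(nn.Linear(1, time_dim), nn.SiLU(),
                                      nn.Linear(time_dim, feat_dim))
        self.conv = nn.Sequential(nn.Conv2d(feat_dim, 64, 3, padding=1), nn.SiLU(),
                                  nn.Conv2d(64, 1, 3, padding=1))
    def forward(self, feats, t):
        # feats: frozen diffusion features (B, C, H, W); t: timestep in [0, 1]
        scale = self.time_mlp(t.view(-1, 1)).unsqueeze(-1).unsqueeze(-1)
        return self.conv(feats * (1 + scale))

head = ReadoutHead(feat_dim=320)
feats = torch.randn(2, 320, 16, 16)                # stand-in UNet features
t = torch.tensor([0.3, 0.3])
pred = head(feats, t)                              # (2, 1, 16, 16) readout, e.g. a depth map
target = torch.zeros_like(pred)                    # user-specified target property
loss = ((pred - target) ** 2).mean()               # its gradient w.r.t. the latent gives guidance
```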
arXiv Detail & Related papers (2023-12-04T18:59:32Z)
- DiffuGen: Adaptable Approach for Generating Labeled Image Datasets using Stable Diffusion Models [2.0935496890864207]
"DiffuGen" is a simple and adaptable approach that harnesses the power of stable diffusion models to create labeled image datasets efficiently.
By leveraging stable diffusion models, our approach not only ensures the quality of generated datasets but also provides a versatile solution for label generation.
arXiv Detail & Related papers (2023-09-01T04:42:03Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
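A minimal sketch of the heatmap idea, assuming the heatmap simply modulates intermediate generator features; the paper's actual encoding scheme may differ, and all shapes below are placeholders.

```python
# Sketch of injecting a spatial Gaussian heatmap into intermediate features.
import torch

def gaussian_heatmap(h, w, cx, cy, sigma=0.15):
    """Heatmap peaking at normalized coordinates (cx, cy) in [0, 1]."""
    ys = torch.linspace(0, 1, h).view(-1, 1)
    xs = torch.linspace(0, 1, w).view(1, -1)
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def inject(features, heatmap, strength=1.0):
    """Modulate intermediate features (B, C, H, W) with an (H, W) heatmap."""
    return features * (1 + strength * heatmap.unsqueeze(0).unsqueeze(0))

feats = torch.randn(1, 256, 32, 32)                # stand-in intermediate GAN features
hm = gaussian_heatmap(32, 32, cx=0.7, cy=0.3)      # user places content top-right
edited = inject(feats, hm)                         # moving the peak moves the content
```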
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Towards Practical Plug-and-Play Diffusion Models [19.846094740800254]
Diffusion-based generative models have achieved remarkable success in image generation.
Direct use of publicly available off-the-shelf models for guidance fails due to poor performance on noisy inputs.
Existing practice is to fine-tune the guidance models with labeled data corrupted with noise.
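A minimal sketch of that practice, assuming a DDPM-style forward corruption: the guidance classifier is fine-tuned on labeled images noised to random levels so it stays reliable on noisy intermediate samples. The data and classifier below are placeholders.

```python
# Sketch of noise-augmented fine-tuning of a guidance classifier (placeholders only).
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)

def add_noise(x, alpha_bar):
    """Forward-diffusion corruption: sqrt(a) * x + sqrt(1 - a) * noise."""
    return alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * torch.randn_like(x)

for step in range(100):
    x = torch.randn(16, 3, 32, 32)                 # stand-in labeled images
    y = torch.randint(0, 10, (16,))
    alpha_bar = torch.rand(16, 1, 1, 1)            # random noise level per sample
    logits = classifier(add_noise(x, alpha_bar))
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```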
arXiv Detail & Related papers (2022-12-12T15:29:46Z)
- Diverse Image Generation via Self-Conditioned GANs [56.91974064348137]
We train a class-conditional GAN model without using manually annotated class labels.
Instead, our model is conditioned on labels automatically derived from clustering in the discriminator's feature space.
Our clustering step automatically discovers diverse modes, and explicitly requires the generator to cover them.
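A minimal sketch of the self-conditioning recipe, assuming pseudo-labels from k-means on discriminator features; the generator and feature dimensions are placeholders, not the paper's architecture.

```python
# Sketch of self-conditioning: cluster discriminator features into pseudo-labels,
# then condition the generator on them (illustrative placeholders only).
import torch
from sklearn.cluster import KMeans

def pseudo_labels(disc_features: torch.Tensor, k: int = 50) -> torch.Tensor:
    """Cluster discriminator features of real images into k pseudo-classes."""
    km = KMeans(n_clusters=k, n_init=10).fit(disc_features.cpu().numpy())
    return torch.as_tensor(km.labels_, dtype=torch.long)

class CondGenerator(torch.nn.Module):
    def __init__(self, z_dim=64, k=50, out_dim=128):
        super().__init__()
        self.emb = torch.nn.Embedding(k, z_dim)
        self.net = torch.nn.Sequential(torch.nn.Linear(2 * z_dim, 256),
                                       torch.nn.ReLU(),
                                       torch.nn.Linear(256, out_dim))
    def forward(self, z, y):
        return self.net(torch.cat([z, self.emb(y)], dim=1))

feats = torch.randn(512, 128)                      # stand-in discriminator features
labels = pseudo_labels(feats, k=50)
gen = CondGenerator()
z = torch.randn(8, 64)
fake = gen(z, labels[:8])                          # generator is pushed to cover discovered modes
```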
arXiv Detail & Related papers (2020-06-18T17:56:03Z)