Universal Guidance for Diffusion Models
- URL: http://arxiv.org/abs/2302.07121v1
- Date: Tue, 14 Feb 2023 15:30:44 GMT
- Title: Universal Guidance for Diffusion Models
- Authors: Arpit Bansal, Hong-Min Chu, Avi Schwarzschild, Soumyadip Sengupta,
Micah Goldblum, Jonas Geiping, Tom Goldstein
- Abstract summary: We propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components.
We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals.
- Score: 54.99356512898613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Typical diffusion models are trained to accept a particular form of
conditioning, most commonly text, and cannot be conditioned on other modalities
without retraining. In this work, we propose a universal guidance algorithm
that enables diffusion models to be controlled by arbitrary guidance modalities
without the need to retrain any use-specific components. We show that our
algorithm successfully generates quality images with guidance functions
including segmentation, face recognition, object detection, and classifier
signals. Code is available at
https://github.com/arpitbansal297/Universal-Guided-Diffusion.
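As a rough illustration of the paper's forward-guidance idea, the sketch below (PyTorch; the function names and the gradient scaling are our illustrative choices, not the repository's code) scores the denoised estimate with an arbitrary off-the-shelf network and shifts the noise prediction along the resulting gradient:

```python
import torch

def guided_eps(model, x_t, t, alpha_bar_t, guidance_fn, target, scale):
    """One guided noise prediction for a DDPM/DDIM-style sampler.

    guidance_fn may wrap any off-the-shelf network (segmenter, face
    recognizer, detector, classifier) plus a differentiable loss;
    alpha_bar_t is the cumulative schedule value at step t (a tensor).
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = model(x_t, t)  # base noise prediction
    # Tweedie-style estimate of the clean image from the noisy sample.
    x0_hat = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    loss = guidance_fn(x0_hat, target)
    grad = torch.autograd.grad(loss, x_t)[0]
    # Shift the noise prediction along the guidance gradient; the
    # sqrt(1 - alpha_bar) weighting is one common choice, an assumption here.
    return eps + scale * (1 - alpha_bar_t).sqrt() * grad
```

Because the guidance network only ever sees the clean-image estimate x0_hat rather than the noisy sample, it needs no retraining on noisy inputs, which is what makes the guidance "universal".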
Related papers
- Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement.
NAG restores effective negative guidance where CFG collapses while maintaining fidelity.
NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
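A minimal sketch of that recipe as summarized here, with the extrapolation weight, the L1-norm cap, and the blend factor all illustrative assumptions rather than the paper's values:

```python
import torch

def nag_attention(z_pos, z_neg, gamma=2.0, tau=2.5, alpha=0.5):
    """Blend positive/negative attention outputs, NAG-style (illustrative)."""
    z_ext = z_pos + gamma * (z_pos - z_neg)  # extrapolate away from the negative
    # L1-based normalization: rescale only when the extrapolated features
    # drift more than a factor tau from the positive branch.
    ratio = z_ext.norm(p=1, dim=-1, keepdim=True) / z_pos.norm(p=1, dim=-1, keepdim=True)
    z_ext = z_ext * (tau / ratio.clamp(min=tau))
    return alpha * z_ext + (1 - alpha) * z_pos  # refinement: blend back toward z_pos
```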
arXiv Detail & Related papers (2025-05-27T13:30:46Z)
- Variational Control for Guidance in Diffusion Models [19.51536406897083]
We introduce Diffusion Trajectory Matching (DTM), which enables guiding pretrained diffusion trajectories to satisfy a terminal cost.
DTM unifies a broad class of guidance methods and enables novel instantiations.
We introduce a new method within this framework that achieves state-of-the-art results on several linear, non-linear, and blind inverse problems.
arXiv Detail & Related papers (2025-02-06T00:24:39Z)
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
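A hedged sketch of what such value-based decoding can look like: candidate resampling with a soft value function. The summary does not specify the sampler, so the names and the softmax-resampling rule below are assumptions.

```python
import torch

def value_guided_step(sample_next, value_fn, x_t, t, num_candidates=8, temp=0.1):
    """Pick the next noisy state by soft value; no gradients required."""
    # Candidates come from the unmodified pretrained sampler, so the design
    # space's "naturalness" is preserved; value_fn scores expected future reward.
    candidates = torch.stack([sample_next(x_t, t) for _ in range(num_candidates)])
    values = torch.stack([value_fn(c, t) for c in candidates])
    weights = torch.softmax(values / temp, dim=0)  # soft selection, not argmax
    idx = torch.multinomial(weights, 1).item()
    return candidates[idx]
```

Because selection is derivative-free, the reward can be non-differentiable, which is the point of the soft value-based decoding described above.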
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
- RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance [40.69996772681004]
For personalized image generation, we exploit a training-free technique that steers diffusion models using an existing classifier.
Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance can be resolved with a simple fixed-point solution.
The derived method is implemented on rectified flow with different off-the-shelf image discriminators, delivering advantageous personalization results for human faces, live subjects, and certain objects.
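One plausible reading of this summary, sketched under the assumption that rectified-flow trajectories are straight enough to extrapolate the endpoint; the paper's actual fixed-point formulation may differ, and all names here are hypothetical:

```python
import torch

def guided_velocity(v_model, classifier_loss, x_t, t, target, scale=1.0, iters=3):
    """Steer a rectified-flow velocity by a loss on the extrapolated endpoint (t in [0, 1))."""
    x_t = x_t.detach()
    with torch.no_grad():
        v0 = v_model(x_t, t)  # base velocity field
    v = v0
    for _ in range(iters):  # crude fixed-point refinement
        x_req = x_t.clone().requires_grad_(True)
        x1_hat = x_req + (1.0 - t) * v  # straight-line endpoint estimate
        grad = torch.autograd.grad(classifier_loss(x1_hat, target), x_req)[0]
        v = v0 - scale * grad  # re-anchor on v0, steered by the endpoint loss
    return v
```

Applying the discriminator to the extrapolated clean endpoint, rather than to noisy intermediate states, is what lets an unmodified off-the-shelf classifier provide usable guidance.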
arXiv Detail & Related papers (2024-05-23T15:12:15Z)
- Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding [90.77521413857448]
Deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations.
We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs).
EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding.
Experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks.
arXiv Detail & Related papers (2024-02-29T10:08:57Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Denoising Diffusion Autoencoders are Unified Self-supervised Learners [58.194184241363175]
This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners.
DDAE has already learned strongly linearly-separable representations within its intermediate layers without auxiliary encoders.
Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet, respectively.
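A small sketch of the linear-evaluation protocol this implies. The probing timestep, layer choice, feature dimension, and the `layer_acts` hook are all hypothetical:

```python
import torch
import torch.nn as nn

def ddae_features(denoiser, x0, t, alpha_bar_t, layer_acts):
    """Pooled activations from an intermediate denoiser layer (hypothetical hook)."""
    noise = torch.randn_like(x0)
    # Forward-noise the clean image to the chosen probing timestep.
    x_t = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * noise
    with torch.no_grad():
        feats = layer_acts(denoiser, x_t, t)  # e.g. a mid U-Net block's output
    return feats.mean(dim=(-2, -1))  # global average pool -> (batch, channels)

# Linear evaluation trains only this probe on the frozen features.
probe = nn.Linear(512, 10)  # feature dim and class count are placeholders
```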
arXiv Detail & Related papers (2023-03-17T04:20:47Z)
- Towards Practical Plug-and-Play Diffusion Models [19.846094740800254]
Diffusion-based generative models have achieved remarkable success in image generation.
Direct use of publicly available off-the-shelf models for guidance fails due to poor performance on noisy inputs.
Existing practice is to fine-tune the guidance models with labeled data corrupted with noises.
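That existing practice can be sketched as follows; the schedule handling and the timestep-aware classifier signature are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def noisy_finetune_step(classifier, opt, x0, labels, alpha_bars):
    """One fine-tuning step on diffusion-noised inputs at random timesteps."""
    t = torch.randint(0, len(alpha_bars), (x0.shape[0],), device=x0.device)
    a = alpha_bars[t].view(-1, 1, 1, 1)  # per-sample cumulative noise level
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)
    loss = F.cross_entropy(classifier(x_t, t), labels)  # timestep-aware classifier
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Training on noise-corrupted inputs is precisely what off-the-shelf models lack, which is why their direct use as guidance fails, as noted above.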
arXiv Detail & Related papers (2022-12-12T15:29:46Z)
- Self-Guided Diffusion Models [53.825634944114285]
We propose a framework for self-guided diffusion models.
Our method provides guidance signals at various image granularities.
Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance.
arXiv Detail & Related papers (2022-10-12T17:57:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.