Understanding and Improving Training-free Loss-based Diffusion Guidance
- URL: http://arxiv.org/abs/2403.12404v2
- Date: Wed, 29 May 2024 09:59:36 GMT
- Title: Understanding and Improving Training-free Loss-based Diffusion Guidance
- Authors: Yifei Shen, Xinyang Jiang, Yezhen Wang, Yifan Yang, Dongqi Han, Dongsheng Li,
- Abstract summary: We develop a deeper understanding of training-free guidance, as well as overcome its limitations.
To elucidate their drawbacks, we theoretically demonstrate that training-free guidance is more susceptible to adversarial gradients.
We then introduce a collection of techniques designed to overcome the limitations, accompanied by theoretical rationale and empirical evidence.
- Score: 46.803127794547756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free loss-based guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universal control formats, which appears to offer a free lunch in diffusion guidance. In this paper, we aim to develop a deeper understanding of training-free guidance, as well as overcome its limitations. We offer a theoretical analysis that supports training-free guidance from the perspective of optimization, distinguishing it from classifier-based (or classifier-free) guidance. To elucidate their drawbacks, we theoretically demonstrate that training-free guidance is more susceptible to adversarial gradients and exhibits slower convergence rates compared to classifier guidance. We then introduce a collection of techniques designed to overcome the limitations, accompanied by theoretical rationale and empirical evidence. Our experiments in image and motion generation confirm the efficacy of these techniques.
Related papers
- Navigating Sparse Molecular Data with Stein Diffusion Guidance [48.21071466968102]
optimal control (SOC) has emerged as a principled framework for fine-tuning diffusion models.<n>A class of training-free approaches has been developed that guides diffusion models using off-the-shelf classifiers on predicted clean samples.<n>We propose a novel training-free guidance framework based on a surrogate optimal control objective.
arXiv Detail & Related papers (2025-07-07T21:14:27Z) - How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models [57.42800112251644]
We propose Step AG, which is a simple, universally applicable adaptive guidance strategy.<n>Our evaluations focus on both image quality and image-text alignment.
arXiv Detail & Related papers (2025-06-10T02:09:48Z) - Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better [1.1970409518725493]
Adversarial training (AT) is an effective technique for enhancing adversarial robustness, but it comes at the cost of a decline in generalization ability.
Recent studies have attempted to use clean training to assist adversarial training, yet there are contradictions among the conclusions.
We propose a new idea of leveraging clean training to further improve the performance of advanced AT methods.
arXiv Detail & Related papers (2025-03-30T15:58:41Z) - Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective [100.54185280153753]
We find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries.
We propose a generic postprocessing step built upon flow-matching to shrink the gap between the learned distribution for a pretrained denoising diffusion model and the real data distribution.
arXiv Detail & Related papers (2025-03-13T17:59:59Z) - Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning [93.58897637077001]
This paper tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints.
We pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos.
For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model.
arXiv Detail & Related papers (2025-03-11T13:50:22Z) - REG: Rectified Gradient Guidance for Conditional Diffusion Models [16.275782069986253]
We propose rectified gradient guidance (REG) to boost the performance of existing guidance methods.
REG provides a better approximation to the optimal solution than prior guidance techniques.
In experiments on class-conditional ImageNet and text-to-image generation tasks, REG consistently improves FID and Inception/CLIP scores.
arXiv Detail & Related papers (2025-01-31T03:16:18Z) - Dreamguider: Improved Training free Diffusion-based Conditional Generation [31.68823843900196]
Dreamguider is a method that enables inference-time guidance without compute-heavy backpropagation through the diffusion network.
We present experiments using Dreamguider on multiple tasks across multiple datasets and models to show the effectiveness of the proposed modules.
arXiv Detail & Related papers (2024-06-04T17:59:32Z) - Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.
We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z) - Elucidating The Design Space of Classifier-Guided Diffusion Generation [17.704873767509557]
We show that it is possible to achieve significant performance improvements over existing guidance schemes by leveraging off-the-shelf classifiers in a training-free fashion.
Our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks.
arXiv Detail & Related papers (2023-10-17T14:34:58Z) - Unsupervised Discovery of Interpretable Directions in h-space of
Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z) - Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimize the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
Deep learning to inputs perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robust robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Off-policy Reinforcement Learning with Optimistic Exploration and
Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - Convergence of End-to-End Training in Deep Unsupervised Contrastive
Learning [3.8073142980733]
Unsupervised contrastive learning has proven to be a powerful method for learning representations from unlabeled data.
This study provides theoretical insights into the practical success of these unsupervised methods.
arXiv Detail & Related papers (2020-02-17T14:35:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.