How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
- URL: http://arxiv.org/abs/2506.08351v1
- Date: Tue, 10 Jun 2025 02:09:48 GMT
- Title: How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
- Authors: Huixuan Zhang, Junzhe Zhang, Xiaojun Wan,
- Abstract summary: We propose Step AG, a simple, universally applicable adaptive guidance strategy. Our evaluations focus on both image quality and image-text alignment.
- Score: 57.42800112251644
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: With the rapid development of text-to-vision diffusion models, classifier-free guidance has emerged as the most prevalent conditioning method. However, it inherently requires twice as many model forward passes per step as unconditional generation, resulting in significantly higher cost. A previous study introduced the concept of adaptive guidance, but it lacked solid analysis and empirical results, so the method cannot be applied to general diffusion models. In this work, we present another perspective on adaptive guidance and propose Step AG, a simple, universally applicable adaptive guidance strategy. Our evaluations, which cover both image quality and image-text alignment, indicate that restricting classifier-free guidance to the first several denoising steps is sufficient for generating high-quality, well-conditioned images, achieving an average speedup of 20% to 30%. The improvement is consistent across different settings, such as the number of inference steps, and across various models, including video generation models, highlighting the superiority of our method.
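To make the mechanism concrete, below is a minimal sketch of the idea described in the abstract, assuming a standard epsilon-prediction diffusion model. The function and parameter names (`denoise_step`, `guided_steps`) and the cutoff value are illustrative assumptions, not the paper's actual implementation.

```python
def denoise_step(model, x_t, t, cond, uncond, step_idx,
                 guided_steps=10, guidance_scale=7.5):
    """One denoising step with classifier-free guidance restricted to the
    first `guided_steps` steps (a sketch of the Step AG idea; names and the
    cutoff value are illustrative, not the paper's API)."""
    if step_idx < guided_steps:
        # Standard CFG: two model forwards per step, then extrapolate
        # from the unconditional toward the conditional prediction.
        eps_cond = model(x_t, t, cond)
        eps_uncond = model(x_t, t, uncond)
        return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    # Remaining steps: a single conditional forward, i.e. guidance is skipped,
    # roughly halving per-step cost and yielding the reported 20-30% speedup overall.
    return model(x_t, t, cond)
```

In this reading, the "adaptive" part of the strategy reduces to choosing how many initial steps keep the two-pass guidance before falling back to a single conditional pass.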
Related papers
- RAAG: Ratio Aware Adaptive Guidance [7.2455669888408085]
We show that the earliest reverse steps are acutely sensitive to the guidance scale, owing to a pronounced spike in the relative strength (RATIO) of conditional to unconditional predictions. We propose a simple, theoretically grounded, RATIO-aware adaptive guidance schedule that automatically dampens the guidance scale at early steps based on the evolving RATIO. Our approach enables up to 3x faster sampling while maintaining or improving generation quality, robustness, and semantic alignment.
arXiv Detail & Related papers (2025-08-05T13:41:05Z) - Steering Guidance for Personalized Text-to-Image Diffusion Models [19.550718192994353]
Existing sampling guidance methods fail to guide the output toward a well-balanced space. We propose personalization guidance, a simple yet effective method leveraging an unlearned weak model conditioned on a null text prompt. Our method explicitly steers the outputs toward a balanced latent space without additional computational overhead.
arXiv Detail & Related papers (2025-08-01T05:02:26Z) - Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model [62.11981915549919]
Domain Guidance is a transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain. We demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD$_\text{DINOv2}$ compared to standard fine-tuning.
arXiv Detail & Related papers (2025-04-02T09:07:55Z) - Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models [0.0]
We present a novel method for improving text-to-image generation by combining Large Language Models with diffusion models. Our approach incorporates semantic understanding from pre-trained LLMs to guide the generation process. Our method significantly improves both the visual quality and alignment of generated images with text descriptions.
arXiv Detail & Related papers (2025-02-02T15:43:13Z) - Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Optimization Preference (TailorPO) framework for aligning diffusion models with human preference. Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues. Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
arXiv Detail & Related papers (2025-02-01T16:08:43Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - AdaDiff: Adaptive Step Selection for Fast Diffusion Models [82.78899138400435]
We introduce AdaDiff, a lightweight framework designed to learn instance-specific step usage policies. AdaDiff is optimized using a policy method to maximize a carefully designed reward function. We conduct experiments on three image generation and two video generation benchmarks and demonstrate that our approach achieves similar visual quality compared to the baseline.
arXiv Detail & Related papers (2023-11-24T11:20:38Z) - Elucidating The Design Space of Classifier-Guided Diffusion Generation [17.704873767509557]
We show that it is possible to achieve significant performance improvements over existing guidance schemes by leveraging off-the-shelf classifiers in a training-free fashion.
Our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks.
arXiv Detail & Related papers (2023-10-17T14:34:58Z) - Improving Sample Quality of Diffusion Models Using Self-Attention Guidance [36.42984435784378]
Self-Attention Guidance (SAG) improves the performance of various diffusion models.
SAG adversarially blurs only the regions that diffusion models attend to at each iteration and guides them accordingly.
Our results show that our SAG improves the performance of various diffusion models, including ADM, IDDPM, Stable Diffusion, and DiT.
arXiv Detail & Related papers (2022-10-03T13:50:58Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z) - Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)