Navigating with Annealing Guidance Scale in Diffusion Space
- URL: http://arxiv.org/abs/2506.24108v1
- Date: Mon, 30 Jun 2025 17:55:00 GMT
- Title: Navigating with Annealing Guidance Scale in Diffusion Space
- Authors: Shai Yehezkel, Omer Dahary, Andrey Voynov, Daniel Cohen-Or
- Abstract summary: The choice of the guidance scale has a critical impact on the convergence toward a visually appealing and prompt-adherent image. In this work, we propose an annealing guidance scheduler which dynamically adjusts the guidance scale over time. Empirical results demonstrate that our guidance scheduler significantly enhances image quality and alignment with the text prompt.
- Score: 50.53780111249146
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Denoising diffusion models excel at generating high-quality images conditioned on text prompts, yet their effectiveness heavily relies on careful guidance during the sampling process. Classifier-Free Guidance (CFG) provides a widely used mechanism for steering generation by setting the guidance scale, which balances image quality and prompt alignment. However, the choice of the guidance scale has a critical impact on the convergence toward a visually appealing and prompt-adherent image. In this work, we propose an annealing guidance scheduler which dynamically adjusts the guidance scale over time based on the conditional noisy signal. By learning a scheduling policy, our method addresses the temperamental behavior of CFG. Empirical results demonstrate that our guidance scheduler significantly enhances image quality and alignment with the text prompt, advancing the performance of text-to-image generation. Notably, our novel scheduler requires no additional activations or memory consumption, and can seamlessly replace the common classifier-free guidance, offering an improved trade-off between prompt alignment and quality.
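For context, CFG forms the sampling direction as $\hat{\epsilon} = \epsilon_\varnothing + w\,(\epsilon_c - \epsilon_\varnothing)$, where $w$ is the guidance scale; the proposed scheduler varies $w$ across denoising steps instead of keeping it fixed. The sketch below is a minimal illustration of where such a scheduler plugs into a sampling loop. The linear `annealed_scale` ramp and the `unet`/`scheduler` interfaces are illustrative assumptions, not the paper's learned policy.

```python
import torch

def cfg_noise(eps_uncond: torch.Tensor, eps_cond: torch.Tensor, scale: float) -> torch.Tensor:
    # Standard classifier-free guidance combination:
    # eps_hat = eps_uncond + scale * (eps_cond - eps_uncond)
    return eps_uncond + scale * (eps_cond - eps_uncond)

def annealed_scale(step: int, num_steps: int, w_start: float = 9.0, w_end: float = 1.5) -> float:
    # Hypothetical linear annealing from strong to weak guidance.
    # The paper instead learns a policy conditioned on the conditional
    # noisy signal; this fixed ramp only marks where that policy would act.
    frac = step / max(num_steps - 1, 1)
    return w_start + (w_end - w_start) * frac

# Illustrative sampling loop (unet, scheduler, and embeddings are placeholders):
# for i, t in enumerate(timesteps):
#     eps_uncond = unet(x, t, null_embedding)
#     eps_cond = unet(x, t, prompt_embedding)
#     eps_hat = cfg_noise(eps_uncond, eps_cond, annealed_scale(i, len(timesteps)))
#     x = scheduler.step(eps_hat, t, x)
```

Because the scale multiplies already-computed network outputs, swapping a fixed $w$ for a schedule adds no extra forward passes, consistent with the abstract's claim that the scheduler requires no additional activations or memory.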
Related papers
- Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling [54.54513714247062]
Recent advancements in unified image generation models, such as OmniGen, have enabled the handling of diverse image generation and editing tasks within a single framework. However, these models suffer from text-instruction neglect, especially when the instruction contains multiple sub-instructions. We propose Self-Adaptive Attention Scaling to dynamically scale the attention activations for each sub-instruction.
arXiv Detail & Related papers (2025-07-22T05:25:38Z)
- How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models [57.42800112251644]
We propose Step AG, which is a simple, universally applicable adaptive guidance strategy. Our evaluations focus on both image quality and image-text alignment.
arXiv Detail & Related papers (2025-06-10T02:09:48Z)
- Feedback Guidance of Diffusion Models [0.0]
Classifier-Free Guidance (CFG) has become standard for improving sample fidelity in conditional diffusion models. We propose FeedBack Guidance (FBG), which uses a state-dependent coefficient to self-regulate the amount of guidance based on need.
arXiv Detail & Related papers (2025-06-06T13:46:32Z)
- Text-to-Image Alignment in Denoising-Based Models through Step Selection [5.617018577548289]
Visual generative AI models often encounter challenges related to text-image alignment and reasoning limitations. This paper presents a novel method for selectively enhancing the signal at critical denoising steps, optimizing image generation based on input semantics.
arXiv Detail & Related papers (2025-04-24T13:10:32Z)
- Classifier-free Guidance with Adaptive Scaling [7.179513844921256]
Classifier-free guidance (CFG) is an essential mechanism in text-driven diffusion models. In this paper, we present $\beta$-adaptive CFG, which controls the impact of guidance during generation. Our model obtained better FID scores while maintaining text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
arXiv Detail & Related papers (2025-02-14T22:04:53Z)
- Adaptive Prompt Tuning: Vision Guided Prompt Tuning with Cross-Attention for Fine-Grained Few-Shot Learning [5.242869847419834]
Few-shot, fine-grained classification in computer vision poses significant challenges due to the need to differentiate subtle class distinctions with limited data. This paper presents a novel method that enhances the Contrastive Language-Image Pre-Training model through adaptive prompt tuning.
arXiv Detail & Related papers (2024-12-19T08:51:01Z)
- Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, yielding dynamic fine-control prompts.
arXiv Detail & Related papers (2024-04-05T13:44:39Z)
- Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z)
- End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z)