Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
- URL: http://arxiv.org/abs/2509.22007v1
- Date: Fri, 26 Sep 2025 07:45:20 GMT
- Title: Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
- Authors: Cheng Jin, Qitan Shi, Yuantao Gu
- Abstract summary: CFG is widely used to improve conditional fidelity in diffusion models, but its impact on sampling dynamics remains poorly understood. We analyze CFG under multimodal conditionals and show that the sampling process unfolds in three successive stages. Experiments support these predictions, showing that early strong guidance erodes global diversity, while late strong guidance suppresses fine-grained variation.
- Score: 13.030934039187171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifier-Free Guidance (CFG) is widely used to improve conditional fidelity in diffusion models, but its impact on sampling dynamics remains poorly understood. Prior studies, often restricted to unimodal conditional distributions or simplified cases, provide only a partial picture. We analyze CFG under multimodal conditionals and show that the sampling process unfolds in three successive stages. In the Direction Shift stage, guidance accelerates movement toward the weighted mean, introducing initialization bias and norm growth. In the Mode Separation stage, local dynamics remain largely neutral, but the inherited bias suppresses weaker modes, reducing global diversity. In the Concentration stage, guidance amplifies within-mode contraction, diminishing fine-grained variability. This unified view explains a widely observed phenomenon: stronger guidance improves semantic alignment but inevitably reduces diversity. Experiments support these predictions, showing that early strong guidance erodes global diversity, while late strong guidance suppresses fine-grained variation. Moreover, our theory naturally suggests a time-varying guidance schedule, and empirical results confirm that it consistently improves both quality and diversity.
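The abstract's core mechanism and its proposed remedy can be sketched concretely. Below is a minimal, illustrative implementation of the standard CFG noise combination together with a hypothetical three-stage, time-varying guidance scale. The stage breakpoints and scale values are assumptions chosen to mirror the paper's qualitative finding (weaker guidance during the early Direction Shift and late Concentration stages, stronger guidance during Mode Separation), not the authors' exact schedule.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, w):
    """Classifier-Free Guidance: extrapolate from the unconditional noise
    prediction toward the conditional one with guidance scale w.
    w = 1 recovers plain conditional sampling; w > 1 strengthens guidance."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def time_varying_scale(t, T, w_early=1.5, w_mid=5.0, w_late=1.5):
    """Illustrative time-varying schedule (breakpoints 0.7/0.3 and the
    scale values are assumptions, not taken from the paper).
    t counts down from T (pure noise) to 0 (final sample)."""
    frac = t / T
    if frac > 0.7:    # early sampling: Direction Shift stage
        return w_early
    elif frac > 0.3:  # middle: Mode Separation stage
        return w_mid
    else:             # late: Concentration stage
        return w_late
```

In a sampler loop one would call `w = time_varying_scale(t, T)` at each denoising step and feed `cfg_noise(eps_u, eps_c, w)` to the update rule, damping guidance exactly where the paper predicts it harms global or fine-grained diversity.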
Related papers
- Emergence of Distortions in High-Dimensional Guided Diffusion Models [11.774563966512707]
We formalize the phenomenon of generative distortion defined as the mismatch between the CFG-induced sampling and the true conditional distribution. We show that standard CFG schedules are incapable of preventing variance shrinkage. We propose a theoretically motivated guidance schedule featuring a negative-guidance window, which mitigates loss of diversity while preserving class separability.
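The "negative-guidance window" idea from this summary can be illustrated with a toy schedule. The window placement, its bounds, and the scale values below are purely hypothetical, chosen only to show the shape of such a schedule; the paper's actual construction may differ.

```python
def windowed_guidance_scale(t, T, w_pos=5.0, w_neg=-0.5,
                            window=(0.05, 0.15)):
    """Hypothetical guidance schedule with a negative-guidance window
    late in sampling (low-noise steps), intended to counteract variance
    shrinkage. t counts down from T (pure noise) to 0 (final sample);
    all numeric values here are illustrative assumptions."""
    frac = t / T
    lo, hi = window
    if lo <= frac <= hi:  # briefly push away from the condition
        return w_neg
    return w_pos          # otherwise apply standard positive guidance
```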
arXiv Detail & Related papers (2026-01-31T13:19:45Z) - Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization [72.83292830785336]
Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape. We propose MBCD, a unified collaborative distillation framework that retains WA's flatness-inducing advantages while overcoming its shortcomings in multi-modal contexts.
arXiv Detail & Related papers (2025-11-25T12:38:28Z) - Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization [50.5332987313297]
We propose Token-Prompt embedding Space Optimization (TPSO), a training-free and model-agnostic module. TPSO introduces learnable parameters to explore underrepresented regions of the token embedding space, reducing the tendency of the model to repeatedly generate samples from strong modes of the learned distribution. In experiments on MS-COCO and three diffusion backbones, TPSO significantly enhances generative diversity, improving baseline performance from 1.10 to 4.18 points, without sacrificing image quality.
arXiv Detail & Related papers (2025-11-25T00:42:09Z) - Navigating the Exploration-Exploitation Tradeoff in Inference-Time Scaling of Diffusion Models [11.813933389519358]
Inference-time scaling has achieved remarkable success in language models, yet its adaptation to diffusion models remains underexplored. We propose two strategies: Schedule and Adaptive Temperature. Our methods significantly enhance sample quality without increasing the total number of Noise Evaluations.
arXiv Detail & Related papers (2025-08-17T13:35:38Z) - Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models [24.186262549509102]
This paper theoretically analyzes CFG in the context of masked discrete diffusion. High guidance early in sampling (when inputs are heavily masked) harms generation quality, while late-stage guidance has a larger effect. Our method smoothens the transport between the data distribution and the initial (masked/uniform) distribution, which results in improved sample quality.
arXiv Detail & Related papers (2025-07-11T18:48:29Z) - Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z) - Provable Efficiency of Guidance in Diffusion Models for General Data Distribution [7.237817437521988]
Diffusion models have emerged as a powerful framework for generative modeling. Guidance techniques play a crucial role in enhancing sample quality. Existing studies only focus on case studies, where the distribution conditioned on each class is either isotropic Gaussian or supported on a one-dimensional interval with some extra conditions.
arXiv Detail & Related papers (2025-05-02T16:46:43Z) - Contrastive CFG: Improving CFG in Diffusion Models by Contrasting Positive and Negative Concepts [55.298031232672734]
Classifier-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment.
We present a novel method to enhance negative CFG guidance using contrastive loss.
arXiv Detail & Related papers (2024-11-26T03:29:27Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts.
We present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Spontaneous Symmetry Breaking in Generative Diffusion Models [6.4322891559626125]
Generative diffusion models have recently emerged as a leading approach for generating high-dimensional data.
We show that the dynamics of these models exhibit a spontaneous symmetry breaking that divides the generative dynamics into two distinct phases.
We propose a new way to understand the generative dynamics of diffusion models that has the potential to bring about higher performance and less biased fast-samplers.
arXiv Detail & Related papers (2023-05-31T09:36:34Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty
Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.