Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models
- URL: http://arxiv.org/abs/2507.08965v1
- Date: Fri, 11 Jul 2025 18:48:29 GMT
- Title: Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models
- Authors: Kevin Rojas, Ye He, Chieh-Hsin Lai, Yuta Takida, Yuki Mitsufuji, Molei Tao
- Abstract summary: This paper theoretically analyzes CFG in the context of masked discrete diffusion. High guidance early in sampling (when inputs are heavily masked) harms generation quality, while late-stage guidance has a larger effect. Our method smooths the transport between the data distribution and the initial (masked/uniform) distribution, which results in improved sample quality.
- Score: 24.186262549509102
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Classifier-Free Guidance (CFG) is a widely used technique for conditional generation and improving sample quality in continuous diffusion models, and recent works have extended it to discrete diffusion. This paper theoretically analyzes CFG in the context of masked discrete diffusion, focusing on the role of guidance schedules. Our analysis shows that high guidance early in sampling (when inputs are heavily masked) harms generation quality, while late-stage guidance has a larger effect. These findings provide a theoretical explanation for empirical observations in recent studies on guidance schedules. The analysis also reveals an imperfection in current CFG implementations. These implementations can unintentionally cause imbalanced transitions, such as unmasking too rapidly during the early stages of generation, which degrades the quality of the resulting samples. To address this, we draw insight from the analysis and propose a novel classifier-free guidance mechanism applicable to any discrete diffusion model. Intuitively, our method smooths the transport between the data distribution and the initial (masked/uniform) distribution, which results in improved sample quality. Remarkably, our method is achievable via a simple one-line code change. The efficacy of our method is empirically demonstrated with experiments on ImageNet (masked discrete diffusion) and QM9 (uniform discrete diffusion).
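The schedule finding above lends itself to a short illustration. The sketch below shows one classifier-free guidance step for masked discrete diffusion with a time-dependent guidance weight that stays small while inputs are heavily masked and grows late in sampling, consistent with the analysis. Everything here (the `denoiser` interface, `MASK_ID`, the linear ramp, and the logit-space combination rule) is an illustrative assumption, not the paper's actual implementation or its one-line change.

```python
import torch

MASK_ID = 0  # hypothetical id of the [MASK] token

def guidance_weight(t: float, w_max: float = 4.0) -> float:
    """Time-dependent guidance schedule (illustrative linear ramp).

    t runs from 1 (fully masked) down to 0 (clean data), so the weight is
    near zero early in sampling and approaches w_max at the end -- matching
    the finding that early guidance hurts while late guidance helps.
    """
    return w_max * (1.0 - t)

@torch.no_grad()
def guided_logits(denoiser, x_t, t, cond):
    """One CFG step: combine conditional and unconditional predictions.

    `denoiser(x_t, t, cond)` is assumed to return per-position logits over
    the vocabulary, shape (batch, seq_len, vocab_size); passing cond=None
    yields the unconditional prediction.
    """
    logits_cond = denoiser(x_t, t, cond)
    logits_uncond = denoiser(x_t, t, None)
    w = guidance_weight(t)
    # Standard CFG extrapolation in logit space; w = 0 recovers the
    # purely conditional model under this (1 + w) convention.
    return (1.0 + w) * logits_cond - w * logits_uncond
```

The paper's own fix is described only as a one-line change that smooths the transport toward the masked/uniform initial distribution; since the abstract does not spell that line out, this sketch stops at the schedule.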
Related papers
- Navigating Sparse Molecular Data with Stein Diffusion Guidance [48.21071466968102]
Stochastic optimal control (SOC) has emerged as a principled framework for fine-tuning diffusion models.
A class of training-free approaches has been developed that guides diffusion models using off-the-shelf classifiers on predicted clean samples.
We propose a novel training-free guidance framework based on a surrogate optimal control objective.
arXiv Detail & Related papers (2025-07-07T21:14:27Z)
- Adaptive Destruction Processes for Diffusion Samplers [12.446080077998834]
This paper explores the challenges and benefits of a trainable destruction process in diffusion samplers.
We show that, when the number of steps is limited, training both generation and destruction processes results in faster convergence and improved sampling quality.
arXiv Detail & Related papers (2025-06-02T11:07:27Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from the instillation of task-specific information into the score function, steering sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- Improved off-policy training of diffusion samplers [93.66433483772055]
We study the problem of training diffusion models to sample from a distribution with an unnormalized density or energy function.
We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods.
Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work.
arXiv Detail & Related papers (2024-02-07T18:51:49Z)
- Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships assumed in earlier attribution settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss-gradient norms depend heavily on the timestep.
We introduce Diffusion-ReTrac, a re-normalized adaptation that retrieves training samples more specifically targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
- Fair Sampling in Diffusion Models through Switching Mechanism [5.560136885815622]
We propose a fairness-aware sampling method for diffusion models called the attribute switching mechanism.
We mathematically prove and experimentally demonstrate the effectiveness of the proposed method on two key aspects.
arXiv Detail & Related papers (2024-01-06T06:55:26Z)
- Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.
We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z)
- Unmasking Bias in Diffusion Model Training [40.90066994983719]
Denoising diffusion models have emerged as a dominant approach for image generation.
They still suffer from slow convergence in training and color shift issues in sampling.
In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm.
arXiv Detail & Related papers (2023-10-12T16:04:41Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Diffusion-GAN: Training GANs with Diffusion [135.24433011977874]
Generative adversarial networks (GANs) are challenging to train stably.
We propose Diffusion-GAN, a novel GAN framework that leverages a forward diffusion chain to generate instance noise.
We show that Diffusion-GAN can produce more realistic images with higher stability and data efficiency than state-of-the-art GANs.
arXiv Detail & Related papers (2022-06-05T20:45:01Z)
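As a closing illustration of the Diffusion-GAN idea summarized above, the sketch below injects forward-diffusion noise into both real and generated images before they reach the discriminator. The noise schedule, timestep sampling, and tensor shapes are simplified assumptions rather than the paper's exact recipe.

```python
import torch

def forward_diffuse(x0: torch.Tensor, t: torch.Tensor,
                    alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).

    `alphas_cumprod` is a precomputed 1-D tensor of cumulative products
    abar_t from any standard noise schedule (an assumption here).
    """
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    noise = torch.randn_like(x0)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

def discriminator_pair(real: torch.Tensor, fake: torch.Tensor,
                       alphas_cumprod: torch.Tensor, max_t: int):
    """Diffuse real and fake images with the same random timestep, so the
    discriminator sees instance-noised inputs at a matched noise level."""
    t = torch.randint(0, max_t, (real.size(0),), device=real.device)
    return (forward_diffuse(real, t, alphas_cumprod),
            forward_diffuse(fake, t, alphas_cumprod), t)
```

Diffusion-GAN additionally adapts the diffusion length during training; the fixed `max_t` above omits that mechanism for brevity.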