What Exactly Does Guidance Do in Masked Discrete Diffusion Models
- URL: http://arxiv.org/abs/2506.10971v1
- Date: Thu, 12 Jun 2025 17:59:19 GMT
- Title: What Exactly Does Guidance Do in Masked Discrete Diffusion Models
- Authors: Ye He, Kevin Rojas, Molei Tao
- Abstract summary: We show that when the full data distribution is a mixture over classes, guidance amplifies class-specific regions while suppressing regions shared with other classes. Our findings highlight the role of guidance, not just in shaping the output distribution, but also in controlling the dynamics of the sampling trajectory.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study masked discrete diffusion models with classifier-free guidance (CFG). Assuming neither score error nor discretization error, we derive an explicit solution to the guided reverse dynamics, which lets us characterize precisely how guidance influences the sampling behavior. When the full data distribution is a mixture over classes and the goal is to sample from a specific class, guidance amplifies class-specific regions while suppressing regions shared with other classes. This effect depends on the guidance strength $w$ and induces distinct covariance structures in the sampled distribution. Notably, we observe quantitatively different behaviors in $1$D and $2$D. We also show that for large $w$, the decay rate of the total variation ($\mathrm{TV}$) along the reverse dynamics is double-exponential in $w$ for both $1$D and $2$D. These findings highlight the role of guidance, not just in shaping the output distribution, but also in controlling the dynamics of the sampling trajectory. Our theoretical analysis is supported by experiments that illustrate the geometric effects of guidance and its impact on convergence.
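To make the mechanism concrete, here is a minimal sketch of classifier-free guidance in a masked discrete diffusion sampler. The denoiser functions, mask convention, and unmasking schedule are placeholder assumptions for illustration, not the paper's construction; only the logit combination weighted by the guidance strength $w$ reflects what the abstract describes.

```python
# Toy sketch of classifier-free guidance (CFG) in a masked discrete
# diffusion sampler. `cond_logits_fn` / `uncond_logits_fn` stand in for a
# trained denoiser queried with and without the class label; they are
# placeholders, not the paper's model.
import numpy as np

MASK = -1  # sentinel id for masked positions (an assumption)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cfg_reverse_step(x, w, cond_logits_fn, uncond_logits_fn, n_unmask, rng):
    """One guided reverse step: unmask up to `n_unmask` positions of x.

    Guidance strength w tilts the per-token distribution as
    p_w ∝ p(x0|xt, c)^(1+w) * p(x0|xt)^(-w), i.e. the logit
    combination (1+w)*cond - w*uncond.
    """
    masked = np.flatnonzero(x == MASK)
    if masked.size == 0:
        return x
    guided = (1 + w) * cond_logits_fn(x) - w * uncond_logits_fn(x)
    probs = softmax(guided)  # shape (seq_len, vocab_size)
    # Unmask a few positions, sampling each token from the guided distribution.
    for i in rng.choice(masked, size=min(n_unmask, masked.size), replace=False):
        x[i] = rng.choice(probs.shape[-1], p=probs[i])
    return x
```

Raising $w$ sharpens each position's distribution toward class-specific tokens, which is the amplification-and-suppression effect the abstract characterizes in closed form.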
Related papers
- Diffusion Model's Generalization Can Be Characterized by Inductive Biases toward a Data-Dependent Ridge Manifold [19.059115911590776]
We explicitly characterize what a diffusion model generates by proposing a log-density ridge manifold. We show how the generated data relate to this manifold as the inference dynamics progress. A more detailed understanding of training dynamics will lead to more accurate quantification of the generation inductive bias.
arXiv Detail & Related papers (2026-02-05T18:55:03Z) - Emergence of Distortions in High-Dimensional Guided Diffusion Models [11.774563966512707]
We formalize the phenomenon of generative distortion, defined as the mismatch between the CFG-induced sampling distribution and the true conditional distribution. We show that standard CFG schedules are incapable of preventing variance shrinkage. We propose a theoretically motivated guidance schedule featuring a negative-guidance window, which mitigates loss of diversity while preserving class separability.
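The summary does not specify the schedule itself; as a purely hypothetical illustration, a time-dependent guidance weight with a negative window early in the reverse process might look like the sketch below (breakpoints and magnitudes are invented):

```python
# Hypothetical guidance schedule with a negative-guidance window, loosely
# following the abstract's description; the window location and weights
# are invented for illustration, not taken from the paper.
def guidance_weight(t, w_max=3.0, w_neg=-0.5, window=(0.8, 0.95)):
    """t runs from 1 (pure noise) down to 0 (data). Returns w(t)."""
    lo, hi = window
    if lo <= t <= hi:          # early in the reverse process: mildly negative
        return w_neg           # guidance, counteracting variance shrinkage
    return w_max * (1 - t)     # otherwise ramp guidance up toward the data end
```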
arXiv Detail & Related papers (2026-01-31T13:19:45Z) - Generalization Dynamics of Linear Diffusion Models [8.107431208836426]
We analytically study the memorisation-to-generalisation transition in a simple model using linear denoisers. Our work clarifies how sample complexity governs generalisation in a simple model of diffusion-based generative models.
arXiv Detail & Related papers (2025-05-30T16:31:58Z) - Provable Efficiency of Guidance in Diffusion Models for General Data Distribution [7.237817437521988]
Diffusion models have emerged as a powerful framework for generative modeling. Guidance techniques play a crucial role in enhancing sample quality. Existing studies focus only on special cases, where the distribution conditioned on each class is either isotropic Gaussian or supported on a one-dimensional interval with some extra conditions.
arXiv Detail & Related papers (2025-05-02T16:46:43Z) - Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - What does guidance do? A fine-grained analysis in a simple setting [19.51972040691315]
We give a fine-grained characterization of the dynamics of guidance in two cases.
We prove that for any nonzero level of score estimation error, sufficiently large guidance will result in sampling away from the support.
arXiv Detail & Related papers (2024-09-19T20:16:33Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
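As a rough sketch of how a learned density ratio can drive abstention (an illustration, not the paper's algorithm; `ratio_model`, `classifier`, and the threshold are placeholders):

```python
# Minimal sketch of classification with rejection via a density ratio:
# predict only where the learned ratio r(x) = q*(x)/p(x), comparing an
# idealized distribution against the data distribution, is high enough.
def predict_or_reject(x, classifier, ratio_model, threshold=0.5):
    r = ratio_model(x)           # estimated density ratio at x
    if r < threshold:            # x is down-weighted under the idealized
        return None              # distribution: abstain
    return classifier(x)         # otherwise return the model's prediction
```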
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
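For context, the standard classifier-free guidance combination that such Gaussian-mixture analyses typically study is, with guidance strength $w$ (a standard form, shown for reference; the summary does not spell out the paper's exact parameterization):

```latex
\nabla_x \log p_w(x \mid c)
  = (1 + w)\,\nabla_x \log p(x \mid c) - w\,\nabla_x \log p(x),
\qquad
p_w(x \mid c) \propto \frac{p(x \mid c)^{1+w}}{p(x)^{w}} .
```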
arXiv Detail & Related papers (2024-03-03T23:15:48Z) - TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean?
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
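For reference, the negative log-likelihood jointly optimized in deep heteroscedastic regression takes the standard Gaussian form (shown for context; the summary does not give TIC-TAC's own parameterization): for a predicted mean $\mu_\theta(x)$ and covariance $\Sigma_\theta(x)$,

```latex
\mathcal{L}_{\mathrm{NLL}}(\theta)
  = \tfrac{1}{2}\,\bigl(y - \mu_\theta(x)\bigr)^{\!\top} \Sigma_\theta(x)^{-1} \bigl(y - \mu_\theta(x)\bigr)
  + \tfrac{1}{2}\,\log \det \Sigma_\theta(x) + \text{const} .
```

The coupling between the quadratic term and the log-determinant is what allows poor covariance estimates to interfere with convergence of the mean.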
arXiv Detail & Related papers (2023-10-29T09:54:03Z) - Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z) - A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z) - On counterfactual inference with unobserved confounding [36.18241676876348]
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit.
We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors.
We derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality.
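For context, a measure $\mu$ satisfies a logarithmic Sobolev inequality with constant $C$ when, for all smooth $f$,

```latex
\operatorname{Ent}_\mu\!\left(f^2\right)
  := \int f^2 \log \frac{f^2}{\int f^2 \, d\mu} \, d\mu
  \;\le\; 2C \int \|\nabla f\|^2 \, d\mu .
```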
arXiv Detail & Related papers (2022-11-14T04:14:37Z) - A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning [113.75991721607174]
We introduce an interventional prediction module to estimate the probability of two estimated latent variables $\hat{z}_i, \hat{z}_j$ belonging to the same environment.
We empirically show that the $\hat{Z}$ estimated by our method carries less redundant information than that of previous methods.
arXiv Detail & Related papers (2022-06-09T15:01:36Z) - $(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics [6.221019624345409]
We develop a framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs).
We show that they can be expressed as a two-stage mass-redistribution/mass-transport process.
Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions.
arXiv Detail & Related papers (2020-11-11T18:17:09Z)
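The two-stage mass-redistribution/mass-transport structure mentioned in the last entry admits an infimal-convolution form; on my reading (the notation may differ from the paper's), with $W^\Gamma$ the IPM generated by the function class $\Gamma$:

```latex
D^\Gamma_f(P \,\|\, Q)
  = \inf_{\eta} \Bigl\{ D_f(\eta \,\|\, Q) + W^\Gamma(P, \eta) \Bigr\},
\qquad
W^\Gamma(P, \eta) = \sup_{g \in \Gamma} \bigl\{ \mathbb{E}_P[g] - \mathbb{E}_\eta[g] \bigr\} .
```

Stage one (the $f$-divergence term) redistributes mass from $Q$ to an intermediate measure $\eta$; stage two (the IPM term) transports $\eta$ to $P$.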