Related papers: Sharp Convergence Rates for Masked Diffusion Models

URL: http://arxiv.org/abs/2602.22505v1
Date: Thu, 26 Feb 2026 00:47:51 GMT
Title: Sharp Convergence Rates for Masked Diffusion Models
Authors: Yuchen Liang, Zhiheng Tan, Ness Shroff, Yingbin Liang
Abstract summary: We develop a total-variation (TV) based analysis for the Euler method that overcomes the limitations of existing KL-based analyses. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees without requiring any surrogate initialization. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
Score: 53.117058231393834
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, with masked (absorbing-rate) variants emerging as competitive alternatives to autoregressive models. Among existing samplers, the Euler method remains the standard choice in many applications, and more recently, the First-Hitting Sampler (FHS) has shown considerable promise for masked diffusion models. Despite their practical success, the theoretical understanding of these samplers remains limited. Existing analyses are conducted in Kullback-Leibler (KL) divergence, which often yields loose parameter dependencies and requires strong assumptions on score estimation. Moreover, these guarantees do not cover the recently developed, high-performing FHS. In this work, we first develop a direct total-variation (TV) based analysis for the Euler method that overcomes these limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees without requiring any surrogate initialization. For the same setting, we also provide the first convergence lower bound for the Euler sampler, establishing tightness with respect to both the data dimension $d$ and the target accuracy $\varepsilon$. Finally, we analyze FHS and show that it incurs no sampling error beyond that induced by score estimation, which we show to be tight with a matching lower bound on the error. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS, which may be of independent interest.

Related papers

Bridge Matching Sampler: Scalable Sampling via Generalized Fixed-Point Diffusion Matching [38.70740405520393]
Bridge Matching Sampler (BMS) enables learning a transport map between arbitrary prior and target distributions with a single, scalable, and stable objective. We demonstrate that our method enables sampling at unprecedented scales while preserving mode diversity, achieving state-of-the-art results on complex synthetic densities and high-dimensional molecular benchmarks.
arXiv Detail & Related papers (2026-02-28T08:00:38Z)
Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity [49.809923981964715]
Contaminated mixture of experts (MoE) is motivated by transfer learning methods where a pre-trained model, acting as a frozen expert, is integrated with an adapter model, functioning as a trainable expert, in order to learn a new task. In this work, we characterize uniform convergence rates for estimating parameters under challenging settings where ground-truth parameters vary with the sample size. We also establish corresponding minimax lower bounds to ensure that these rates are minimax optimal.
arXiv Detail & Related papers (2026-01-31T23:45:50Z)
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees [70.88473359544084]
We introduce a new analytical approach for discrete diffusion models that removes the need for regularity assumptions. For the standard $\tau$-leaping method, we establish convergence guarantees in KL divergence that scale linearly with vocabulary size. Our approach is also more broadly applicable: it provides the first convergence guarantees for other widely used samplers.
arXiv Detail & Related papers (2025-09-20T17:42:29Z)
Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z)
Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers [49.1574468325115]
We introduce a unified convergence analysis framework for deterministic samplers. Our framework achieves iteration complexity of $\tilde O(d^2/\epsilon)$. We also provide a detailed analysis of Denoising Diffusion Implicit Models (DDIM)-type samplers.
arXiv Detail & Related papers (2024-10-18T07:37:36Z)
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers. We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
KL Convergence Guarantees for Score diffusion models under minimal data assumptions [9.618473763561418]
Comprehensive quantitative results for these models are still lacking. This article focuses on score diffusion models with fixed step size stemming from the Ornstein-Uhlenbeck semigroup and its kinetic counterpart.
arXiv Detail & Related papers (2023-08-23T16:31:08Z)
Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with a bootstrapping-based sequential thresholding least-squares estimator. We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an error rate that converges exponentially.
arXiv Detail & Related papers (2023-01-30T04:07:59Z)
