On the Complexity Theory of Masked Discrete Diffusion: From $\mathrm{poly}(1/\epsilon)$ to Nearly $\epsilon$-Free
- URL: http://arxiv.org/abs/2509.21835v1
- Date: Fri, 26 Sep 2025 03:50:17 GMT
- Title: On the Complexity Theory of Masked Discrete Diffusion: From $\mathrm{poly}(1/\epsilon)$ to Nearly $\epsilon$-Free
- Authors: Xunpeng Huang, Yingyu Lin, Nishant Jain, Kaibo Wang, Difan Zou, Yian Ma, Tong Zhang
- Abstract summary: Masked discrete diffusion is a flexible paradigm for text generation in which tokens are corrupted by special mask symbols before being denoised. We show that Euler samplers can achieve $\epsilon$-accuracy in total variation (TV) with $\tilde{O}(d^{2}\epsilon^{-3/2})$ discrete score evaluations. We then propose a Mask-Aware Truncated Uniformization (MATU) approach that both removes bounded-score assumptions and preserves unbiased discrete score approximation.
- Score: 49.34727933066799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study masked discrete diffusion -- a flexible paradigm for text generation in which tokens are progressively corrupted by special mask symbols before being denoised. Although this approach has demonstrated strong empirical performance, its theoretical complexity in high-dimensional settings remains insufficiently understood. Existing analyses largely focus on uniform discrete diffusion, and more recent attempts addressing masked diffusion either (1) overlook widely used Euler samplers, (2) impose restrictive bounded-score assumptions, or (3) fail to showcase the advantages of masked discrete diffusion over its uniform counterpart. To address this gap, we show that Euler samplers can achieve $\epsilon$-accuracy in total variation (TV) with $\tilde{O}(d^{2}\epsilon^{-3/2})$ discrete score evaluations, thereby providing the first rigorous analysis of the typical Euler sampler in masked discrete diffusion. We then propose a Mask-Aware Truncated Uniformization (MATU) approach that both removes bounded-score assumptions and preserves unbiased discrete score approximation. By exploiting the property that each token can be unmasked at most once, MATU attains a nearly $\epsilon$-free complexity of $O(d\,\ln d\cdot (1-\epsilon^2))$. This result surpasses existing uniformization methods under uniform discrete diffusion, eliminating the $\ln(1/\epsilon)$ factor and substantially speeding up convergence. Our findings not only provide a rigorous theoretical foundation for masked discrete diffusion, showcasing its practical advantages over uniform diffusion for text generation, but also pave the way for future efforts to analyze diffusion-based language models developed under the masking paradigm.
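To make the two regimes in the abstract concrete, below is a minimal, self-contained Python sketch (not the authors' code) contrasting a discrete-time Euler-style reverse sampler, which spends one discrete-score evaluation per step, with a sampler built on the structural fact MATU exploits: each token is unmasked at most once. The `denoiser` placeholder, the linear schedule alpha(t) = 1 - t, and all constants are illustrative assumptions, and the second loop is a simplification of truncated uniformization, not the MATU algorithm itself.

```python
# Illustrative sketch only: contrasts an Euler-style reverse sampler with an
# unmask-at-most-once sampler for masked discrete diffusion. The denoiser is a
# random placeholder standing in for a trained network, and alpha(t) = 1 - t is
# an assumed linear masking schedule -- neither comes from the paper.
import numpy as np

rng = np.random.default_rng(0)

V = 10          # vocabulary size; real tokens are 0..V-1
d = 8           # sequence length ("dimension" d in the abstract)
MASK = V        # special mask symbol

def denoiser(x, t):
    """Stand-in for the learned discrete score / denoiser: returns, for each
    position, a distribution over the V real tokens."""
    return rng.dirichlet(np.ones(V), size=d)

def alpha(t):
    # Assumed schedule: fraction of tokens still unmasked at forward time t.
    return 1.0 - t

def euler_sampler(n_steps):
    """Discrete-time Euler-style reverse sampler: one score evaluation per
    step, so the total cost grows with n_steps (the poly(1/eps) regime)."""
    x = np.full(d, MASK)
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        probs = denoiser(x, t)                       # one score evaluation
        p_unmask = (alpha(s) - alpha(t)) / (1.0 - alpha(t))
        for i in range(d):
            if x[i] == MASK and rng.random() < p_unmask:
                x[i] = rng.choice(V, p=probs[i])
    return x

def unmask_once_sampler():
    """Simplified uniformization-flavoured sampler using the property MATU
    exploits: each token is unmasked exactly once, so d score evaluations
    suffice regardless of the target accuracy."""
    x = np.full(d, MASK)
    times = rng.random(d)            # per-token masking times under alpha(t)=1-t
    for i in np.argsort(-times):     # reverse time: latest-masked token first
        probs = denoiser(x, times[i])                # one evaluation per token
        x[i] = rng.choice(V, p=probs[i])
    return x

print("Euler (32 steps):", euler_sampler(n_steps=32))
print("unmask-once     :", unmask_once_sampler())
```

The contrast is the whole point: the Euler loop's cost scales with the step count needed for a given $\epsilon$, while the unmask-once loop performs exactly $d$ evaluations, mirroring the $\mathrm{poly}(1/\epsilon)$-versus-nearly-$\epsilon$-free gap in the title; the extra $\ln d$ in MATU's bound reflects the truncated uniformization machinery that this toy sketch omits.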
Related papers
- Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation-based analysis for the Euler method that overcomes limitations of existing analyses. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z) - Efficient Sampling with Discrete Diffusion Models: Sharp and Adaptive Guarantees [9.180350432640912]
We study the sampling efficiency of score-based discrete diffusion models under a continuous-time Markov chain (CTMC) formulation. For uniform discrete diffusion, we show that the $\tau$-leaping algorithm achieves a complexity of order $\tilde{O}(d/\varepsilon)$. For masked discrete diffusion, we introduce a modified $\tau$-leaping sampler whose convergence rate is governed by an intrinsic information-theoretic quantity.
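For context on the $\tau$-leaping scheme named above, here is a toy, assumption-laden sketch of the basic leap step for a CTMC with generator matrix `Q`: rates are frozen over a window of length `tau` and each coordinate jumps with first-order probability rate * tau. This is a generic illustration, not the modified sampler from the paper.

```python
# Toy tau-leaping step for a d-dimensional CTMC with a shared V x V generator Q.
# Illustrative only: Q, tau, and the first-order jump probabilities are
# assumptions, not the modified sampler analysed in the paper above.
import numpy as np

def tau_leap_step(x, Q, tau, rng):
    """Advance every coordinate by one leap of length tau, holding rates fixed
    over the window (the core tau-leaping approximation)."""
    x_new = x.copy()
    for i, s in enumerate(x):
        rates = np.clip(Q[s], 0.0, None)   # off-diagonal jump rates out of state s
        rates[s] = 0.0
        p_jump = rates * tau               # first-order probability of each jump
        p_stay = max(0.0, 1.0 - p_jump.sum())
        probs = np.append(p_jump, p_stay)
        probs /= probs.sum()
        j = rng.choice(len(probs), p=probs)
        if j < len(rates):                 # last index means "no jump"
            x_new[i] = j
    return x_new

# Usage with a toy uniform generator over V states:
V, d, tau = 5, 8, 0.05
Q = np.full((V, V), 1.0) - V * np.eye(V)   # unit rate to every other state
rng = np.random.default_rng(0)
x = rng.integers(0, V, size=d)
print(tau_leap_step(x, Q, tau, rng))
```

Freezing the rates over each window is the source of the discretization bias that sharp analyses of $\tau$-leaping quantify.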
arXiv Detail & Related papers (2026-02-16T18:48:17Z) - Latent Shadows: The Gaussian-Discrete Duality in Masked Diffusion [22.034770068249063]
Masked discrete diffusion is a dominant paradigm for high-quality language modeling where tokens are iteratively corrupted to a mask state. While diffusion duality enables deterministic distillation for uniform models, these approaches generally underperform masked models and rely on complex integral operators. We introduce Masked Consistency Distillation (MCD), a principled framework that leverages this duality, bypassing numerical ODE solvers.
arXiv Detail & Related papers (2026-01-31T16:00:46Z) - Optimal Inference Schedules for Masked Diffusion Models [16.774584258255768]
The masked diffusion model (MDM) is able to sample tokens out-of-order and, ostensibly, many tokens at once and in parallel. We show that it is in general impossible to compete with it without strong a priori knowledge of the distribution.
arXiv Detail & Related papers (2025-11-06T18:38:24Z) - Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees [70.88473359544084]
We introduce a new analytical approach for discrete diffusion models that removes the need for regularity assumptions. For the standard $\tau$-leaping method, we establish convergence guarantees in KL divergence that scale linearly with vocabulary size. Our approach is also more broadly applicable: it provides the first convergence guarantees for other widely used samplers.
arXiv Detail & Related papers (2025-09-20T17:42:29Z) - Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness. We introduce generalized interpolating discrete diffusion (GIDD), a new family of processes that offers greater flexibility in the design of noising processes. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality.
arXiv Detail & Related papers (2025-03-06T14:30:55Z) - Generative diffusion for perceptron problems: statistical physics analysis and efficient algorithms [2.860608352191896]
We consider random instances of perceptron problems in the high-dimensional limit. We develop a formalism based on replica theory to analyze approximate sampling with generative diffusion algorithms.
arXiv Detail & Related papers (2025-02-22T16:43:01Z) - Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV). We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model [57.77981008219654]
The Masked Language Model (MLM) framework has been widely adopted for self-supervised language pre-training.
We propose a fully-explored masking strategy, where a text sequence is divided into a certain number of non-overlapping segments (see the sketch after this list).
arXiv Detail & Related papers (2020-10-12T21:28:14Z)
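As a rough illustration of the fully-explored masking strategy summarized in the last entry, the sketch below divides a token sequence into non-overlapping segments and masks each segment in turn; the segment handling, `MASK_ID`, and function names are hypothetical readings of the one-sentence summary, not the paper's implementation.

```python
# A hypothetical sketch of the fully-explored masking idea summarized above:
# split a sequence into non-overlapping segments and mask each segment in turn,
# so the set of masked views covers every token exactly once. Names and the
# segment count are illustrative assumptions, not the paper's implementation.
import numpy as np

MASK_ID = 0  # assumed mask-token id

def fully_explored_views(tokens, n_segments):
    """Yield one masked copy of `tokens` per non-overlapping segment."""
    bounds = np.linspace(0, len(tokens), n_segments + 1).astype(int)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        view = np.array(tokens)
        view[lo:hi] = MASK_ID          # mask exactly this segment
        yield view

for v in fully_explored_views([11, 12, 13, 14, 15, 16], n_segments=3):
    print(v)
```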