Latent Shadows: The Gaussian-Discrete Duality in Masked Diffusion
- URL: http://arxiv.org/abs/2602.00792v1
- Date: Sat, 31 Jan 2026 16:00:46 GMT
- Title: Latent Shadows: The Gaussian-Discrete Duality in Masked Diffusion
- Authors: Guinan Chen, Xunpeng Huang, Ying Sun, Shijin Wang, Yanyong Zhang, Chao Wang
- Abstract summary: Masked discrete diffusion is a dominant paradigm for high-quality language modeling where tokens are iteratively corrupted to a mask state. While diffusion duality enables deterministic distillation for uniform models, these approaches generally underperform masked models and rely on complex integral operators. We introduce Masked Consistency Distillation (MCD), a principled framework that leverages this duality, bypassing numerical ODE solvers.
- Score: 22.034770068249063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked discrete diffusion is a dominant paradigm for high-quality language modeling where tokens are iteratively corrupted to a mask state, yet its inference efficiency is bottlenecked by the lack of deterministic sampling tools. While diffusion duality enables deterministic distillation for uniform models, these approaches generally underperform masked models and rely on complex integral operators. Conversely, in the masked domain, prior methods typically assume the absence of deterministic trajectories, forcing a reliance on stochastic distillation. To bridge this gap, we establish explicit Masked Diffusion Duality, proving that the masked process arises as the projection of a continuous Gaussian process via a novel maximum-value index preservation mechanism. Furthermore, we introduce Masked Consistency Distillation (MCD), a principled framework that leverages this duality to analytically construct the deterministic coupled trajectories required for consistency distillation, bypassing numerical ODE solvers. This result strictly improves upon prior stochastic distillation methods, achieving a 16$\times$ inference speedup without compromising generation quality. Our findings not only provide a solid theoretical foundation connecting masked and continuous diffusion, but also unlock the full potential of consistency distillation for high-performance discrete generation. Our code is available at https://anonymous.4open.science/r/MCD-70FD.
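The forward corruption the abstract describes (tokens iteratively absorbed into a mask state) can be sketched in a few lines. This is a generic illustration of the standard masked-diffusion forward step, not the paper's implementation; the `MASK` id, the survival probability `alpha_t`, and the toy token ids are all illustrative:

```python
import numpy as np

MASK = -1  # illustrative mask-token id; real models reserve a vocabulary slot

def mask_forward(tokens, alpha_t, rng):
    """Masked-diffusion forward step: each token independently survives with
    probability alpha_t and is otherwise absorbed into the MASK state."""
    tokens = np.asarray(tokens)
    keep = rng.random(tokens.shape) < alpha_t
    return np.where(keep, tokens, MASK)

rng = np.random.default_rng(0)
x = np.array([5, 7, 2, 9])
print(mask_forward(x, alpha_t=0.0, rng=rng))  # alpha_t = 0: fully masked -> [-1 -1 -1 -1]
```

The paper's contribution is to show that such a discrete absorbing process can be recovered as the projection of a continuous Gaussian process (via what the authors call maximum-value index preservation), which is what makes deterministic coupled trajectories, and hence consistency distillation, available in the masked domain.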
Related papers
- Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction [45.25461515976432]
Plug-and-Play diffusion prior (DP) frameworks have emerged as a powerful paradigm for imaging reconstruction. We present a novel approach to resolving the bias-hallucination trade-off, achieving state-of-the-art performance with significantly accelerated convergence.
arXiv Detail & Related papers (2026-02-26T16:58:43Z)
- ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression [24.508523704467695]
Single-cell RNA-seq profiles are high-dimensional, sparse, and unordered, causing autoregressive generation to impose an artificial ordering bias. We propose scDiVa, a masked discrete diffusion foundation model that aligns generation with the dropout-like corruption process.
arXiv Detail & Related papers (2026-02-03T12:50:29Z)
- Breaking the Bottlenecks: Scalable Diffusion Models for 3D Molecular Generation [0.0]
Diffusion models have emerged as a powerful class of generative models for molecular design. Their use remains constrained by long sampling trajectories, variance in the reverse process, and limited structural awareness in denoising dynamics. The Directly Denoising Diffusion Model mitigates these inefficiencies by replacing reverse MCMC updates with a deterministic denoising step.
arXiv Detail & Related papers (2026-01-13T20:09:44Z)
- Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
The bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders the development of truly unified multimodal systems. We propose CoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual-process. Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z)
- Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling [87.34677262370924]
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an "information void" where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion, a framework that augments the discrete state space with a paired diffusion in a continuous latent space.
arXiv Detail & Related papers (2025-10-01T18:00:56Z)
- On the Complexity Theory of Masked Discrete Diffusion: From $\mathrm{poly}(1/\epsilon)$ to Nearly $\epsilon$-Free [49.34727933066799]
Masked discrete diffusion is a flexible paradigm for text generation in which tokens are corrupted by special mask symbols before being denoised. We show that Euler samplers can achieve $\epsilon$-accuracy in total variation (TV) with $\tilde{O}(d^2 \epsilon^{-3/2})$ discrete score evaluations. We then propose a Mask-Aware Truncated Uniformization (MATU) approach that both removes bounded-score assumptions and preserves unbiased discrete score approximation.
arXiv Detail & Related papers (2025-09-26T03:50:17Z)
- Few-Step Diffusion via Score identity Distillation [67.07985339442703]
Diffusion distillation has emerged as a promising strategy for accelerating text-to-image (T2I) diffusion models. Existing methods rely on real or teacher-synthesized images to perform well when distilling high-resolution T2I diffusion models. We propose two new guidance strategies: Zero-CFG, which disables CFG in the teacher and removes text conditioning in the fake score network, and Anti-CFG, which applies negative CFG in the fake score network.
arXiv Detail & Related papers (2025-05-19T03:45:16Z)
- One-for-More: Continual Diffusion Model for Anomaly Detection [63.50488826645681]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images. Our study found that the diffusion model suffers from severe "faithfulness hallucination" and "catastrophic forgetting". We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z)
- Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling [47.82616476928464]
Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data. We show that both training and sampling of MDMs are theoretically free from the time variable. We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision.
arXiv Detail & Related papers (2024-09-04T17:48:19Z)
- DensePure: Understanding Diffusion Models towards Adversarial Robustness [110.84015494617528]
We analyze the properties of diffusion models and establish the conditions under which they can enhance certified robustness.
We propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., a classifier).
We show that this robust region is a union of multiple convex sets, and is potentially much larger than the robust regions identified in previous works.
arXiv Detail & Related papers (2022-11-01T08:18:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.