DUEL: Exact Likelihood for Masked Diffusion via Deterministic Unmasking
- URL: http://arxiv.org/abs/2603.01367v1
- Date: Mon, 02 Mar 2026 01:56:03 GMT
- Title: DUEL: Exact Likelihood for Masked Diffusion via Deterministic Unmasking
- Authors: Gilad Turok, Chris De Sa, Volodymyr Kuleshov
- Abstract summary: Masked diffusion models (MDMs) generate text by iteratively selecting positions to unmask and then predicting tokens at those positions. Yet MDMs lack proper perplexity evaluation: the ELBO is a loose bound on likelihood under the training distribution, not the test-time distribution. We introduce the DUEL framework, which formalizes deterministic position selection, unifying leading MDM sampling strategies.
- Score: 13.905201743303214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked diffusion models (MDMs) generate text by iteratively selecting positions to unmask and then predicting tokens at those positions. Yet MDMs lack proper perplexity evaluation: the ELBO is a loose bound on likelihood under the training distribution, not the test-time distribution, while generative perplexity requires a biased external model and ignores diversity. To address this, we introduce the DUEL framework, which formalizes deterministic position selection, unifying leading MDM sampling strategies. We prove DUEL admits exact likelihood computation via a simple algorithm, evaluated under the same position selection used at test time. This gives MDMs proper perplexity for the first time -- the natural analogue of autoregressive perplexity. With proper perplexity in hand, we revisit key questions about MDMs. MDMs are substantially better than previously thought: the MDM-autoregressive perplexity gap shrinks by up to 32% on in-domain data and 82% on zero-shot benchmarks. DUEL enables the first principled comparison of fast, parallel samplers across compute budgets -- an analysis impossible with the ELBO and unreliable with generative perplexity -- identifying probability margin (Kim et al., 2025) as a strong default. Finally, oracle search over position orderings reveals MDMs can far surpass autoregressive models -- achieving 36.47 vs. 52.11 perplexity on AG News -- demonstrating the ceiling of MDM performance has not yet been reached.
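The abstract does not spell out the algorithm, but the idea it describes -- when position selection is a deterministic function of the current partially masked sequence, the unmasking trajectory is unique, so the sequence likelihood factorizes into a single product of per-step conditionals with no marginalization over orders -- can be sketched as follows. The uniform toy model and the margin-based selection rule below are illustrative assumptions, not the paper's implementation:

```python
import math

MASK = None  # placeholder for a masked position

def margin_select(probs, masked):
    """Deterministic rule: unmask the position whose top-1/top-2
    probability margin is largest (ties broken by position index)."""
    def margin(pos):
        top = sorted(probs[pos].values(), reverse=True)
        return top[0] - (top[1] if len(top) > 1 else 0.0)
    return max(sorted(masked), key=margin)

def exact_log_likelihood(tokens, model_probs, select=margin_select):
    """Exact log-likelihood of `tokens` under deterministic unmasking.

    Each step's position is a deterministic function of the current
    partially masked sequence, so the likelihood is one product of
    per-step conditionals -- no sum over unmasking orders."""
    seq = [MASK] * len(tokens)
    masked = set(range(len(tokens)))
    logp = 0.0
    while masked:
        probs = model_probs(seq)      # {pos: {token: prob}} for masked positions
        pos = select(probs, masked)   # deterministic position choice
        logp += math.log(probs[pos][tokens[pos]])
        seq[pos] = tokens[pos]        # unmask with the observed token
        masked.remove(pos)
    return logp

# Toy stand-in model: uniform over a 4-token vocabulary at every masked position.
def uniform_model(seq):
    vocab = ["a", "b", "c", "d"]
    return {i: {t: 1.0 / len(vocab) for t in vocab}
            for i, tok in enumerate(seq) if tok is MASK}

# Under the uniform model, the log-likelihood is len(tokens) * log(1/4).
ll = exact_log_likelihood(["a", "c", "b"], uniform_model)
```

With a real MDM, `model_probs` would be one forward pass over the partially masked sequence; the key point is that the same selection rule used at sampling time is reused at evaluation time, which is what makes the resulting perplexity "proper" in the paper's sense.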
Related papers
- Improving Sampling for Masked Diffusion Models via Information Gain [9.059619122219502]
Masked Diffusion Models (MDMs) offer greater flexibility in decoding order than autoregressive models. Existing samplers typically adopt greedy strategies, prioritizing positions with the highest local certainty to decode at each step. We propose the Info-Gain Sampler, a principled decoding framework that balances immediate uncertainty with information gain.
arXiv Detail & Related papers (2026-02-20T12:26:03Z) - Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model [74.99242687133408]
Masked Diffusion Models (MDMs) have shown promising potential across vision, language, and cross-modal generation. We introduce Co-GRPO, which reformulates MDM generation as a unified Markov Decision Process (MDP) that jointly incorporates both the model and the inference schedule.
arXiv Detail & Related papers (2025-12-25T12:06:04Z) - MDiff4STR: Mask Diffusion Model for Scene Text Recognition [59.79818820650126]
Mask Diffusion Models (MDMs) have emerged as a promising alternative to auto-regressive models (ARMs) for vision-language tasks. We show that the vanilla MDM lags behind ARMs in terms of accuracy, although it improves recognition efficiency. We propose MDiff4STR, a Mask Diffusion model enhanced with two key improvement strategies tailored for Scene Text Recognition.
arXiv Detail & Related papers (2025-12-01T08:57:51Z) - Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing [4.707859580472452]
Masked diffusion models (MDMs) offer a compelling alternative to autoregressive models (ARMs) for discrete text generation. They enable parallel token sampling rather than sequential, left-to-right generation. We present PUNT, a model-agnostic sampler that reconciles this trade-off.
arXiv Detail & Related papers (2025-10-24T18:41:26Z) - Fine-Tuning Masked Diffusion for Provable Self-Correction [28.338622227684453]
Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces. We introduce PRISM -- Plug-in Remasking for Inference-time Self-correction of Masked Diffusions.
arXiv Detail & Related papers (2025-10-01T19:15:25Z) - Any-Order Flexible Length Masked Diffusion [53.89217188409148]
Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. We introduce Flexible Masked Diffusion Models (FlexMDMs), a discrete diffusion paradigm that can simultaneously model sequences of flexible length. We show that FlexMDMs match MDMs in perplexity while modeling length statistics with much higher fidelity.
arXiv Detail & Related papers (2025-08-31T23:34:53Z) - Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking [17.511240770486452]
Masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling. We introduce EB-Sampler, a drop-in replacement for existing samplers, utilizing an entropy-bounded unmasking procedure. EB-Sampler accelerates sampling from current state-of-the-art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance.
arXiv Detail & Related papers (2025-05-30T17:52:55Z) - Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness [61.87055159919641]
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. We introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM).
arXiv Detail & Related papers (2025-03-24T08:46:52Z) - Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend the SAM to Few-shot Semantic segmentation (FSS)
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z) - Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling [47.82616476928464]
Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data. We show that both training and sampling of MDMs are theoretically free from the time variable. We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision.
arXiv Detail & Related papers (2024-09-04T17:48:19Z) - Open-Domain Text Evaluation via Contrastive Distribution Methods [75.59039812868681]
We introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM).
Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate CDM's superior correlation with human judgment.
arXiv Detail & Related papers (2023-06-20T20:37:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.