Self-Speculative Masked Diffusions
- URL: http://arxiv.org/abs/2510.03929v1
- Date: Sat, 04 Oct 2025 20:16:38 GMT
- Title: Self-Speculative Masked Diffusions
- Authors: Andrew Campbell, Valentin De Bortoli, Jiaxin Shi, Arnaud Doucet
- Abstract summary: We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data. We reduce the computational burden by generating non-factorized predictions over masked positions. We apply our method to GPT2-scale text modelling and protein sequence generation, finding that we can achieve a ~2x reduction in the required number of network forward passes.
- Score: 46.04054227238148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over currently masked positions. A number of masked positions are then sampled; however, the factorization approximation means that sampling too many positions in one go leads to poor sample quality. As a result, many simulation steps, and therefore neural network function evaluations, are required to generate high-quality data. We reduce the computational burden by generating non-factorized predictions over masked positions. This is achieved by modifying the final transformer attention mask from non-causal to causal, enabling draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. This results in a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2-scale text modelling and protein sequence generation, finding that we can achieve a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.
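The validation step described above builds on the standard speculative-sampling accept/reject rule. As a concrete illustration, here is a minimal sketch of that generic rule (not the authors' model-integrated implementation; `draft_logprob` and `target_logprob` are assumed per-token log-probabilities of the drafted tokens under the causal draft pass and the non-factorized target prediction):

```python
import torch

def speculative_accept(draft_logprob: torch.Tensor,
                       target_logprob: torch.Tensor) -> int:
    """Left-to-right speculative validation: drafted token i is kept with
    probability min(1, p_target(x_i) / p_draft(x_i)); the first rejection
    ends the accepted run. Returns how many leading draft tokens to keep."""
    accept_prob = (target_logprob - draft_logprob).exp().clamp(max=1.0)
    u = torch.rand_like(accept_prob)
    rejects = (u > accept_prob).nonzero(as_tuple=True)[0]
    return int(rejects[0]) if rejects.numel() > 0 else accept_prob.numel()
```

Accepted tokens stay unmasked and the remainder are re-drafted on the next forward pass; the more drafts survive validation, the fewer network calls are needed, which is where the reported ~2x saving comes from.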
Related papers
- Learn from Your Mistakes: Self-Correcting Masked Diffusion Models [31.536464269884103]
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models. We propose a framework that trains a model to perform both unmasking and correction. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence.
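The summary leaves the sampling loop implicit; a hedged sketch of a generic unmask-then-correct sampler in this spirit (our illustration only: the `model` interface, the quarter-of-masks unmasking schedule, and the confidence-based correction rule are all assumptions, not the paper's exact method):

```python
import torch

def unmask_and_correct(model, seq, mask_id, steps, correct_frac=0.1):
    """Alternately fill masked positions and re-mask low-confidence tokens,
    so the whole sequence is iteratively refined rather than fixed once."""
    for _ in range(steps):
        probs = model(seq).softmax(-1)                    # (L, V) assumed
        masked = (seq == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() > 0:                            # unmask some positions
            pick = masked[torch.randperm(masked.numel())[:max(1, masked.numel() // 4)]]
            seq[pick] = torch.multinomial(probs[pick], 1).squeeze(-1)
        filled = (seq != mask_id).nonzero(as_tuple=True)[0]
        if filled.numel() > 0:                            # correct: re-mask the
            conf = probs[filled, seq[filled]]             # least confident tokens
            k = max(1, int(correct_frac * filled.numel()))
            seq[filled[conf.argsort()[:k]]] = mask_id
    return seq
```

A final unmask-only pass would be needed to clear any remaining masks; the scheduling details are placeholders.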
arXiv Detail & Related papers (2026-02-12T05:17:31Z)
- A Random Matrix Theory of Masked Self-Supervised Regression [16.836043197411378]
We show how training aggregates predictions across many masking patterns, giving rise to a joint, matrix-valued predictor. This object encodes how coordinates condition on one another and poses new analytical challenges. We identify structured regimes in which masked self-supervised learning provably outperforms PCA.
arXiv Detail & Related papers (2026-01-30T17:32:33Z)
- Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion [41.409281069230325]
Masked diffusion models have shown promising performance in generating high-quality samples in a wide range of domains. This paper theoretically analyzes the MaskGIT sampler for image modeling, revealing its implicit temperature sampling mechanism. We introduce the "moment sampler," which employs a "choose-then-sample" approach by selecting unmasking positions before sampling tokens.
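A hedged sketch of the choose-then-sample pattern (our illustration; the low-entropy selection criterion and the fixed `k` are stand-ins, not necessarily the paper's moment-based rule):

```python
import torch

def choose_then_sample(logits, seq, mask_id, k):
    """First choose which masked positions to unmask (here: the k positions
    with the lowest predictive entropy), then sample tokens only there."""
    probs = logits.softmax(-1)                            # (L, V)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
    masked = (seq == mask_id).nonzero(as_tuple=True)[0]
    chosen = masked[entropy[masked].argsort()[:k]]        # choose ...
    seq[chosen] = torch.multinomial(probs[chosen], 1).squeeze(-1)  # ... then sample
    return seq
```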
arXiv Detail & Related papers (2025-10-06T06:30:22Z)
- Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking [17.511240770486452]
Masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling. We introduce EB-Sampler, a drop-in replacement for existing samplers that uses an entropy-bounded unmasking procedure. EB-Sampler accelerates sampling from current state-of-the-art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance.
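The entropy-bounded idea can be sketched as follows (our reading only; the budget name `gamma` and the greedy lowest-entropy-first ordering are assumptions):

```python
import torch

def entropy_bounded_unmask(logits, seq, mask_id, gamma):
    """Unmask as many positions as possible while the summed predictive
    entropy of the chosen positions stays within the budget `gamma`."""
    probs = logits.softmax(-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
    masked = (seq == mask_id).nonzero(as_tuple=True)[0]
    if masked.numel() == 0:
        return seq
    chosen, total = [], 0.0
    for pos in masked[entropy[masked].argsort()]:
        if chosen and total + entropy[pos].item() > gamma:
            break                                         # budget exhausted
        total += entropy[pos].item()
        chosen.append(pos)
    chosen = torch.stack(chosen)
    seq[chosen] = torch.multinomial(probs[chosen], 1).squeeze(-1)
    return seq
```

A small `gamma` recovers cautious one-at-a-time unmasking; a large `gamma` unmasks aggressively where the model is confident, which is how the speedup arises.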
arXiv Detail & Related papers (2025-05-30T17:52:55Z)
- Text Generation Beyond Discrete Token Sampling [74.06071135207635]
Mixture of Inputs (MoI) is a training-free method for autoregressive generation. MoI consistently improves performance across multiple models, including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B.
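The summary does not spell out the mechanism; as we understand MoI, the model's next-step input blends the sampled token's embedding with the expected embedding under the output distribution rather than discarding that distribution. A minimal sketch under that reading (the simple convex combination and mixing weight `beta` are our simplifications, not the paper's exact formulation):

```python
import torch

def mixture_of_inputs(embed, probs, sampled_token, beta=0.5):
    """embed: (V, D) embedding table; probs: (V,) output distribution.
    Returns a blend of the sampled token's embedding and the
    distribution-weighted mean embedding, instead of a pure one-hot input."""
    return beta * embed[sampled_token] + (1.0 - beta) * (probs @ embed)
```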
arXiv Detail & Related papers (2025-05-20T18:41:46Z)
- One-for-More: Continual Diffusion Model for Anomaly Detection [63.50488826645681]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images. Our study found that the diffusion model suffers from severe "faithfulness hallucination" and "catastrophic forgetting." We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
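Gradient projection here presumably means constraining new updates to avoid directions important to earlier tasks; a generic sketch of that operation (the stored orthonormal basis `B` is an assumption, not this paper's exact construction):

```python
import torch

def project_gradient(grad, B):
    """grad: (P,) flattened new-task gradient; B: (P, k) orthonormal basis
    spanning important old-task directions. Removing the in-subspace
    component yields an update that leaves old-task behaviour intact."""
    return grad - B @ (B.T @ grad)
```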
arXiv Detail & Related papers (2025-02-27T07:47:27Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling [47.82616476928464]
Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data. We show that both training and sampling of MDMs are theoretically free from the time variable. We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision.
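The float32 issue can be illustrated with a small probe of our own (not the paper's code): the Gumbel-max trick draws noise as -log(-log(u)) for uniform u, and the largest float32 uniform below 1 caps that noise far earlier than float64 does:

```python
import torch

# The largest representable uniform draw below 1 bounds the Gumbel noise tail.
for dtype in (torch.float32, torch.float64):
    u_max = torch.tensor(1.0, dtype=dtype) - torch.finfo(dtype).eps
    g_max = -torch.log(-torch.log(u_max))
    print(dtype, "Gumbel noise capped at", g_max.item())
# float32: ~15.9; float64: ~36.0. Tokens that would need a larger noise
# advantage to win the argmax can never be sampled in float32.
```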
arXiv Detail & Related papers (2024-09-04T17:48:19Z)
- Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective. This paper presents a regularized vector quantization framework that mitigates the above issues effectively by applying regularization from two perspectives.
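For context, both quantization streams build on the basic vector-quantization step that maps each continuous feature to its nearest codebook entry; a minimal generic sketch (not this paper's regularized variant):

```python
import torch

def vector_quantize(z, codebook):
    """z: (N, D) continuous features; codebook: (K, D) learned entries.
    Returns the quantized features and their discrete token indices."""
    idx = torch.cdist(z, codebook).argmin(-1)             # nearest entry
    return codebook[idx], idx
```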
arXiv Detail & Related papers (2023-03-11T15:20:54Z)
- MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation [31.648523213206595]
Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task.
Conventional approaches have attempted to address the task via prototype learning, known as point estimation.
We propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask.
arXiv Detail & Related papers (2023-03-09T08:24:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.