Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference
- URL: http://arxiv.org/abs/2602.23968v1
- Date: Fri, 27 Feb 2026 12:26:19 GMT
- Title: Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference
- Authors: David Fox, Sam Bowyer, Song Liu, Laurence Aitchison, Raul Santos-Rodriguez, Mengyue Yang
- Abstract summary: Masked discrete diffusion models (MDMs) are a promising new approach to generative modelling. We propose a variational inference framework for learning parallel generation orders for MDMs. Our method achieves 33.1% accuracy with an average of only 4 generation steps, compared to 23.7-29.0% accuracy achieved by standard competitor methods in the same number of steps.
- Score: 19.909302863724758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked discrete diffusion models (MDMs) are a promising new approach to generative modelling, offering parallel token generation and therefore greater efficiency than autoregressive counterparts. However, achieving an optimal balance between parallel generation and sample quality remains an open problem. Current approaches primarily address this issue through fixed, heuristic parallel sampling methods. Some recent learning-based approaches to this problem exist, but its formulation from the perspective of variational inference remains underexplored. In this work, we propose a variational inference framework for learning parallel generation orders for MDMs. As part of our method, we propose a parameterisation for the approximate posterior over generation orders which facilitates parallelism and efficient sampling during training. Using this method, we conduct preliminary experiments on the GSM8K dataset, where our method performs competitively against heuristic sampling strategies in the regime of highly parallel generation. For example, our method achieves 33.1% accuracy with an average of only 4 generation steps, compared to 23.7-29.0% accuracy achieved by standard competitor methods in the same number of steps. We believe further experiments and analysis of the method will yield valuable insights into the problem of parallel generation with MDMs.
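As a concrete illustration of the generation loop the abstract describes, the sketch below shows parallel unmasking driven by a learned order distribution. It is a minimal sketch, not the authors' implementation: the factorised-Bernoulli posterior `order_net`, the `MASK_ID` constant, greedy token filling, and all tensor shapes are assumptions made for this example.

```python
# Minimal sketch of parallel unmasking with a learned generation order.
# The posterior over orders is assumed here to factorise over positions as
# independent Bernoullis ("unmask this position now?"), which permits
# parallel sampling; the paper's actual parameterisation may differ.
import torch

MASK_ID = 0  # hypothetical mask-token id

@torch.no_grad()
def generate(denoiser, order_net, x, max_steps=32):
    """x: (batch, seq_len) LongTensor initialised to all MASK_ID."""
    for _ in range(max_steps):
        masked = x == MASK_ID
        if not masked.any():
            break
        logits = denoiser(x)                # (B, L, vocab)
        probs = order_net(x).sigmoid()      # (B, L) per-position unmask prob
        unmask = torch.bernoulli(probs).bool() & masked
        # Guarantee progress: if nothing was selected, unmask the single
        # most confident masked position instead.
        if not unmask.any():
            conf = logits.softmax(-1).max(-1).values.masked_fill(~masked, -1.0)
            unmask = torch.zeros_like(masked)
            unmask.scatter_(1, conf.argmax(-1, keepdim=True), True)
        pred = logits.argmax(-1)            # greedy fill, for illustration only
        x = torch.where(unmask, pred, x)
    return x
```

In the paper's framework the unmasking probabilities would be trained through a variational bound rather than fixed heuristically; here they are simply read off a network to show where the learned order enters the sampling loop.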
Related papers
- Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC) for masked diffusion language models (MDLMs). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
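For contrast with the confidence-based strategy this summary mentions, here is a toy sequential Monte Carlo loop that resamples MDM trajectories by accumulated per-step confidence. It is a hedged sketch only: the particle count, one-token-per-step unmasking, multinomial resampling, and the use of per-step log-confidence as the importance weight are assumptions for illustration, not the cited paper's algorithm.

```python
import torch

MASK_ID = 0  # hypothetical mask-token id, as in the sketch above

@torch.no_grad()
def smc_generate(denoiser, x0, num_particles=8, num_steps=64):
    """Toy SMC over MDM trajectories. Each step unmasks the most confident
    position per particle, weights particles by the log-confidence of that
    choice, then resamples. x0: (1, L) LongTensor of all MASK_ID."""
    particles = x0.repeat(num_particles, 1)      # (P, L)
    log_w = torch.zeros(num_particles)
    for _ in range(num_steps):
        # All particles start fully masked and unmask one token per step,
        # so their mask counts stay synchronised across the loop.
        masked = particles == MASK_ID
        if not masked.any():
            break
        probs = denoiser(particles).softmax(-1)  # (P, L, vocab)
        conf, pred = probs.max(-1)               # both (P, L)
        score = conf.masked_fill(~masked, -1.0)
        idx = score.argmax(-1, keepdim=True)     # (P, 1)
        particles.scatter_(1, idx, pred.gather(1, idx))
        log_w += score.gather(1, idx).squeeze(1).log()
        # Multinomial resampling; weights reset to uniform afterwards.
        w = (log_w - log_w.logsumexp(0)).exp()
        keep = torch.multinomial(w, num_particles, replacement=True)
        particles = particles[keep]
        log_w = torch.zeros(num_particles)
    return particles[0]
```

Note that this toy version weights particles step by step; the paper the summary refers to instead uses a trajectory-level confidence signal, which the accumulated per-step weight only approximates.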
arXiv Detail & Related papers (2026-02-02T09:21:45Z) - Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model [74.99242687133408]
Masked Diffusion Models (MDMs) have shown promising potential across vision, language, and cross-modal generation. We introduce Co-GRPO, which reformulates MDM generation as a unified Markov Decision Process (MDP) that jointly incorporates both the model and the inference schedule.
arXiv Detail & Related papers (2025-12-25T12:06:04Z) - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations [53.180374639531145]
Self-Refining Diffusion Samplers (SRDS) retain sample quality and can improve latency at the cost of additional parallel compute. We take inspiration from the Parareal algorithm, a popular numerical method for parallel-in-time integration of differential equations.
arXiv Detail & Related papers (2024-12-11T11:08:09Z) - Aioli: A Unified Optimization Framework for Language Model Data Mixing [74.50480703834508]
We show that no existing method consistently outperforms a simple stratified sampling baseline in terms of average test perplexity. We derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions.
arXiv Detail & Related papers (2024-11-08T17:50:24Z) - SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems [14.2814208019426]
Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. We state three conditions for achieving measurement-consistent diffusion trajectories. We propose a new optimization-based sampling method that not only enforces standard data manifold measurement consistency and forward diffusion consistency, but also incorporates our proposed step-wise and network-regularized backward diffusion consistency.
arXiv Detail & Related papers (2024-10-06T13:39:36Z) - Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling [22.256068524699472]
In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues.
We combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution.
Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
arXiv Detail & Related papers (2024-08-13T08:09:05Z) - Representation and De-interleaving of Mixtures of Hidden Markov Processes [3.7348616912887445]
De-interleaving of mixtures of Hidden Markov Processes (HMPs) generally depends on the chosen representation model.
This paper proposes a novel representation model and corresponding de-interleaving methods for the mixtures of HMPs.
arXiv Detail & Related papers (2024-06-01T12:24:23Z) - Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based approaches have become a cornerstone of contemporary Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
arXiv Detail & Related papers (2022-12-05T20:35:36Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)