Error Bounds and Optimal Schedules for Masked Diffusions with Factorized Approximations
- URL: http://arxiv.org/abs/2510.25544v1
- Date: Wed, 29 Oct 2025 14:11:03 GMT
- Title: Error Bounds and Optimal Schedules for Masked Diffusions with Factorized Approximations
- Authors: Hugo Lavenant, Giacomo Zanella
- Abstract summary: Recently proposed generative models for discrete data, such as Masked Diffusion Models (MDMs), exploit conditional independence approximations. We study the resulting computation-vs-accuracy trade-off, providing general error bounds (in relative entropy). We then investigate the gain obtained by using non-constant schedule sizes.
- Score: 3.595215303316358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently proposed generative models for discrete data, such as Masked Diffusion Models (MDMs), exploit conditional independence approximations to reduce the computational cost of popular Auto-Regressive Models (ARMs), at the price of some bias in the sampling distribution. We study the resulting computation-vs-accuracy trade-off, providing general error bounds (in relative entropy) that depend only on the average number of tokens generated per iteration and are independent of the data dimensionality (i.e. sequence length), thus supporting the empirical success of MDMs. We then investigate the gain obtained by using non-constant schedule sizes (i.e. varying the number of unmasked tokens during the generation process) and identify the optimal schedule as a function of a so-called information profile of the data distribution, thus allowing for a principled optimization of schedule sizes. We define the methods directly as sampling algorithms, rather than through the classical derivation via time-reversed diffusion processes, which leads to simple and transparent proofs.
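To make the setup concrete, here is a minimal sketch of factorized MDM sampling with a configurable schedule. It is an illustration, not the paper's implementation: `predict_conditionals` is a hypothetical stand-in for the learned denoiser, and the example schedule is arbitrary; the paper's contribution is choosing such schedules in a principled way.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_conditionals(x, mask, vocab_size):
    """Hypothetical stand-in for the learned denoiser: for every position,
    return a categorical distribution over the vocabulary, conditioned on
    the currently unmasked tokens. A real MDM would use a neural network."""
    logits = rng.normal(size=(len(x), vocab_size))  # placeholder only
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

def mdm_sample(seq_len, vocab_size, schedule):
    """Factorized MDM sampling: `schedule` lists how many tokens to unmask
    at each iteration, so len(schedule) is the number of model calls."""
    assert sum(schedule) == seq_len
    x = np.zeros(seq_len, dtype=int)      # token values (placeholder init)
    mask = np.ones(seq_len, dtype=bool)   # True = still masked
    for k in schedule:
        probs = predict_conditionals(x, mask, vocab_size)
        # Pick k masked positions, then fill them *independently*:
        # this is the conditional-independence approximation that trades
        # sampling accuracy for fewer model evaluations.
        positions = rng.choice(np.flatnonzero(mask), size=k, replace=False)
        for i in positions:
            x[i] = rng.choice(vocab_size, p=probs[i])
            mask[i] = False
    return x

# 16 tokens in 4 model calls (an ARM would need 16, one per token):
print(mdm_sample(seq_len=16, vocab_size=100, schedule=[1, 3, 4, 8]))
```

A constant schedule unmasks the same number of tokens at every call; per the abstract, the error bounds depend only on the average of these schedule sizes, and the optimal non-constant schedule is derived from the data's information profile.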
Related papers
- Fast Sampling for Flows and Diffusions with Lazy and Point Mass Stochastic Interpolants [5.492889521988414]
We prove how to convert a sample path of a stochastic differential equation (SDE) with arbitrary diffusion coefficient under any schedule. We then extend the interpolant framework to admit a larger class of point mass schedules.
arXiv Detail & Related papers (2026-02-03T17:48:34Z)
- An Elementary Approach to Scheduling in Generative Diffusion Models [55.171367482496755]
An elementary approach to characterizing the impact of noise scheduling and time discretization in generative diffusion models is developed. Experiments across different datasets and pretrained models demonstrate that the time discretization strategy selected by our approach consistently outperforms baseline and search-based strategies.
arXiv Detail & Related papers (2026-01-20T05:06:26Z)
- Dimension-free error estimate for diffusion model and optimal scheduling [22.20348860913421]
Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. Previous analyses have quantified the error between the generated and the true data distributions in terms of Wasserstein distance or Kullback-Leibler divergence. In this work, we derive an explicit, dimension-free bound on the discrepancy between the generated and the true data distributions.
arXiv Detail & Related papers (2025-12-01T15:58:20Z)
- Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking [17.511240770486452]
Masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling. We introduce EB-Sampler, a drop-in replacement for existing samplers, utilizing an Entropy Bounded unmasking procedure. EB-Sampler accelerates sampling from current state-of-the-art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance.
arXiv Detail & Related papers (2025-05-30T17:52:55Z)
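Read literally from the abstract above, entropy-bounded unmasking can be sketched as follows. This is an assumed reading, not the paper's algorithm: the greedy low-entropy-first selection and the `bound` parameter are illustrative choices.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -np.sum(p * np.log(p + eps))

def entropy_bounded_unmask(probs, mask, bound):
    """One adaptive unmasking step: greedily take masked positions in order
    of increasing predictive entropy while the accumulated entropy stays
    below `bound`; at least one position is always unmasked.
    probs: (seq_len, vocab) predicted conditionals; mask: True = masked."""
    masked = np.flatnonzero(mask)
    ents = np.array([entropy(probs[i]) for i in masked])
    chosen, total = [], 0.0
    for j in np.argsort(ents):
        if chosen and total + ents[j] > bound:
            break
        chosen.append(int(masked[j]))
        total += ents[j]
    return chosen  # positions to sample and unmask in this step

# Example: 4 masked positions over a 3-token vocabulary.
probs = np.array([[0.98, 0.01, 0.01],   # near-certain: unmasked first
                  [0.40, 0.30, 0.30],
                  [0.90, 0.05, 0.05],
                  [0.34, 0.33, 0.33]])
mask = np.ones(4, dtype=bool)
print(entropy_bounded_unmask(probs, mask, bound=0.7))  # -> [0, 2]
```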
- Enhanced Diffusion Sampling via Extrapolation with Multiple ODE Solutions [41.989657585518984]
Diffusion probabilistic models (DPMs) often suffer from high computational costs due to their iterative sampling process. We propose an enhanced ODE-based sampling method for DPMs inspired by Richardson extrapolation, which reduces numerical error and improves convergence rates.
arXiv Detail & Related papers (2025-04-02T16:06:23Z)
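Richardson extrapolation itself is a standard numerical trick; the toy sketch below shows the mechanism on plain explicit Euler (the paper's application to diffusion ODE samplers is not reproduced here): two first-order runs with step sizes h and h/2 are combined so that the leading O(h) error term cancels.

```python
import numpy as np

def euler_solve(f, x0, t0, t1, n_steps):
    """Plain explicit Euler integration of dx/dt = f(x, t)."""
    x, t = float(x0), t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t += h
    return x

def richardson_euler(f, x0, t0, t1, n_steps):
    """Combine a coarse run and a half-step run so the leading O(h) error
    term of the first-order method cancels, yielding a second-order
    estimate from two cheap first-order solves."""
    coarse = euler_solve(f, x0, t0, t1, n_steps)
    fine = euler_solve(f, x0, t0, t1, 2 * n_steps)
    return 2.0 * fine - coarse

# Toy check on dx/dt = -x, whose exact solution at t = 1 is exp(-1):
f = lambda x, t: -x
exact = np.exp(-1.0)
print(abs(euler_solve(f, 1.0, 0.0, 1.0, 10) - exact))      # ~1.9e-2
print(abs(richardson_euler(f, 1.0, 0.0, 1.0, 10) - exact)) # ~4e-4
```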
- Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z)
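As an illustration of the SMC resampling building block mentioned above, here is systematic resampling, one standard low-variance scheme; the paper's specific Feynman-Kac corrector weights are not reproduced here.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: a single uniform draw places N evenly spaced
    points on the CDF of the normalized weights, and each point selects an
    ancestor particle. Low-variance and O(N)."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    cdf = np.cumsum(w / w.sum())
    u = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(cdf, u)  # ancestor indices, length n

rng = np.random.default_rng(0)
print(systematic_resample([0.1, 0.7, 0.1, 0.1], rng))  # e.g. [1 1 1 3]
```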
- Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data, but generating high-quality outputs requires many discretization steps. We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z)
- Score-Optimal Diffusion Schedules [29.062842062257918]
An appropriate discretisation schedule is crucial to obtain high quality samples. This paper presents a novel algorithm for adaptively selecting an optimal discretisation schedule. We find that our learned schedule recovers performant schedules previously only discovered through manual search.
arXiv Detail & Related papers (2024-12-10T19:26:51Z)
- Diffusion models for probabilistic programming [56.47577824219207]
Diffusion Model Variational Inference (DMVI) is a novel method for automated approximate inference in probabilistic programming languages (PPLs). DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not impose any constraints on the underlying neural network model.
arXiv Detail & Related papers (2023-11-01T12:17:05Z)
- Diffusion Models with Deterministic Normalizing Flow Priors [21.24885597341643]
We propose DiNof (Diffusion with Normalizing flow priors), a technique that makes use of normalizing flows and diffusion models. Experiments on standard image generation datasets demonstrate the advantage of the proposed method over existing approaches.
arXiv Detail & Related papers (2023-09-03T21:26:56Z)
- AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
arXiv Detail & Related papers (2023-07-20T09:06:21Z)
- On Calibrating Diffusion Probabilistic Models [78.75538484265292]
Diffusion probabilistic models (DPMs) have achieved promising results in diverse generative tasks.
We propose a simple way for calibrating an arbitrary pretrained DPM, with which the score matching loss can be reduced and the lower bounds of model likelihood can be increased.
Our calibration method is performed only once and the resulting models can be used repeatedly for sampling.
arXiv Detail & Related papers (2023-02-21T14:14:40Z)