Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models
- URL: http://arxiv.org/abs/2602.19619v1
- Date: Mon, 23 Feb 2026 09:06:13 GMT
- Title: Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models
- Authors: Luhan Tang, Longxuan Yu, Shaorong Zhang, Greg Ver Steeg
- Abstract summary: We introduce a sampler-centric oracle framework that replaces learned denoisers with an exact Hidden Markov Model posterior derived from a ground-truth Markov chain. We show that few-step discrete diffusion samplers are not distributionally correct even under an oracle denoiser, with transition-level mismatch that vanishes only as the number of steps approaches the sequence length.
- Score: 14.764619905977739
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discrete diffusion language models (dLLMs) provide a fast and flexible alternative to autoregressive models (ARMs) via iterative denoising with parallel updates. However, their evaluation is challenging: existing metrics conflate denoiser approximation error with sampler-induced error from the sampling dynamics, a problem that does not arise for ARMs whose autoregressive sampling exactly reflects the learned probability model. We introduce a sampler-centric oracle framework that replaces learned denoisers with an exact Hidden Markov Model posterior derived from a ground-truth Markov chain, isolating sampler-induced error in a controlled setting. We show that few-step discrete diffusion samplers are not distributionally correct even under an oracle denoiser, with transition-level mismatch that vanishes only as the number of steps approaches the sequence length. Moreover, improvements in negative log-likelihood, generative perplexity, or MAUVE do not imply correct sampling. Code is available at https://luhantang.github.io/dllm_sampler
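The oracle construction is simple enough to reproduce in miniature. The sketch below is a rough illustration rather than the authors' code: it uses a plain first-order Markov chain as the ground truth (the paper derives its posterior from an HMM), and shows how a parallel few-step sampler stays biased at the transition level even when every per-position posterior is exact, while one-position-per-step sampling is correct. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L = 4, 8                                  # toy vocabulary size, sequence length
P = rng.dirichlet(np.ones(V), size=V)        # P[i, j] = p(x_{t+1} = j | x_t = i)
pi = np.linalg.matrix_power(P, 50)[0]        # approximately stationary distribution

def oracle_posterior(x, pos):
    """Exact p(x[pos] | revealed tokens) under the chain (-1 marks a mask).
    For a first-order chain, only the nearest revealed token per side matters."""
    p = pi.copy()
    for l in range(pos - 1, -1, -1):         # nearest revealed token to the left
        if x[l] >= 0:
            p = np.linalg.matrix_power(P, pos - l)[x[l]].copy()
            break
    for r in range(pos + 1, L):              # nearest revealed token to the right
        if x[r] >= 0:
            p = p * np.linalg.matrix_power(P, r - pos)[:, x[r]]
            break
    return p / p.sum()

def sample(num_steps):
    """Reveal all L positions in num_steps rounds of parallel unmasking."""
    x = -np.ones(L, dtype=int)
    for block in np.array_split(rng.permutation(L), num_steps):
        probs = [oracle_posterior(x, pos) for pos in block]  # frozen x: parallel
        for pos, q in zip(block, probs):
            x[pos] = rng.choice(V, p=q)
    return x

def bigram_tv(num_steps, n=5000):
    """Total-variation gap between sampled and true adjacent-pair statistics."""
    counts = np.zeros((V, V))
    for _ in range(n):
        x = sample(num_steps)
        for a, b in zip(x[:-1], x[1:]):
            counts[a, b] += 1
    true_joint = pi[:, None] * P             # stationary joint over adjacent pairs
    return 0.5 * np.abs(counts / counts.sum() - true_joint).sum()

# Parallel few-step sampling is biased even with the oracle posterior;
# one position per step (num_steps = L) is exact up to Monte Carlo noise.
print(round(bigram_tv(2), 3), round(bigram_tv(L), 3))
```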
Related papers
- Test-Time Scaling with Diffusion Language Models via Reward-Guided Stitching [66.39914384073145]
We propose a self-consistency framework that turns cheap diffusion-sampled reasoning into a reusable pool of step-level candidates. We find that step-level recombination is most beneficial on harder problems. Our training-free framework improves average accuracy by up to 2 across six math and coding tasks.
arXiv Detail & Related papers (2026-02-26T11:08:39Z) - Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
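As a rough, hedged sketch of the idea (not the paper's implementation): run an SMC loop over partially masked sequences, accumulate the log-probability of each committed token as a trajectory-level confidence weight, and resample particles when the effective sample size drops. `step_probs` below is a random stand-in for a real masked-diffusion denoiser, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
V, L, K, STEPS = 8, 16, 32, 4                # vocab, length, particles, rounds

def step_probs(x):
    """Placeholder for an MDLM denoiser's per-position token probabilities."""
    logits = rng.normal(size=(len(x), V))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

particles = [-np.ones(L, dtype=int) for _ in range(K)]   # -1 = masked
log_w = np.zeros(K)                                      # importance log-weights

for s in range(STEPS):
    for k in range(K):
        x, probs = particles[k], step_probs(particles[k])
        masked = np.flatnonzero(x < 0)
        take = max(1, len(masked) // (STEPS - s)) if len(masked) else 0
        for pos in rng.permutation(masked)[:take]:
            x[pos] = rng.choice(V, p=probs[pos])
            log_w[k] += np.log(probs[pos, x[pos]])  # trajectory-level confidence
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    if 1.0 / np.sum(w**2) < K / 2:                  # resample on low ESS
        keep = rng.choice(K, size=K, p=w)
        particles = [particles[i].copy() for i in keep]
        log_w[:] = 0.0

best = particles[int(np.argmax(log_w))]             # highest self-reward particle
```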
arXiv Detail & Related papers (2026-02-02T09:21:45Z) - Corrected Samplers for Discrete Flow Models [36.348940136801296]
A line of recent work has studied samplers for discrete diffusion models, such as tau-leaping and Euler solvers. We establish non-asymptotic discretization error bounds for these samplers without any restrictions on transition rates or source distributions. We rigorously show that the location-corrected sampler has lower complexity than existing parallel samplers.
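For orientation, here is a minimal Euler-type sampler for a toy continuous-time Markov chain, in the spirit of the samplers these bounds cover; it is an illustrative sketch, not the paper's corrected sampler. Comparing against the exact marginal from the matrix exponential makes the discretization error directly measurable.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
V = 5
Q = rng.random((V, V))
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))          # rows sum to zero: valid CTMC generator

def euler_sample(x0, t_end, n_steps):
    """Discretize [0, t_end]; per step, transition with kernel I + h*Q."""
    x, h = x0, t_end / n_steps
    for _ in range(n_steps):
        p = np.eye(V)[x] + h * Q[x]          # first-order (Euler) kernel
        p = np.clip(p, 0.0, None)
        x = rng.choice(V, p=p / p.sum())
    return x

samples = [euler_sample(0, 1.0, 4) for _ in range(20000)]
emp = np.bincount(samples, minlength=V) / len(samples)
exact = expm(1.0 * Q)[0]                     # exact marginal via matrix exponential
print(0.5 * np.abs(emp - exact).sum())       # TV gap shrinks as n_steps grows
```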
arXiv Detail & Related papers (2026-01-30T03:53:22Z) - Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions [50.1404916337174]
We present the first large-scale, statistically powered audit of native probabilistic sampling in large language models (LLMs). We show that batch generation achieves only modest statistical validity, with a 13% median pass rate, while independent requests collapse almost entirely. We conclude that current LLMs lack a functional internal sampler, necessitating the use of external tools for applications requiring statistical guarantees.
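The audit methodology reduces to goodness-of-fit testing at scale. A minimal sketch, with `ask_model` as a hypothetical, deliberately biased stand-in for querying an LLM for one sample:

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(3)

def ask_model():
    """Stand-in sampler with a deliberate bias: over-produces the value 0."""
    return 0 if rng.random() < 0.4 else int(rng.integers(0, 6))

target = np.full(6, 1 / 6)                   # e.g. a fair six-sided die
draws = np.array([ask_model() for _ in range(3000)])
counts = np.bincount(draws, minlength=6)
stat, pval = chisquare(counts, f_exp=target * len(draws))
print(f"chi2={stat:.1f}, p={pval:.2g}")      # tiny p-value -> sampler fails the audit
```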
arXiv Detail & Related papers (2026-01-08T22:33:12Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
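A heavily simplified skeleton of classical particle Gibbs, the scheme PG-DLM builds on: each sweep runs conditional SMC with one particle pinned to the current reference trajectory, then draws a new reference by weight. The toy state space and reward below are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(6)
K, T = 16, 10                                # particles, trajectory length

def propose(x):
    """Toy transition kernel on an integer random walk."""
    return x + int(rng.choice([-1, 0, 1]))

def reward(traj):
    """Toy trajectory-level reward: prefer endpoints near 5."""
    return np.exp(-abs(traj[-1] - 5))

reference = [0] * T
for sweep in range(20):                      # each sweep refreshes the reference
    trajs = [[0] for _ in range(K)]
    for t in range(1, T):
        for k in range(K):
            # particle 0 is pinned to the reference (conditional SMC)
            trajs[k].append(reference[t] if k == 0 else propose(trajs[k][-1]))
    w = np.array([reward(tr) for tr in trajs])
    reference = trajs[int(rng.choice(K, p=w / w.sum()))]
```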
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Flipping Against All Odds: Reducing LLM Coin Flip Bias via Verbalized Rejection Sampling [59.133428586090226]
Large language models (LLMs) can often accurately describe probability distributions using natural language, yet the samples they actually generate deviate from those distributions. This mismatch limits their use in tasks requiring reliability, such as Monte Carlo methods, agent-based simulations, and randomized decision-making. We introduce Verbalized Rejection Sampling (VRS), a natural-language adaptation of classical rejection sampling.
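For reference, the classical rejection sampling that VRS verbalizes, as a minimal self-contained sketch (the biased-coin target is illustrative): draw from an easy proposal q and accept with probability p(x) / (M * q(x)).

```python
import numpy as np

rng = np.random.default_rng(4)

def rejection_sample(p, q_sample, q_pdf, M):
    """Draw from target p via proposal q; M bounds max_x p(x) / q(x)."""
    while True:
        x = q_sample()
        if rng.random() < p(x) / (M * q_pdf(x)):
            return x

# Example: a biased coin (p(heads) = 0.7) from a fair-coin proposal.
p = lambda x: 0.7 if x == 1 else 0.3
q_sample = lambda: int(rng.random() < 0.5)
q_pdf = lambda x: 0.5
M = 1.4                                      # bound: 0.7 / 0.5

coins = [rejection_sample(p, q_sample, q_pdf, M) for _ in range(10000)]
print(np.mean(coins))                        # ~0.7, exact up to Monte Carlo noise
```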
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - DDB: Diffusion Driven Balancing to Address Spurious Correlations [24.940576844328408]
Deep neural networks trained with Empirical Risk Minimization often fail to generalize to out-of-distribution samples. We propose a Diffusion Driven Balancing (DDB) technique to generate training samples with text-to-image diffusion models. Our experiments show that our technique achieves better worst-group accuracy than the existing state-of-the-art methods.
arXiv Detail & Related papers (2025-03-21T15:28:22Z) - Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data, but producing high-quality outputs requires many discretization steps. We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z) - Informed Correctors for Discrete Diffusion Models [27.295990499157814]
We propose a predictor-corrector sampling scheme for discrete diffusion models. We show that our informed corrector consistently produces superior samples with fewer errors or improved FID scores. Our results underscore the potential of informed correctors for fast and high-fidelity generation using discrete diffusion.
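A generic predictor-corrector sketch in the spirit of the abstract, with a placeholder denoiser: the predictor commits a few tokens per round, and the corrector remasks and resamples the least confident committed token. This is an assumption-laden illustration, not the paper's informed corrector.

```python
import numpy as np

rng = np.random.default_rng(5)
V, L = 8, 12

def denoise_probs(x):
    """Placeholder for a learned discrete-diffusion denoiser."""
    logits = rng.normal(size=(L, V))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = -np.ones(L, dtype=int)                   # -1 = masked
conf = np.ones(L)                            # confidence of committed tokens
for _ in range(4):
    probs = denoise_probs(x)                 # predictor: commit a few tokens
    for pos in rng.permutation(np.flatnonzero(x < 0))[:3]:
        x[pos] = rng.choice(V, p=probs[pos])
        conf[pos] = probs[pos, x[pos]]
    committed = np.flatnonzero(x >= 0)
    worst = committed[np.argmin(conf[committed])]   # corrector target
    probs = denoise_probs(x)                 # corrector: remask and resample it
    x[worst] = rng.choice(V, p=probs[worst])
    conf[worst] = probs[worst, x[worst]]
```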
arXiv Detail & Related papers (2024-07-30T23:29:29Z) - Diffusion Rejection Sampling [13.945372555871414]
Diffusion Rejection Sampling (DiffRS) is a rejection sampling scheme that aligns the sampling transition kernels with the true ones at each timestep.
The proposed method can be viewed as a mechanism that evaluates the quality of samples at each intermediate timestep and refines them with varying effort depending on the sample.
Empirical results demonstrate the state-of-the-art performance of DiffRS on the benchmark datasets and the effectiveness of DiffRS for fast diffusion samplers and large-scale text-to-image diffusion models.
arXiv Detail & Related papers (2024-05-28T07:00:28Z) - UDPM: Upsampling Diffusion Probabilistic Models [33.51145642279836]
Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention.
DDPMs generate high-quality samples from complex data distributions by defining an inverse process.
Unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable.
In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM).
arXiv Detail & Related papers (2023-05-25T17:25:14Z) - Improved Denoising Diffusion Probabilistic Models [4.919647298882951]
We show that DDPMs can achieve competitive log-likelihoods while maintaining high sample quality.
We also find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes.
We show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable.
arXiv Detail & Related papers (2021-02-18T23:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.