The Diffusion Duality, Chapter II: $Ψ$-Samplers and Efficient Curriculum
- URL: http://arxiv.org/abs/2602.21185v1
- Date: Tue, 24 Feb 2026 18:35:22 GMT
- Title: The Diffusion Duality, Chapter II: $Ψ$-Samplers and Efficient Curriculum
- Authors: Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
- Abstract summary: We introduce a family of Predictor-Corrector samplers for discrete diffusion. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling. These findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling.
- Score: 13.49715655470027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at: https://s-sahoo.com/duo-ch2
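The predictor-corrector idea described in the abstract can be sketched as a loop that alternates a denoising (predictor) step with a re-noise-then-denoise (corrector) step, which is what lets uniform-state diffusion revisit and self-correct earlier tokens. The sketch below is purely illustrative: the toy denoiser, the uniform forward kernel, and the corrector schedule are assumptions standing in for the paper's actual learned model and samplers.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L = 8, 16  # toy vocabulary size and sequence length (assumptions)

def toy_denoiser(x_t, t):
    """Stand-in for a learned denoiser: returns per-token logits over V.
    Here it simply prefers token 0 so the sketch runs end to end."""
    logits = np.zeros((len(x_t), V))
    logits[:, 0] = 2.0
    return logits

def uniform_forward(x, noise_prob):
    """Uniform-state forward kernel: each token is resampled
    uniformly from the vocabulary with probability noise_prob."""
    mask = rng.random(len(x)) < noise_prob
    return np.where(mask, rng.integers(0, V, len(x)), x)

def pc_sample(steps=8, corrector_rounds=1, corrector_noise=0.1):
    x = rng.integers(0, V, L)  # start from pure uniform noise
    for t in np.linspace(1.0, 0.0, steps, endpoint=False):
        # Predictor: denoise one step by sampling from the model.
        logits = toy_denoiser(x, t)
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        x = np.array([rng.choice(V, p=p) for p in probs])
        # Corrector: re-noise slightly, then denoise again, letting the
        # model revisit (self-correct) tokens it committed to earlier.
        for _ in range(corrector_rounds):
            x = uniform_forward(x, corrector_noise)
            x = toy_denoiser(x, t).argmax(-1)
    return x

sample = pc_sample()
print(sample.shape)  # (16,)
```

Unlike Masked diffusion, where an unmasked token is frozen, the corrector's re-noising step is what allows quality to keep improving as the number of sampling steps grows.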
Related papers
- Scaling Beyond Masked Diffusion Language Models [18.68471174706656]
We present the first scaling law study of uniform-state and interpolating discrete diffusion methods. We show that Masked diffusion models can be made approximately 12% more FLOPs-efficient when trained with a simple cross-entropy objective.
arXiv Detail & Related papers (2026-02-16T18:54:47Z) - TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics [40.75121059939763]
We introduce a new sampling method that is up to 186% faster than the current state-of-the-art solver at comparable FID on ImageNet512. The key to our method resides in using higher-dimensional initial noise, which allows producing more detailed samples.
arXiv Detail & Related papers (2025-06-26T20:30:27Z) - The Diffusion Duality [24.39272541108744]
Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. We present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting.
arXiv Detail & Related papers (2025-06-12T16:55:35Z) - Single-Step Consistent Diffusion Samplers [8.758218443992467]
Existing sampling algorithms typically require many iterative steps to produce high-quality samples. We introduce consistent diffusion samplers, a new class of samplers designed to generate high-fidelity samples in a single step. We show that our approach yields high-fidelity samples using less than 1% of the network evaluations required by traditional diffusion samplers.
arXiv Detail & Related papers (2025-02-11T14:25:52Z) - Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations [53.180374639531145]
Self-Refining Diffusion Samplers (SRDS) retain sample quality and can improve latency at the cost of additional parallel compute. We take inspiration from the Parareal algorithm, a popular numerical method for parallel-in-time integration of differential equations.
arXiv Detail & Related papers (2024-12-11T11:08:09Z) - Curriculum Direct Preference Optimization for Diffusion and Consistency Models [110.08057135882356]
We propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation. Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on nine benchmarks.
arXiv Detail & Related papers (2024-05-22T13:36:48Z) - Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data [74.2507346810066]
Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data.
We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data.
arXiv Detail & Related papers (2024-03-20T14:22:12Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - UDPM: Upsampling Diffusion Probabilistic Models [33.51145642279836]
Denoising Diffusion Probabilistic Models (DDPM) have recently gained significant attention.
DDPMs generate high-quality samples from complex data distributions by defining an inverse process.
Unlike generative adversarial networks (GANs), the latent space of diffusion models is less interpretable.
In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM).
arXiv Detail & Related papers (2023-05-25T17:25:14Z) - DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
arXiv Detail & Related papers (2022-11-28T03:25:49Z) - ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech [63.780196620966905]
We propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech.
ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling.
Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms.
ProDiff enables a sampling speed of 24x faster than real-time on a single NVIDIA 2080Ti GPU.
arXiv Detail & Related papers (2022-07-13T17:45:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.