Related papers: The Diffusion Duality

The Diffusion Duality

URL: http://arxiv.org/abs/2506.10892v1
Date: Thu, 12 Jun 2025 16:55:35 GMT
Title: The Diffusion Duality
Authors: Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov,
Abstract summary: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion.<n>Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks.<n>We present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting.
Score: 11.823724329261053
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo

Related papers

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models [15.853201399662344]
Diffusion language models offer unique benefits over autoregressive models.<n>They lag in likelihood modeling and are limited to fixed-length generation.<n>We introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models.
arXiv Detail & Related papers (2025-03-12T17:43:40Z)
Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness.<n>We generalize a new family of general interpolating discrete diffusion (GIDD) which offers greater flexibility in the design of the noising processes.<n>Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality.
arXiv Detail & Related papers (2025-03-06T14:30:55Z)
TESS 2: A Large-Scale Generalist Diffusion Language Model [24.91689676432666]
TESS 2 is an instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models.<n>We find that adaptation training as well as the choice of the base model is crucial for training good instruction-following diffusion models.<n>We propose reward guidance, a novel and modular inference-time guidance procedure to align model outputs without needing to train the underlying model.
arXiv Detail & Related papers (2025-02-19T17:50:31Z)
Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.<n>Our framework offers a 1.3$times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow [65.51671121528858]
Diffusion models have greatly improved visual generation but are hindered by slow generation speed due to the computationally intensive nature of solving generative ODEs. Rectified flow, a widely recognized solution, improves generation speed by straightening the ODE path. We propose Rectified Diffusion, which generalizes the design space and application scope of rectification to encompass the broader category of diffusion models.
arXiv Detail & Related papers (2024-10-09T17:43:38Z)
Accelerating Diffusion Models with One-to-Many Knowledge Distillation [35.130782477699704]
We introduce one-to-many knowledge distillation (O2MKD), which distills a single teacher diffusion model into multiple student diffusion models. Experiments on CIFAR10, LSUN Church, CelebA-HQ with DDPM and COCO30K with Stable Diffusion show that O2MKD can be applied to previous knowledge distillation and fast sampling methods to achieve significant acceleration.
arXiv Detail & Related papers (2024-10-05T15:10:04Z)
Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining. We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
Structural Pruning for Diffusion Models [65.02607075556742]
We present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones. Our empirical assessment, undertaken across several datasets highlights two primary benefits of our proposed method.
arXiv Detail & Related papers (2023-05-18T12:38:21Z)
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models. We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step. Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
arXiv Detail & Related papers (2022-11-28T03:25:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.