Diffusion Language Models are Provably Optimal Parallel Samplers
- URL: http://arxiv.org/abs/2512.25014v1
- Date: Wed, 31 Dec 2025 18:03:05 GMT
- Title: Diffusion Language Models are Provably Optimal Parallel Samplers
- Authors: Haozhe Jiang, Nika Haghtalab, Lijie Chen
- Abstract summary: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models. We show that DLMs augmented with a chain-of-thought can simulate any parallel sampling algorithm using an optimal number of sequential steps.
- Score: 15.981424915336001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide a rigorous foundation for this advantage by formalizing a model of parallel sampling and showing that DLMs augmented with polynomial-length chain-of-thought (CoT) can simulate any parallel sampling algorithm using an optimal number of sequential steps. Consequently, whenever a target distribution can be generated using a small number of sequential steps, a DLM can generate that distribution using the same, optimal number of sequential steps. However, without the ability to modify previously revealed tokens, DLMs with CoT can still incur large intermediate footprints. We prove that enabling remasking (converting unmasked tokens to masks) or revision (converting unmasked tokens to other unmasked tokens) together with CoT further allows DLMs to simulate any parallel sampling algorithm with optimal space complexity. We further justify the advantage of revision by establishing a strict expressivity gap: DLMs with revision or remasking are strictly more expressive than those without. Our results not only provide a theoretical justification for the promise of DLMs as the most efficient parallel samplers, but also advocate for enabling revision in DLMs.
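To make the masking/remasking vocabulary concrete, here is a minimal toy sketch of one parallel sampling loop: every masked position whose top token clears a confidence threshold is revealed in the same step, and, when remasking is enabled, a low-confidence revealed token can be converted back to a mask. This is purely illustrative; `toy_model`, the thresholds, and all names are assumptions of this sketch, not the paper's construction.

```python
# Toy sketch of masked-diffusion parallel sampling with optional remasking.
# All names and thresholds are illustrative assumptions, not the paper's model.
import random

MASK = None                      # sentinel for a masked position
VOCAB = ["a", "b", "c", "d"]

def toy_model(seq):
    """Stand-in denoiser: an independent per-position score over VOCAB.
    A real DLM would condition on the whole partially masked sequence."""
    return [{tok: random.random() for tok in VOCAB} for _ in seq]

def step(seq, allow_remask=False, reveal_conf=0.5, remask_conf=0.25):
    """One parallel step: reveal every masked position whose top token
    clears reveal_conf; with remasking enabled, convert revealed tokens
    whose current top score falls below remask_conf back to MASK."""
    dists = toy_model(seq)
    out = list(seq)
    for i, dist in enumerate(dists):
        tok, score = max(dist.items(), key=lambda kv: kv[1])
        if out[i] is MASK and score >= reveal_conf:
            out[i] = tok             # parallel unmasking
        elif allow_remask and out[i] is not MASK and score < remask_conf:
            out[i] = MASK            # remasking a revealed token
    return out

seq = [MASK] * 8
for _ in range(16):                  # sequential steps
    seq = step(seq, allow_remask=True)
    if MASK not in seq:
        break
print(seq)
```

Many positions can change in a single step here, which is the sense in which the number of sequential steps, rather than the number of tokens, is the cost the paper's results bound.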
Related papers
- Divide and Conquer: Accelerating Diffusion-Based Large Language Models via Adaptive Parallel Decoding [6.755667885643806]
Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks. We introduce an adaptive parallel decoding approach, namely DiCo, which features a three-phase divide-and-conquer paradigm. Extensive experiments demonstrate that DiCo can achieve significant inference speedups while maintaining competitive generation quality.
arXiv Detail & Related papers (2026-02-27T08:36:06Z) - Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv Detail & Related papers (2026-02-02T09:21:45Z) - Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models [46.151072011636444]
EvoToken-DLM is a novel diffusion-based language modeling approach that replaces hard binary masks with evolving soft token distributions. EvoToken-DLM consistently achieves superior performance, outperforming strong diffusion-based and masked DLM baselines.
arXiv Detail & Related papers (2026-01-12T09:25:14Z) - dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning [36.12942468805232]
Masked diffusion language models offer potential for parallel token generation. Open-source MDLMs decode fewer than 5 tokens per model forward pass. dUltra learns unmasking strategies for efficient parallel decoding.
arXiv Detail & Related papers (2025-12-24T23:31:48Z) - CDLM: Consistency Diffusion Language Models For Faster Sampling [54.886467592798]
Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference. We introduce CDLM, a training-based acceleration method that simultaneously tackles both bottlenecks. Experiments show CDLM achieves 3.6x-14.5x lower latency while maintaining competitive accuracy on math and coding tasks.
arXiv Detail & Related papers (2025-11-24T16:21:25Z) - Beyond Surface Reasoning: Unveiling the True Long Chain-of-Thought Capacity of Diffusion Large Language Models [54.81955614221652]
Parallel decoding, which enables simultaneous token updates, conflicts with the causal order often required for rigorous reasoning. Behavioral analyses in both simple and complex reasoning tasks show that DLLMs exhibit genuine parallelism only for directly decidable outputs. We propose several practical mitigations to reduce PSC-induced ineffectiveness and inefficiency: parallel-oriented prompting, diffusion early stopping, and parallel scaling.
arXiv Detail & Related papers (2025-10-10T16:58:14Z) - Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models [82.87985794856803]
Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks. Recently, Diffusion Language Models (DLMs) have emerged as a promising alternative architecture.
arXiv Detail & Related papers (2025-10-05T10:50:52Z) - Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models [8.407364705777587]
We introduce Free Draft-and-Verification (FreeDave), a novel fast decoding algorithm tailored for DLLMs (a toy draft-and-verify loop is sketched after this list). FreeDave is proven to boost inference throughput by up to $3.78\times$ without performance degradation.
arXiv Detail & Related papers (2025-09-30T21:28:04Z) - Accelerating Diffusion LLMs via Adaptive Parallel Decoding [60.407727995313074]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradation on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z) - Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions [32.48588058887852]
Insertion Language Models (ILMs) learn to insert tokens at arbitrary positions in a sequence. ILMs can represent strong dependencies between tokens, and their ability to generate sequences in arbitrary order allows them to accurately model sequences.
arXiv Detail & Related papers (2025-05-09T03:29:15Z) - Reviving Any-Subset Autoregressive Models with Principled Parallel Sampling and Speculative Decoding [55.2480439325792]
In arbitrary-order language models, it is an open question how to sample tokens in parallel from the correct joint distribution. We find that a different class of models, any-subset autoregressive models (AS-ARMs), holds the solution. We show that AS-ARMs achieve state-of-the-art performance among sub-200M parameter models on infilling benchmark tasks, and nearly match the performance of models 50X larger on code generation.
arXiv Detail & Related papers (2025-04-29T06:33:13Z)
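Several of the papers above (FreeDave, APD) speed up decoding by proposing several tokens per round and keeping only those a verifier agrees with. The toy loop below sketches that draft-and-verify idea under stated assumptions: `verifier` stands in for the model's greedy next-token choice and `draft` for a cheap parallel proposal; neither reflects any of these papers' actual algorithms.

```python
# Toy draft-and-verify decoding loop. `verifier` and `draft` are
# illustrative stand-ins, not any paper's real drafter or model.
import random

VOCAB = "abcd"
rng = random.Random(0)

def verifier(prefix):
    """Deterministic stand-in for the model's greedy next token."""
    return VOCAB[sum(map(ord, prefix)) % len(VOCAB)]

def draft(prefix, k):
    """Cheap proposal of k future tokens; a real method would reuse the
    model's own parallel predictions instead of random guesses."""
    return [rng.choice(VOCAB) for _ in range(k)]

def decode(prefix, total, k=4):
    """Each round accepts the longest drafted prefix the verifier agrees
    with and falls back to the verifier's token on the first mismatch, so
    the output matches pure sequential decoding ('lossless') while several
    tokens can land per round."""
    rounds = 0
    while len(prefix) < total:
        rounds += 1
        for tok in draft(prefix, k):
            if len(prefix) >= total:
                break
            if verifier(prefix) == tok:
                prefix += tok                 # drafted token verified
            else:
                prefix += verifier(prefix)    # reject; keep verified token
                break
    return prefix, rounds

text, rounds = decode("a", total=13)
print(text, "generated in", rounds, "rounds")
```

Because every appended token equals the verifier's choice at that point, the result is identical to pure sequential decoding; the gain is that a round can commit more than one token, which is the throughput lever these acceleration methods exploit.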