Planned Diffusion
- URL: http://arxiv.org/abs/2510.18087v1
- Date: Mon, 20 Oct 2025 20:27:48 GMT
- Title: Planned Diffusion
- Authors: Daniel Israel, Tian Jin, Ellie Cheng, Guy Van den Broeck, Aditya Grover, Suvinay Subramanian, Michael Carbin
- Abstract summary: A central challenge in large language model inference is the trade-off between generation speed and output quality. We propose planned diffusion, a hybrid method that combines the strengths of both paradigms. Planned diffusion works in two stages: first, the model creates a short autoregressive plan that breaks the output into smaller, independent spans.
- Score: 57.74615417331808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A central challenge in large language model inference is the trade-off between generation speed and output quality. Autoregressive models produce high-quality text but generate tokens sequentially. Diffusion models can generate tokens in parallel but often need many iterations to match the same quality. We propose planned diffusion, a hybrid method that combines the strengths of both paradigms. Planned diffusion works in two stages: first, the model creates a short autoregressive plan that breaks the output into smaller, independent spans. Second, the model generates these spans simultaneously using diffusion. This approach expands the speed-quality Pareto frontier and provides a practical path to faster, high-quality text generation. On AlpacaEval, a suite of 805 instruction-following prompts, planned diffusion achieves a Pareto-optimal trade-off between quality and latency, delivering 1.27x to 1.81x speedups over autoregressive generation with only 0.87% to 5.4% drops in win rate, respectively. Our sensitivity analysis shows that the planning mechanism of planned diffusion is minimal and reliable, and simple runtime knobs provide flexible control of the quality-latency trade-off.
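The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the span-tag format, and the use of a thread pool to stand in for batched parallel diffusion decoding are all assumptions.

```python
# Sketch of planned diffusion's two-stage generation.
# Stage 1: a short autoregressive pass emits a plan of independent spans.
# Stage 2: the spans are denoised simultaneously by a diffusion model.
# All names and formats here are hypothetical, for illustration only.
from concurrent.futures import ThreadPoolExecutor


def autoregressive_plan(prompt: str) -> list[str]:
    """Stage 1: a short AR pass splits the answer into independent
    span stubs (hard-coded here; a real model would generate these)."""
    return ["<span1> definition", "<span2> example", "<span3> summary"]


def diffusion_fill(span_stub: str, num_steps: int = 4) -> str:
    """Stage 2: each span is iteratively denoised. A real diffusion
    model would refine masked tokens; we just echo the stub's content."""
    text = span_stub.split("> ", 1)[1]
    for _ in range(num_steps):
        text = text  # placeholder for one denoising step
    return text


def planned_diffusion(prompt: str) -> str:
    plan = autoregressive_plan(prompt)  # sequential, but short
    # Because the spans are independent, they can be filled in parallel;
    # a thread pool stands in for batched parallel diffusion decoding.
    with ThreadPoolExecutor() as pool:
        spans = list(pool.map(diffusion_fill, plan))
    return " ".join(spans)


print(planned_diffusion("Explain planned diffusion"))
```

The key design point is that the AR plan is kept deliberately short, so the sequential stage costs little, while the expensive token generation happens in parallel across spans.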
Related papers
- Scaling Beyond Masked Diffusion Language Models [18.68471174706656]
We present the first scaling law study of uniform-state and interpolating discrete diffusion methods. We show that masked diffusion models can be made approximately 12% more FLOPs-efficient when trained with a simple cross-entropy objective.
arXiv Detail & Related papers (2026-02-16T18:54:47Z) - Transition Matching Distillation for Fast Video Generation [63.1049790376783]
We present Transition Matching Distillation (TMD), a novel framework for distilling video diffusion models into efficient few-step generators. TMD matches the multi-step denoising trajectory of a diffusion model with a few-step probability transition process. TMD provides a flexible and strong trade-off between generation speed and visual quality.
arXiv Detail & Related papers (2026-01-14T21:30:03Z) - CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers [72.23291099555459]
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to a 2.1x speedup with four cores (a 50% improvement over baselines) and a 2.9x speedup with eight cores, all without quality degradation.
arXiv Detail & Related papers (2025-07-21T05:48:47Z) - Hybrid Autoregressive-Diffusion Model for Real-Time Sign Language Production [0.0]
We develop a hybrid approach that combines autoregressive and diffusion models for Sign Language Production (SLP). To capture fine-grained body movements, we design a Multi-Scale Pose Representation module that separately extracts detailed features from distinct articulators. We introduce a Confidence-Aware Causal Attention mechanism that utilizes joint-level confidence scores to dynamically guide the pose generation process.
arXiv Detail & Related papers (2025-07-12T01:34:50Z) - The Diffusion Duality [24.39272541108744]
Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. We present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting.
arXiv Detail & Related papers (2025-06-12T16:55:35Z) - FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion [22.207275433870937]
Diffusion language models offer parallel token generation and inherent bidirectionality. State-of-the-art diffusion models (e.g., Dream 7B, LLaDA 8B) suffer from slow inference. We introduce Guided Diffusion, a training-free method that uses a lightweight pretrained autoregressive model to supervise token unmasking.
arXiv Detail & Related papers (2025-05-27T17:39:39Z) - Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness. We introduce generalized interpolating discrete diffusion (GIDD), a new family of processes that offers greater flexibility in the design of the noising process. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise, leading to improved sample quality.
arXiv Detail & Related papers (2025-03-06T14:30:55Z) - Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule [50.260693393896716]
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images. Recent techniques have been employed to automatically search for faster generation processes. We introduce Flexiffusion, a novel training-free NAS paradigm designed to accelerate diffusion models.
arXiv Detail & Related papers (2024-09-26T06:28:05Z) - One-step Diffusion with Distribution Matching Distillation [54.723565605974294]
We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator.
We enforce the one-step image generator to match the diffusion model at the distribution level by minimizing an approximate KL divergence.
Our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k.
arXiv Detail & Related papers (2023-11-30T18:59:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.