Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models
- URL: http://arxiv.org/abs/2512.19004v1
- Date: Mon, 22 Dec 2025 03:45:04 GMT
- Title: Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models
- Authors: Tongyuan Miao, Gary Huang, Kai Jun Han, Annie Jiang
- Abstract summary: Diffusion Large Language Models (DLLMs) enable fully parallel token decoding but often remain impractical at inference time. Most existing acceleration methods focus on traversing this generative trajectory more efficiently via improved solvers or sampling strategies. We propose a training-free interface that injects prompt-conditioned priors from a lightweight auxiliary model into the diffusion initialization. Because injected priors can be imperfect and unmask-only decoding can over-commit early, we also introduce a simple confidence-based remasking mechanism as a form of prior skepticism.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion Large Language Models (DLLMs) enable fully parallel token decoding but often remain impractical at inference time due to the many denoising iterations required to refine an information-free, fully masked initialization into coherent text. Most existing acceleration methods focus on traversing this generative trajectory more efficiently via improved solvers or sampling strategies. We advance a complementary perspective: shorten the trajectory itself by starting closer to the target distribution through context-aware initialization. We propose a training-free interface that injects prompt-conditioned priors from a lightweight auxiliary model into the diffusion initialization, and instantiate it with two mechanisms: discrete token injection and representation-level embedding interpolation. Because injected priors can be imperfect and unmask-only decoding can over-commit early, we also introduce a simple confidence-based remasking mechanism as a form of prior skepticism. Preliminary evidence on GSM8K suggests that context-aware initialization can substantially reduce denoising iterations (about 35\% fewer function evaluations in our setting), while also exposing a key open challenge: naive warm-starting can degrade final accuracy relative to strong diffusion baselines. We use these findings to motivate a research agenda around calibration, revision mechanisms, and representation alignment for reliable warm-started diffusion decoding.
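The two mechanisms the abstract describes, discrete token injection at initialization and confidence-based remasking as prior skepticism, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the `inject_prob` and `threshold` parameters, and the callable draft model are all assumptions for the sake of the example.

```python
import random

MASK = "<mask>"

def context_aware_init(prompt_tokens, draft_model, gen_len, inject_prob=0.5):
    """Warm-start the diffusion sequence: instead of an all-mask
    initialization, fill a fraction of answer positions with tokens
    proposed by a lightweight auxiliary draft model (discrete token
    injection). Remaining positions stay masked for denoising."""
    draft = draft_model(prompt_tokens, gen_len)  # prompt-conditioned prior
    return [tok if random.random() < inject_prob else MASK for tok in draft]

def remask_low_confidence(tokens, confidences, threshold=0.3):
    """Prior skepticism: re-mask any position whose confidence under the
    diffusion model falls below a threshold, so imperfect injected tokens
    can be revised instead of being committed to early."""
    return [tok if conf >= threshold else MASK
            for tok, conf in zip(tokens, confidences)]
```

A warm-started decoding loop would alternate denoising steps with `remask_low_confidence`, so the injected prior only survives where the diffusion model itself agrees with it.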
Related papers
- DODO: Discrete OCR Diffusion Models [15.352694377412229]
We introduce DODO, the first VLM to utilize block discrete diffusion and unlock its speedup potential for OCR. Our method achieves near state-of-the-art accuracy while enabling up to 3x faster inference compared to autoregressive baselines.
arXiv Detail & Related papers (2026-02-18T20:59:22Z) - Just on Time: Token-Level Early Stopping for Diffusion Language Models [0.0]
Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient. We introduce a training-free, token-level early stopping approach that identifies convergence independently at each position. This yields adaptive per-token freezing without task-specific fine-tuning, substantially reducing the total number of diffusion steps required.
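The per-token freezing idea can be sketched as a small bookkeeping step run after each refinement iteration. This is an illustrative sketch, not the paper's method: the `patience` criterion (freeze a position once its prediction is unchanged for that many consecutive steps) is an assumed stand-in for whatever convergence test the authors use.

```python
def refine_step(prev_tokens, new_tokens, streak, frozen, patience=2):
    """One refinement step with token-level early stopping: a position is
    frozen once its predicted token has been stable for `patience`
    consecutive steps; frozen positions keep their token and are skipped."""
    out = []
    for i, (prev, new) in enumerate(zip(prev_tokens, new_tokens)):
        if frozen[i]:
            out.append(prev)  # converged position: no further updates
            continue
        streak[i] = streak[i] + 1 if new == prev else 0
        frozen[i] = streak[i] >= patience
        out.append(new)
    return out
```

Once every position is frozen, the loop can terminate early regardless of the nominal step budget.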
arXiv Detail & Related papers (2026-02-11T18:44:04Z) - Causal Autoregressive Diffusion Language Model [70.7353007255797]
CARD reformulates the diffusion process within a strictly causal attention mask, enabling dense, per-token supervision in a single forward pass. Our results demonstrate that CARD achieves ARM-level data efficiency while unlocking the latency benefits of parallel generation.
arXiv Detail & Related papers (2026-01-29T17:38:29Z) - Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models [1.3535770763481902]
Foundation models, despite their robust zero-shot capabilities, remain vulnerable to spurious correlations and 'Clever Hans' strategies. We propose Visual Disentangled Diffusion Autoencoders (DiDAE), a novel framework integrating frozen foundation models with disentangled dictionary learning. DiDAE first edits foundation model embeddings along interpretable directions of the disentangled dictionary and then decodes them via a diffusion autoencoder.
arXiv Detail & Related papers (2026-01-29T15:25:37Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel. Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding. We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - Adaptive 3D Reconstruction via Diffusion Priors and Forward Curvature-Matching Likelihood Updates [1.2425910171551517]
Reconstructing high-quality point clouds from images remains challenging in computer vision. Recent diffusion-based methods have attempted to address this by combining prior models with likelihood updates. We advance this line of approach by integrating our novel Forward Curvature-Matching (FCM) update method with diffusion sampling.
arXiv Detail & Related papers (2025-11-09T10:14:14Z) - TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits [7.615431299673158]
Traditional black-box optimization achieves sampling efficiency but lacks circuit understanding. We propose TopoSizing, an end-to-end framework that performs robust circuit understanding directly from raw netlists.
arXiv Detail & Related papers (2025-09-17T16:52:46Z) - A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) [3.1775609005777024]
Large language models are computationally expensive due to their deep structures. We propose SPADE, a novel decoding method that aligns intermediate layer representations with the output layer. We create a hybrid early-exit algorithm that monitors confidence levels and stops inference at intermediate layers while using SPADE to generate high-quality outputs.
arXiv Detail & Related papers (2025-07-23T15:49:03Z) - FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion [22.207275433870937]
Diffusion language models offer parallel token generation and inherent bidirectionality. State-of-the-art diffusion models (e.g., Dream 7B, LLaDA 8B) suffer from slow inference. We introduce Guided Diffusion, a training-free method that uses a lightweight pretrained autoregressive model to supervise token unmasking.
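One way such AR supervision of unmasking could look, sketched here under assumptions of our own (the blending weight `alpha` and the score form are illustrative, not FlashDLM's actual criterion): rank masked positions by a mix of the diffusion model's own confidence and the auxiliary AR model's log-probability for the proposed token, and unmask the highest-scoring positions first.

```python
def guided_unmask_order(masked_positions, dlm_conf, ar_logprob, alpha=0.5):
    """Order masked positions for unmasking by blending diffusion-model
    confidence with an auxiliary AR model's log-probability for the
    candidate token; higher combined score means unmask earlier."""
    def score(i):
        return alpha * dlm_conf[i] + (1 - alpha) * ar_logprob[i]
    return sorted(masked_positions, key=score, reverse=True)
```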
arXiv Detail & Related papers (2025-05-27T17:39:39Z) - Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating constraints on every token can be prohibitively expensive. Locally constrained decoding (LCD) can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - Accelerating Large Language Model Inference with Self-Supervised Early Exits [0.0]
This paper presents a novel technique for accelerating inference in large, pre-trained language models (LLMs). We propose the integration of early-exit "heads" atop existing transformer layers, which facilitate conditional termination based on a confidence metric.
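The confidence-based exit rule can be sketched as follows: at each intermediate head, compute softmax probabilities over the vocabulary and emit the argmax token as soon as its probability clears a threshold, falling back to the final layer otherwise. This is a minimal sketch of the general pattern, not this paper's specific heads or metric; the threshold value and function names are illustrative.

```python
import math

def softmax_confidence(logits):
    """Return (argmax token id, its softmax probability) for one head."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shifted for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(probs)
    return probs.index(best), best

def early_exit_decode(per_layer_logits, threshold=0.9):
    """Emit the token from the first head whose confidence clears the
    threshold; the final layer always produces an answer as a fallback."""
    for depth, logits in enumerate(per_layer_logits):
        token, conf = softmax_confidence(logits)
        if conf >= threshold or depth == len(per_layer_logits) - 1:
            return token, depth
```

Raising the threshold trades speed for fidelity to the full-depth prediction, which is the central knob in most early-exit schemes.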
arXiv Detail & Related papers (2024-07-30T07:58:28Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.