Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models
- URL: http://arxiv.org/abs/2512.19004v1
- Date: Mon, 22 Dec 2025 03:45:04 GMT
- Title: Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models
- Authors: Tongyuan Miao, Gary Huang, Kai Jun Han, Annie Jiang
- Abstract summary: Diffusion Large Language Models (DLLMs) enable fully parallel token decoding but often remain impractical at inference time. Most existing acceleration methods focus on traversing this generative trajectory more efficiently via improved solvers or sampling strategies. We propose a training-free interface that injects prompt-conditioned priors from a lightweight auxiliary model into the diffusion initialization. Because injected priors can be imperfect and unmask-only decoding can over-commit early, we also introduce a simple confidence-based remasking mechanism as a form of prior skepticism.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion Large Language Models (DLLMs) enable fully parallel token decoding but often remain impractical at inference time due to the many denoising iterations required to refine an information-free, fully masked initialization into coherent text. Most existing acceleration methods focus on traversing this generative trajectory more efficiently via improved solvers or sampling strategies. We advance a complementary perspective: shorten the trajectory itself by starting closer to the target distribution through context-aware initialization. We propose a training-free interface that injects prompt-conditioned priors from a lightweight auxiliary model into the diffusion initialization, and instantiate it with two mechanisms: discrete token injection and representation-level embedding interpolation. Because injected priors can be imperfect and unmask-only decoding can over-commit early, we also introduce a simple confidence-based remasking mechanism as a form of prior skepticism. Preliminary evidence on GSM8K suggests that context-aware initialization can substantially reduce denoising iterations (about 35\% fewer function evaluations in our setting), while also exposing a key open challenge: naive warm-starting can degrade final accuracy relative to strong diffusion baselines. We use these findings to motivate a research agenda around calibration, revision mechanisms, and representation alignment for reliable warm-started diffusion decoding.
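The two mechanisms the abstract describes, discrete token injection at initialization and confidence-based remasking as prior skepticism, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the `inject_prob` and `threshold` parameters, and the callable draft model are all assumptions for the sake of the example.

```python
import random

MASK = "<mask>"

def context_aware_init(prompt_tokens, draft_model, gen_len, inject_prob=0.5):
    """Warm-start the diffusion sequence: instead of an all-mask
    initialization, fill a fraction of answer positions with tokens
    proposed by a lightweight auxiliary draft model (discrete token
    injection). Remaining positions stay masked for denoising."""
    draft = draft_model(prompt_tokens, gen_len)  # prompt-conditioned prior
    return [tok if random.random() < inject_prob else MASK for tok in draft]

def remask_low_confidence(tokens, confidences, threshold=0.3):
    """Prior skepticism: re-mask any position whose confidence under the
    diffusion model falls below a threshold, so imperfect injected tokens
    can be revised instead of being committed to early."""
    return [tok if conf >= threshold else MASK
            for tok, conf in zip(tokens, confidences)]
```

A warm-started decoding loop would alternate denoising steps with `remask_low_confidence`, so the injected prior only survives where the diffusion model itself agrees with it.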
Related papers
- DODO: Discrete OCR Diffusion Models [15.352694377412229]
We introduce DODO, the first VLM to utilize block discrete diffusion and unlock its speedup potential for OCR. Our method achieves near state-of-the-art accuracy while enabling up to 3x faster inference compared to autoregressive baselines.
arXiv Detail & Related papers (2026-02-18T20:59:22Z) - Just on Time: Token-Level Early Stopping for Diffusion Language Models [0.0]
Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient. We introduce a training-free, token-level early stopping approach that identifies convergence independently at each position. This yields adaptive per-token freezing without task-specific fine-tuning, substantially reducing the total number of diffusion steps required.
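The per-token freezing idea can be sketched as a small bookkeeping step run after each refinement iteration. This is an illustrative sketch, not the paper's method: the `patience` criterion (freeze a position once its prediction is unchanged for that many consecutive steps) is an assumed stand-in for whatever convergence test the authors use.

```python
def refine_step(prev_tokens, new_tokens, streak, frozen, patience=2):
    """One refinement step with token-level early stopping: a position is
    frozen once its predicted token has been stable for `patience`
    consecutive steps; frozen positions keep their token and are skipped."""
    out = []
    for i, (prev, new) in enumerate(zip(prev_tokens, new_tokens)):
        if frozen[i]:
            out.append(prev)  # converged position: no further updates
            continue
        streak[i] = streak[i] + 1 if new == prev else 0
        frozen[i] = streak[i] >= patience
        out.append(new)
    return out
```

Once every position is frozen, the loop can terminate early regardless of the nominal step budget.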
arXiv Detail & Related papers (2026-02-11T18:44:04Z) - Causal Autoregressive Diffusion Language Model [70.7353007255797]
CARD reformulates the diffusion process within a strictly causal attention mask, enabling dense, per-token supervision in a single forward pass. Our results demonstrate that CARD achieves ARM-level data efficiency while unlocking the latency benefits of parallel generation.
arXiv Detail & Related papers (2026-01-29T17:38:29Z) - Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models [1.3535770763481902]
Foundation models, despite their robust zero-shot capabilities, remain vulnerable to spurious correlations and 'Clever Hans' strategies. We propose Visual Disentangled Diffusion Autoencoders (DiDAE), a novel framework integrating frozen foundation models with disentangled dictionary learning. DiDAE first edits foundation model embeddings along interpretable directions of the disentangled dictionary and then decodes them via a diffusion autoencoder.
arXiv Detail & Related papers (2026-01-29T15:25:37Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel. Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding. We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - Adaptive 3D Reconstruction via Diffusion Priors and Forward Curvature-Matching Likelihood Updates [1.2425910171551517]
Reconstructing high-quality point clouds from images remains challenging in computer vision. Recent diffusion-based methods have attempted to address this by combining prior models with likelihood updates. We advance this line of approach by integrating our novel Forward Curvature-Matching (FCM) update method with diffusion sampling.
arXiv Detail & Related papers (2025-11-09T10:14:14Z) - TopoSizing: An LLM-aided Framework of Topology-based Understanding and Sizing for AMS Circuits [7.615431299673158]
Traditional black-box optimization achieves sampling efficiency but lacks circuit understanding. We propose TopoSizing, an end-to-end framework that performs robust circuit understanding directly from raw netlists.
arXiv Detail & Related papers (2025-09-17T16:52:46Z) - A Hybrid Early-Exit Algorithm for Large Language Models Based on Space Alignment Decoding (SPADE) [3.1775609005777024]
Large language models are computationally expensive due to their deep structures. We propose SPADE, a novel decoding method that aligns intermediate layer representations with the output layer. We create a hybrid early-exit algorithm that monitors confidence levels and stops inference at intermediate layers while using SPADE to generate high-quality outputs.
arXiv Detail & Related papers (2025-07-23T15:49:03Z) - FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion [22.207275433870937]
Diffusion language models offer parallel token generation and inherent bidirectionality. State-of-the-art diffusion models (e.g., Dream 7B, LLaDA 8B) suffer from slow inference. We introduce Guided Diffusion, a training-free method that uses a lightweight pretrained autoregressive model to supervise token unmasking.
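One way such AR supervision of unmasking could look, sketched here under assumptions of our own (the blending weight `alpha` and the score form are illustrative, not FlashDLM's actual criterion): rank masked positions by a mix of the diffusion model's own confidence and the auxiliary AR model's log-probability for the proposed token, and unmask the highest-scoring positions first.

```python
def guided_unmask_order(masked_positions, dlm_conf, ar_logprob, alpha=0.5):
    """Order masked positions for unmasking by blending diffusion-model
    confidence with an auxiliary AR model's log-probability for the
    candidate token; higher combined score means unmask earlier."""
    def score(i):
        return alpha * dlm_conf[i] + (1 - alpha) * ar_logprob[i]
    return sorted(masked_positions, key=score, reverse=True)
```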
arXiv Detail & Related papers (2025-05-27T17:39:39Z) - Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference. It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps. Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating constraints on every token can be prohibitively expensive. Locally constrained decoding (LCD) can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - Accelerating Large Language Model Inference with Self-Supervised Early Exits [0.0]
This paper presents a novel technique for accelerating inference in large, pre-trained language models (LLMs). We propose the integration of early-exit "heads" atop existing transformer layers, which facilitate conditional termination based on a confidence metric.
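The confidence-based exit rule can be sketched as follows: at each intermediate head, compute softmax probabilities over the vocabulary and emit the argmax token as soon as its probability clears a threshold, falling back to the final layer otherwise. This is a minimal sketch of the general pattern, not this paper's specific heads or metric; the threshold value and function names are illustrative.

```python
import math

def softmax_confidence(logits):
    """Return (argmax token id, its softmax probability) for one head."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shifted for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(probs)
    return probs.index(best), best

def early_exit_decode(per_layer_logits, threshold=0.9):
    """Emit the token from the first head whose confidence clears the
    threshold; the final layer always produces an answer as a fallback."""
    for depth, logits in enumerate(per_layer_logits):
        token, conf = softmax_confidence(logits)
        if conf >= threshold or depth == len(per_layer_logits) - 1:
            return token, depth
```

Raising the threshold trades speed for fidelity to the full-depth prediction, which is the central knob in most early-exit schemes.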
arXiv Detail & Related papers (2024-07-30T07:58:28Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.