Lookahead-then-Verify: Reliable Constrained Decoding for Diffusion LLMs under Context-Free Grammars
- URL: http://arxiv.org/abs/2602.00612v1
- Date: Sat, 31 Jan 2026 08:58:15 GMT
- Title: Lookahead-then-Verify: Reliable Constrained Decoding for Diffusion LLMs under Context-Free Grammars
- Authors: Yitong Zhang, Yongmin Li, Yuetong Liu, Jia Li, Xiaoran Jia, Zherui Li, Ge Li
- Abstract summary: We present LAVE, a constrained decoding approach specifically designed for dLLMs. Our approach leverages a key property of dLLMs, namely their ability to predict token distributions for all positions in parallel during each forward pass. Extensive experiments across four widely used dLLMs and three representative benchmarks demonstrate that LAVE consistently outperforms existing baselines and achieves substantial improvements in syntactic correctness, while incurring negligible runtime overhead.
- Score: 17.13122301190815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Large Language Models (dLLMs) have demonstrated promising generative capabilities and are increasingly used to produce formal languages defined by context-free grammars, such as source code and chemical expressions. However, as probabilistic models, they still struggle to generate syntactically valid outputs reliably. A natural and promising direction to address this issue is to adapt constrained decoding techniques to enforce grammatical correctness during generation. However, applying these techniques faces two primary obstacles. On the one hand, the non-autoregressive nature of dLLMs renders most existing constrained decoding approaches inapplicable. On the other hand, current approaches specifically designed for dLLMs may allow intermediate outputs that are impossible to complete into valid sentences, which significantly limits their reliability in practice. To address these challenges, we present LAVE, a constrained decoding approach specifically designed for dLLMs. Our approach leverages a key property of dLLMs, namely their ability to predict token distributions for all positions in parallel during each forward pass. Whenever a new token is proposed by the model, LAVE performs lookahead using these distributions to efficiently and reliably verify the validity of the proposed token. This design enforces constraints reliably by preserving the potential for intermediate outputs to be extended into valid sentences. Extensive experiments across four widely used dLLMs and three representative benchmarks demonstrate that LAVE consistently outperforms existing baselines and achieves substantial improvements in syntactic correctness, while incurring negligible runtime overhead.
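The lookahead-then-verify idea described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: a balanced-parentheses grammar stands in for a general context-free grammar, and `parallel_probs` (a hypothetical list of per-position token distributions) stands in for the dLLM's parallel predictions from one forward pass.

```python
MASK = "?"

def lookahead_verify(tokens, pos, proposed, parallel_probs):
    """Tentatively place `proposed` at position `pos`, greedily fill the
    remaining masked positions from the model's parallel distributions,
    and accept the token only if the completed string is still valid."""
    trial = list(tokens)
    trial[pos] = proposed
    for i, tok in enumerate(trial):
        if tok == MASK:  # greedy lookahead: argmax fill from parallel predictions
            trial[i] = max(parallel_probs[i], key=parallel_probs[i].get)
    depth = 0
    for tok in trial:  # validity check for the toy parentheses grammar
        depth += 1 if tok == "(" else -1
        if depth < 0:
            return False  # a close with no matching open; cannot be valid
    return depth == 0  # balanced => extendable into a valid sentence

# Example: positions 1 and 2 are still masked; probe a token at position 1.
tokens = ["(", MASK, MASK, ")"]
probs = [{"(": 0.9, ")": 0.1}, {"(": 0.6, ")": 0.4},
         {")": 0.7, "(": 0.3}, {")": 0.9, "(": 0.1}]
print(lookahead_verify(tokens, 1, "(", probs))  # -> True
print(lookahead_verify(tokens, 1, ")", probs))  # -> False
```

The key point the sketch illustrates is that verification reuses distributions the dLLM already computed in the current forward pass, so the lookahead costs no extra model calls.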
Related papers
- DiffuRank: Effective Document Reranking with Diffusion Language Models [71.16830004674513]
We propose DiffuRank, a reranking framework built upon diffusion language models (dLLMs). dLLMs support more flexible decoding and generation processes that are not constrained to a left-to-right order. We show dLLMs achieve performance comparable to, and in some cases exceeding, that of autoregressive LLMs with similar model sizes.
arXiv Detail & Related papers (2026-02-13T02:18:14Z) - Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z) - Residual Context Diffusion Language Models [90.07635240595926]
Residual Context Diffusion (RCD) is a module that converts discarded token representations into contextual residuals and injects them back for the next denoising step. RCD consistently improves frontier dLLMs by 5-10 points in accuracy with minimal extra computation overhead.
arXiv Detail & Related papers (2026-01-30T13:16:32Z) - The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models [67.58848748317506]
Diffusion Large Language Models (dLLMs) break the rigid left-to-right constraint of traditional LLMs. In this paper, we reveal a counter-intuitive reality: arbitrary order generation, in its current form, narrows rather than expands the reasoning boundary of dLLMs.
arXiv Detail & Related papers (2026-01-21T16:41:58Z) - Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows [33.361153168706444]
We propose Deferred Commitment Decoding (DCD) as a training-free decoding strategy. DCD maintains a confidence-aware sliding window over masked tokens, resolving low-uncertainty tokens early while deferring high-uncertainty tokens until sufficient contextual evidence becomes available. Experiments show that DCD improves generation accuracy by 1.39% on average, with comparable runtime, over fixed block-based diffusion methods, with the largest improvement reaching 9.0%.
arXiv Detail & Related papers (2026-01-05T12:57:33Z) - Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way [23.877854550033224]
Diffusion-based large language models (dLLMs) have exhibited substantial potential for parallel text generation. Current dLLMs suffer from fixed generation lengths, meaning the generation length must be determined before decoding. We propose to train a diffusion LLM with native variable generation lengths, abbreviated as dLLM-Var.
arXiv Detail & Related papers (2025-10-28T16:32:43Z) - Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models [47.5976588836299]
Diffusion large language models (dLLMs) offer advantages such as accelerated parallel decoding and bidirectional context modeling. The vanilla decoding strategy in discrete dLLMs suffers from a critical limitation: once a token is accepted, it can no longer be revised in subsequent steps. We propose Tolerator, a training-free decoding strategy that leverages cross-validation among predicted tokens.
arXiv Detail & Related papers (2025-10-06T17:56:46Z) - Constrained Decoding of Diffusion LLMs with Context-Free Grammars [1.0923877073891446]
Large language models (LLMs) have shown promising performance across diverse domains. This paper presents the first constrained decoding method for diffusion models. We show that our method achieves near-perfect syntactic correctness while consistently preserving or improving functional correctness.
arXiv Detail & Related papers (2025-08-13T18:09:09Z) - Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating a constraint on every token can be prohibitively expensive. Locally constrained decoding (LCD) can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
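The rejection-sampling idea this entry alludes to can be sketched roughly as follows. This is a hedged simplification, not the paper's AWRS algorithm (which also adapts proposal weights): tokens are drawn from the unconstrained distribution, the expensive constraint check runs only on tokens actually sampled, and rejected tokens are removed so each is checked at most once.

```python
import random

def rejection_sample(probs, check, rng=None):
    """Draw from `probs` (token -> unnormalized weight) and call the
    possibly expensive `check` only on drawn tokens; rejected tokens are
    deleted so the constraint is evaluated at most once per token,
    rather than once for every vocabulary item up front."""
    rng = rng or random.Random(0)
    probs = dict(probs)
    while probs:
        total = sum(probs.values())
        r = rng.random() * total  # inverse-CDF draw over remaining mass
        for tok, p in probs.items():
            r -= p
            if r <= 0:
                break
        if check(tok):
            return tok
        del probs[tok]  # never re-check a rejected token
    return None  # no token satisfies the constraint

# Usage: only tokens that are actually drawn get constraint-checked.
token = rejection_sample({"a": 0.5, "b": 0.3, "c": 0.2}, lambda t: t != "a")
```

The design choice worth noting is the deletion step: removing a rejected token and renormalizing over the remainder guarantees termination and bounds the number of constraint evaluations by the vocabulary size.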
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion [55.0194604505437]
Speculative decoding has emerged as a widely adopted method to accelerate large language model inference. This paper proposes an adaptation of speculative decoding which uses discrete diffusion models to generate draft sequences.
arXiv Detail & Related papers (2024-08-10T21:24:25Z) - Adaptive Draft-Verification for Efficient Large Language Model Decoding [24.347886232342862]
Large language model (LLM) decoding involves generating a sequence of tokens based on a given context.
The typical autoregressive decoding method requires a separate forward pass through the model for each token generated.
We introduce ADED, which accelerates LLM decoding without requiring fine-tuning.
arXiv Detail & Related papers (2024-06-27T22:20:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.