Related papers: Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

URL: http://arxiv.org/abs/2511.05563v1
Date: Tue, 04 Nov 2025 02:37:37 GMT
Title: Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models
Authors: Sanghyun Lee, Seungryong Kim, Jongho Park, Dongmin Park,
Abstract summary: Masked Diffusion Models (MDMs) as language models generate by iteratively unmasking tokens, yet their performance depends on the inference time order of unmasking.<n>We propose Lookahead Unmasking (LookUM), which addresses these concerns by reformulating sampling as path selection over all possible unmasking orders.<n>LookUM requires only two to three paths to achieve peak performance, demonstrating remarkably efficient path selection.
Score: 51.12873073612084
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Masked Diffusion Models (MDMs) as language models generate by iteratively unmasking tokens, yet their performance crucially depends on the inference time order of unmasking. Prevailing heuristics, such as confidence based sampling, are myopic: they optimize locally, fail to leverage extra test-time compute, and let early decoding mistakes cascade. We propose Lookahead Unmasking (LookUM), which addresses these concerns by reformulating sampling as path selection over all possible unmasking orders without the need for an external reward model. Our framework couples (i) a path generator that proposes paths by sampling from pools of unmasking sets with (ii) a verifier that computes the uncertainty of the proposed paths and performs importance sampling to subsequently select the final paths. Empirically, erroneous unmasking measurably inflates sequence level uncertainty, and our method exploits this to avoid error-prone trajectories. We validate our framework across six benchmarks, such as mathematics, planning, and coding, and demonstrate consistent performance improvements. LookUM requires only two to three paths to achieve peak performance, demonstrating remarkably efficient path selection. The consistent improvements on both LLaDA and post-trained LLaDA 1.5 are particularly striking: base LLaDA with LookUM rivals the performance of RL-tuned LLaDA 1.5, while LookUM further enhances LLaDA 1.5 itself showing that uncertainty based verification provides orthogonal benefits to reinforcement learning and underscoring the versatility of our framework. Code will be publicly released.

Related papers

Learn from Your Mistakes: Self-Correcting Masked Diffusion Models [31.536464269884103]
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models.<n>We propose a framework that trains a model to perform both unmasking and correction.<n>We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence.
arXiv Detail & Related papers (2026-02-12T05:17:31Z)
Lookahead Path Likelihood Optimization for Diffusion LLMs [31.01208893976334]
We introduce path log-likelihood (Path LL), a trajectory-conditioned objective that strongly correlates with downstream accuracy and enables principled selection of unmasking paths.<n>To optimize Path LL at inference time, we propose POKE, an efficient value estimator that predicts the expected future Path LL of a partial decoding trajectory.<n>We then integrate this lookahead signal into POKE-SMC, a Sequential Monte Carlo-based search framework for dynamically identifying optimal unmasking paths.
arXiv Detail & Related papers (2026-02-03T13:12:41Z)
Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC)<n>Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy.<n>We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv Detail & Related papers (2026-02-02T09:21:45Z)
Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning.<n>Most test-time scaling (TTS) algorithms rely on autoregressive decoding.<n>We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z)
dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning [36.12942468805232]
Masked diffusion language models offer potential for parallel token generation.<n>Open-source MDLMs decode fewer than 5 tokens per model forward pass.<n>dUltra learns unmasking strategies for efficient parallel decoding.
arXiv Detail & Related papers (2025-12-24T23:31:48Z)
Learning Unmasking Policies for Diffusion Language Models [33.44995119635116]
Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks.<n>One particularly successful variant is masked discrete diffusion, in which a buffer filled with special mask tokens is progressively replaced with tokens sampled from the model's vocabulary.<n>In this work, we propose to train sampling procedures using reinforcement learning.
arXiv Detail & Related papers (2025-12-09T20:44:33Z)
Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model [98.35868970993232]
Diffusion language models (DLMs) are emerging as a powerful and promising alternative to the dominant autoregressive paradigm.<n>We introduce efficient Sampling with Adaptive acceleration and Backtracking Enhanced Remasking (i.e., Saber) to achieve better inference speed and output quality in code generation.
arXiv Detail & Related papers (2025-10-20T23:38:12Z)
Tree Reward-Aligned Search for TReASURe in Masked Diffusion Language Models [13.433506313486701]
Tree search has emerged as a powerful framework for aligning generative models with task-specific rewards at test time.<n>We propose TReASURe, a tree-search test-time alignment method that addresses these issues.<n>TReASURe achieves state-of-the-art results on perplexity, linguistic acceptability, and control of sentiment and toxicity.
arXiv Detail & Related papers (2025-09-27T06:22:45Z)
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding [40.96405124314983]
Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs)<n>Currently available open-source dLLMs often generate at much lower rates, typically decoding only a single token at every denoising timestep.<n>We present Spiffy, a speculative decoding algorithm that accelerates dLLM inference by $mathbf2.8-3.1times$ while provably preserving the model's output distribution.
arXiv Detail & Related papers (2025-09-22T17:58:21Z)
Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance [42.737012213197865]
AdaDec is an adaptive decoding framework that employs a lookahead-based, uncertainty-aware pause-and-rerank mechanism.<n>AdaDec achieves up to 20.9% absolute gains in Pass@1 accuracy compared with greedy decoding.<n>By applying reranking only when necessary, AdaDec reduces computational overhead and latency, enhancing efficiency alongside reliability.
arXiv Detail & Related papers (2025-06-10T16:49:46Z)
Accelerating Diffusion LLMs via Adaptive Parallel Decoding [60.407727995313074]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel.<n>APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z)
Teaching Your Models to Understand Code via Focal Preference Alignment [70.71693365502212]
In existing approaches, a set of n candidate solutions is evaluated based on test case success rates.<n>Because this approach aligns entire failing code blocks rather than pinpointing specific errors, it lacks the granularity necessary to capture meaningful error-correction relationships.<n>We propose Target-DPO, a new preference alignment framework that mimics human iterative debug to refine Code LLMs.
arXiv Detail & Related papers (2025-03-04T16:56:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.