Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty
- URL: http://arxiv.org/abs/2512.21336v1
- Date: Wed, 24 Dec 2025 18:59:51 GMT
- Title: Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty
- Authors: Ziyu Chen, Xinbei Jiang, Peng Sun, Tao Lin
- Abstract summary: Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge. We are the first to formalize this issue, attributing variability in output quality to the cumulative predictive uncertainty along a generative path. Our work establishes Denoising Entropy as a principled tool for understanding and controlling generation, effectively turning the uncertainty in MDMs from a liability into a key advantage for discovering high-quality solutions.
- Score: 16.454646094266703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge: final output quality is highly sensitive to the decoding order. We are the first to formalize this issue, attributing the variability in output quality to the cumulative predictive uncertainty along a generative path. To quantify this uncertainty, we introduce Denoising Entropy, a computable metric that serves as an internal signal for evaluating the generative process. Leveraging this metric, we propose two algorithms designed to optimize the decoding path: a post-hoc selection method and a real-time guidance strategy. Experiments demonstrate that our entropy-guided methods significantly improve generation quality, consistently boosting accuracy on challenging reasoning, planning, and code benchmarks. Our work establishes Denoising Entropy as a principled tool for understanding and controlling generation, effectively turning the uncertainty in MDMs from a liability into a key advantage for discovering high-quality solutions.
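The abstract does not give the formula for Denoising Entropy, so the following is a minimal sketch under the natural reading: accumulate, along one decoding path, the Shannon entropy of the model's predictive distributions at the positions committed in each step, then apply the post-hoc selection method as best-of-N over paths. The `model(x)` interface, the randomized decoding order, and every name below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def decode_with_denoising_entropy(model, x_masked, schedule, mask_id):
    """Run one masked-diffusion decoding path with a randomized unmasking
    order, accumulating the path's predictive uncertainty along the way.
    Assumes model(x) -> (seq_len, vocab) logits; x is a 1-D LongTensor."""
    x, path_entropy = x_masked.clone(), 0.0
    for k in schedule:                      # tokens to unmask at this step
        masked = (x == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        logits = model(x)
        pick = masked[torch.randperm(masked.numel())[:k]]  # random order
        log_p = F.log_softmax(logits[pick], dim=-1)
        # Shannon entropy of the predictive distributions being committed
        path_entropy += -(log_p.exp() * log_p).sum().item()
        x[pick] = logits[pick].argmax(dim=-1)              # commit tokens
    return x, path_entropy

def best_of_n_paths(model, x_masked, schedule, mask_id, n=8):
    """Post-hoc selection: sample n decoding paths and keep the completion
    whose path accumulated the least denoising entropy."""
    runs = [decode_with_denoising_entropy(model, x_masked, schedule, mask_id)
            for _ in range(n)]
    return min(runs, key=lambda r: r[1])[0]
```

The real-time guidance strategy would instead act during decoding, for example by using the per-step entropy to steer which positions to unmask next, rather than comparing completed paths after the fact.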
Related papers
- Improving Sampling for Masked Diffusion Models via Information Gain [9.059619122219502]
Masked Diffusion Models (MDMs) offer greater flexibility in decoding order than autoregressive models. Existing samplers typically adopt greedy strategies, prioritizing positions with the highest local certainty to decode at each step. We propose the Info-Gain Sampler, a principled decoding framework that balances immediate uncertainty with information gain.
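For concreteness, here is a minimal sketch of the greedy-certainty baseline this entry describes: decode the masked position whose predictive distribution has the lowest entropy. The Info-Gain Sampler itself would replace this selection score with one that also accounts for expected information gain (see the linked paper); the `model(x)` interface is an assumption.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def greedy_certainty_step(model, x, mask_id):
    """One step of the greedy baseline: unmask the position whose
    predictive distribution is most certain (lowest entropy).
    Assumes model(x) -> (seq_len, vocab) logits; x is a 1-D LongTensor."""
    masked = (x == mask_id).nonzero(as_tuple=True)[0]
    log_p = F.log_softmax(model(x)[masked], dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1)   # per-position entropy
    j = entropy.argmin()                           # most certain position
    x[masked[j]] = log_p[j].argmax()               # commit its top token
    return x
```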
arXiv Detail & Related papers (2026-02-20T12:26:03Z)
- Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models [64.92045568376705]
Coherent Contextual Decoding (CCD) is a novel inference framework built upon two core innovations. CCD employs a trajectory rectification mechanism that leverages historical context to enhance sequence coherence. Instead of rigid allocations based on diffusion steps, we introduce an adaptive sampling strategy that dynamically adjusts the unmasking budget for each step.
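The adaptive-budget idea can be illustrated in a few lines: instead of unmasking a fixed number of tokens per step, unmask however many positions clear a confidence threshold, falling back to the single most confident one so decoding always progresses. This is a hedged reading of "dynamically adjusts the unmasking budget," not CCD's actual rule.

```python
import torch

@torch.no_grad()
def adaptive_budget_step(model, x, mask_id, tau=0.9):
    """Unmask every masked position whose top-token probability exceeds
    tau; fall back to the single most confident position otherwise.
    Assumes model(x) -> (seq_len, vocab) logits; x is a 1-D LongTensor."""
    masked = (x == mask_id).nonzero(as_tuple=True)[0]
    probs = model(x)[masked].softmax(dim=-1)
    conf, tok = probs.max(dim=-1)              # per-position confidence
    pick = conf >= tau                         # adaptive budget
    if not pick.any():                         # always make progress
        pick = conf == conf.max()
    x[masked[pick]] = tok[pick]
    return x
```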
arXiv Detail & Related papers (2025-11-26T09:49:48Z)
- Confidence-Modulated Speculative Decoding for Large Language Models [0.0]
This paper proposes an information-theoretic framework for speculative decoding based on confidence-modulated drafting. Experiments on machine translation and summarization tasks demonstrate significant speedups over standard speculative decoding.
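A hedged sketch of the drafting idea: the draft model keeps proposing tokens only while its own confidence stays high, so the speculative window length is modulated by confidence rather than fixed. The verification step is omitted and all interfaces below are simplified assumptions, not the paper's procedure.

```python
import torch

@torch.no_grad()
def confidence_modulated_draft(drafter, ids, max_draft=8, tau=0.5):
    """Draft tokens greedily until the drafter's top-token probability
    drops below tau, yielding a confidence-dependent draft length.
    Assumes an HF-style causal LM: drafter(ids).logits -> (1, len, vocab)."""
    draft = []
    for _ in range(max_draft):
        probs = drafter(ids).logits[0, -1].softmax(dim=-1)
        p, tok = probs.max(dim=-1)
        if p.item() < tau:            # low confidence: stop drafting early
            break
        draft.append(tok.item())
        ids = torch.cat([ids, tok.view(1, 1)], dim=-1)
    return ids, draft                 # draft is then checked by the verifier
```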
arXiv Detail & Related papers (2025-08-21T09:06:31Z)
- Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance [42.737012213197865]
AdaDec is an adaptive decoding framework that employs a lookahead-based, uncertainty-aware pause-and-rerank mechanism. AdaDec achieves up to 20.9% absolute gains in Pass@1 accuracy compared with greedy decoding. By applying reranking only when necessary, AdaDec reduces computational overhead and latency, enhancing efficiency alongside reliability.
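The pause-and-rerank mechanism can be sketched as follows: decode greedily while next-token entropy is low; when it spikes, pause, expand the top-k candidate tokens, score each with a short greedy lookahead, and pick the best. The threshold, scoring, and HF-style interface here are illustrative assumptions, not AdaDec's exact procedure.

```python
import torch

@torch.no_grad()
def pause_and_rerank_step(model, ids, threshold=2.0, k=5, lookahead=4):
    """Greedy-decode unless next-token entropy is high; then pause and
    rerank the top-k candidates by a short greedy-rollout log-probability.
    Assumes an HF-style causal LM: model(ids).logits -> (1, len, vocab)."""
    log_p = model(ids).logits[0, -1].log_softmax(dim=-1)
    entropy = -(log_p.exp() * log_p).sum()
    if entropy.item() < threshold:               # confident: no pause needed
        return log_p.argmax().view(1, 1)
    best_tok, best_score = None, float("-inf")
    for tok in log_p.topk(k).indices:            # pause: expand candidates
        cand = torch.cat([ids, tok.view(1, 1)], dim=-1)
        score = log_p[tok].item()
        for _ in range(lookahead):               # short greedy rollout
            lp = model(cand).logits[0, -1].log_softmax(dim=-1)
            nxt = lp.argmax()
            score += lp[nxt].item()
            cand = torch.cat([cand, nxt.view(1, 1)], dim=-1)
        if score > best_score:                   # rerank by rollout score
            best_tok, best_score = tok.view(1, 1), score
    return best_tok
```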
arXiv Detail & Related papers (2025-06-10T16:49:46Z)
- Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation [0.0]
We explore uncertainty estimation as a proxy for correctness in LLM-generated code. We adapt two state-of-the-art techniques from natural language generation to the domain of code generation. Our findings indicate a strong correlation between the uncertainty computed through these techniques and correctness.
arXiv Detail & Related papers (2025-02-17T10:03:01Z)
- Auto-Prompt Generation is Not Robust: Prompt Optimization Driven by Pseudo Gradient [50.15090865963094]
We introduce PertBench, a comprehensive benchmark dataset that includes a wide range of input perturbations. Our analysis reveals substantial vulnerabilities in existing prompt generation strategies. We propose PGO, a gradient-free prompt generation framework that leverages perturbation types as pseudo-gradient signals.
arXiv Detail & Related papers (2024-12-24T06:05:08Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models for this task are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs).
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Differentially Private Deep Q-Learning for Pattern Privacy Preservation in MEC Offloading [76.0572817182483]
Attackers may eavesdrop on the offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy that jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
arXiv Detail & Related papers (2023-02-09T12:50:18Z)
- Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
arXiv Detail & Related papers (2021-05-17T08:36:18Z)
- Distributionally Robust Bayesian Optimization [121.71766171427433]
We present a novel distributionally robust Bayesian optimization algorithm (DRBO) for zeroth-order, noisy optimization.
Our algorithm provably obtains sub-linear robust regret in various settings.
We demonstrate the robust performance of our method on both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2020-02-20T22:04:30Z)