Related papers: PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models

URL: http://arxiv.org/abs/2508.13021v2
Date: Tue, 19 Aug 2025 02:03:36 GMT
Title: PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models
Authors: Pengcheng Huang, Shuhao Liu, Zhenghao Liu, Yukun Yan, Shuo Wang, Zulong Chen, Tong Xiao,
Abstract summary: Masked diffusion models (MDMs) are powerful non-autoregressive alternatives for sequence generation. In this work, we introduce Position-Aware Confidence-Calibrated Sampling (PC-Sampler), a novel decoding strategy. PC-Sampler consistently outperforms existing MDM decoding strategies by more than 10% on average.
Score: 33.98279129315148
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in masked diffusion models (MDMs) have established them as powerful non-autoregressive alternatives for sequence generation. Nevertheless, our preliminary experiments reveal that the generation quality of MDMs is still highly sensitive to the choice of decoding strategy. In particular, widely adopted uncertainty-based samplers suffer from two key limitations: a lack of global trajectory control and a pronounced bias toward trivial tokens in the early stages of decoding. These shortcomings restrict the full potential of MDMs. In this work, we introduce Position-Aware Confidence-Calibrated Sampling (PC-Sampler), a novel decoding strategy that unifies global trajectory planning with content-aware informativeness maximization. PC-Sampler incorporates a position-aware weighting mechanism to regulate the decoding path and a calibrated confidence score to suppress the premature selection of trivial tokens. Extensive experiments on three advanced MDMs across seven challenging benchmarks (including logical reasoning and planning tasks) demonstrate that PC-Sampler consistently outperforms existing MDM decoding strategies by more than 10% on average, significantly narrowing the performance gap with state-of-the-art autoregressive models. All code is available at https://github.com/NEUIR/PC-Sampler.
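
The abstract names two ingredients: a position-aware weight that steers the global decoding order, and a calibrated confidence that discourages committing early to trivial tokens. A minimal sketch of how one decoding step could combine them is given below, under illustrative assumptions. The exponential position weight exp(-lam * i), the prior-based calibration p_model(token) * -log p_prior(token), the MASK marker, and the function name select_positions are hypothetical and not taken from the paper; the authors' actual formulas and code are in the repository linked above.

```python
import math
from typing import Dict, List, Optional, Sequence

MASK = None  # hypothetical marker for a position that is still masked


def select_positions(
    seq: Sequence[Optional[str]],
    probs: Sequence[Dict[str, float]],
    prior: Dict[str, float],
    k: int = 1,
    lam: float = 0.1,
) -> List[int]:
    """Rank masked positions by position-weighted, calibrated confidence.

    Illustrative assumptions, not the paper's exact definitions:
      * position weight w_i = exp(-lam * i) favours earlier positions;
      * calibrated confidence = p_model(top token) * -log p_prior(top token),
        which down-weights tokens that are frequent in a background corpus.
    """
    scored = []
    for i, (tok, dist) in enumerate(zip(seq, probs)):
        if tok is not MASK:
            continue  # position already decoded; skip it
        best_tok, best_p = max(dist.items(), key=lambda kv: kv[1])
        calibrated = best_p * -math.log(prior.get(best_tok, 1e-9))
        scored.append((math.exp(-lam * i) * calibrated, i))
    # Highest score first; ties broken by the earlier position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


# Toy step: position 0 is decoded, positions 1 and 2 are masked.
seq = ["The", MASK, MASK]
probs = [{"The": 1.0}, {"the": 0.9, "a": 0.1}, {"cat": 0.6, "dog": 0.4}]
prior = {"the": 0.05, "a": 0.03, "cat": 1e-4, "dog": 1e-4}
print(select_positions(seq, probs, prior, k=1))  # → [2]
```

In this toy step, plain confidence would unmask position 1 first (0.9 > 0.6) with the frequent token "the"; the calibrated score instead prefers the rarer "cat" at position 2. Raising lam (for example to 10.0) lets the position weight dominate and restores a left-to-right choice of position 1.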

Related papers

Improving Sampling for Masked Diffusion Models via Information Gain [9.059619122219502]
Masked Diffusion Models (MDMs) offer greater flexibility in decoding order than autoregressive models. Existing samplers typically adopt greedy strategies, prioritizing positions with the highest local certainty to decode at each step. We propose the Info-Gain Sampler, a principled decoding framework that balances immediate uncertainty with information gain.
arXiv (2026-02-20T12:26:03Z)
Learn from Your Mistakes: Self-Correcting Masked Diffusion Models [31.536464269884103]
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models. We propose a framework that trains a model to perform both unmasking and correction. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence.
arXiv (2026-02-12T05:17:31Z)
Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv (2026-02-02T09:21:45Z)
Optimizing Decoding Paths in Masked Diffusion Models by Quantifying Uncertainty [16.454646094266703]
Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge. We are the first to formalize this issue, attributing variability in output quality to the cumulative predictive uncertainty along a generative path. Our work establishes Denoising Entropy as a principled tool for understanding and controlling generation, effectively turning the uncertainty in MDMs from a liability into a key advantage for discovering high-quality solutions.
arXiv (2025-12-24T18:59:51Z)
Fine-Tuning Masked Diffusion for Provable Self-Correction [28.338622227684453]
Masked Diffusion Models (MDMs) have emerged as a promising approach for generative modeling in discrete spaces. We introduce PRISM (Plug-in Remasking for Inference-time Self-correction of Masked Diffusions).
arXiv (2025-10-01T19:15:25Z)
Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs [44.55861996331439]
Masked diffusion models (MDMs) offer a promising non-autoregressive alternative for large language modeling. Standard decoding methods for MDMs select tokens independently based on individual token confidences at each diffusion step. We propose Reward-Weighted Sampling (RWS) to provide a principled global signal during the iterative diffusion process.
arXiv (2025-08-31T05:48:30Z)
Accelerating Diffusion LLMs via Adaptive Parallel Decoding [50.9948753314669]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradation on downstream benchmarks.
arXiv (2025-05-31T06:10:10Z)
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking [17.511240770486452]
Masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling. We introduce EB-Sampler, a drop-in replacement for existing samplers, utilizing an Entropy Bounded unmasking procedure. EB-Sampler accelerates sampling from current state-of-the-art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance.
arXiv (2025-05-30T17:52:55Z)
Test-Time Alignment for Large Language Models via Textual Model Predictive Control [63.508812485566374]
Textual Model Predictive Control (TMPC) is a novel predictive planning framework adapted for aligning Large Language Models at inference time. TMPC is evaluated on three tasks with distinct segmentation properties: discourse-level translation, long-form response generation, and program synthesis. Results demonstrate that TMPC consistently improves performance, highlighting its generality.
arXiv (2025-02-28T07:24:33Z)
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference. Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv (2024-10-10T17:18:30Z)
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions. Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv (2023-12-19T15:34:52Z)
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds. We propose a Generative Decoder for MAE (GD-MAE) that automatically merges the surrounding context. We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI, and ONCE.
arXiv (2022-12-06T14:32:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.