Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference
- URL: http://arxiv.org/abs/2602.22868v1
- Date: Thu, 26 Feb 2026 11:08:11 GMT
- Title: Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference
- Authors: Yushi Ye, Feng Hong, Huangjie Zheng, Xu Chen, Zhiyong Chen, Yanfeng Wang, Jiangchao Yao
- Abstract summary: DLLMs promise fast non-autoregressive inference but suffer a severe quality-speed trade-off in parallel decoding. We address this by integrating continuous representations into the discrete decoding process, as they preserve rich inter-position dependency. We propose ReMix, a framework that introduces a novel Continuous Mixing State as an intermediate between the initial masked state and the final decoded token state.
- Score: 58.189320101488725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion Large Language Models (DLLMs) promise fast non-autoregressive inference but suffer a severe quality-speed trade-off in parallel decoding. This stems from the "combinatorial contradiction" phenomenon, where parallel tokens form semantically inconsistent combinations. We address this by integrating continuous representations into the discrete decoding process, as they preserve rich inter-position dependency. We propose ReMix (Rejection Mixing), a framework that introduces a novel Continuous Mixing State as an intermediate between the initial masked state and the final decoded token state. This intermediate state allows a token's representation to be iteratively refined in a continuous space, resolving mutual conflicts with other tokens before collapsing into a final discrete sample. Furthermore, a rejection rule reverts uncertain representations from the continuous state back to the masked state for reprocessing, ensuring stability and preventing error propagation. ReMix thus mitigates combinatorial contradictions by enabling continuous-space refinement during discrete diffusion decoding. Extensive experiments demonstrate that ReMix, as a training-free method, achieves a $2$-$8\times$ inference speedup without any quality degradation.
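The abstract specifies the control flow precisely enough to sketch it: each position moves between a masked state, a continuous mixing state, and a decoded state, with a rejection rule sending uncertain positions back to the mask. Below is a minimal, hypothetical Python sketch of that loop; the thresholds, the expected-embedding mixing rule, and the toy denoiser are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def remix_decode(denoise, embed, seq_len, steps=8, accept=0.9, reject=0.01):
    """Toy ReMix-style decoding loop (illustrative, not the authors' code).

    denoise: callable, (seq_len, dim) hidden states -> (seq_len, vocab) logits
    embed:   (vocab, dim) token-embedding table
    """
    vocab, dim = embed.shape
    MASKED, MIXING, DECODED = 0, 1, 2
    state = np.full(seq_len, MASKED)
    hidden = np.zeros((seq_len, dim))   # a real model would embed [MASK] here
    tokens = np.full(seq_len, -1)

    for _ in range(steps):
        probs = softmax(denoise(hidden))
        conf, pred = probs.max(axis=-1), probs.argmax(axis=-1)
        for i in np.where(state != DECODED)[0]:
            if conf[i] >= accept:             # confident: collapse to a token
                tokens[i], state[i] = pred[i], DECODED
                hidden[i] = embed[pred[i]]
            elif conf[i] <= reject:           # rejection rule: back to [MASK]
                state[i], hidden[i] = MASKED, 0.0
            else:                             # continuous mixing state:
                hidden[i] = probs[i] @ embed  # expected embedding, refined next step
                state[i] = MIXING
    tokens[state != DECODED] = pred[state != DECODED]  # fallback fill at the end
    return tokens

# Toy usage with a random position-biased linear "denoiser"
rng = np.random.default_rng(0)
E = rng.normal(size=(50, 16))                 # vocab=50, dim=16
W = 0.5 * rng.normal(size=(16, 50))
B = rng.normal(size=(12, 50))                 # per-position bias
print(remix_decode(lambda h: h @ W + B, E, seq_len=12))
```

The property the sketch tries to preserve is that a position in the mixing state keeps influencing its neighbors through the denoiser before any discrete commitment is made, which is how continuous-space refinement can resolve combinatorial contradictions.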
Related papers
- Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding [28.23607623451461]
COVER performs leave-one-out verification and stable drafting within a single forward pass. It balances uncertainty, downstream influence, and cache drift, and it adapts the number of verified seeds per step. Across benchmarks, COVER reduces unnecessary revisions and yields faster decoding while preserving output quality.
arXiv Detail & Related papers (2026-02-05T19:58:48Z)
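The summary is concrete about the verification primitive, so a small hedged sketch is possible: re-score each drafted token with its own position hidden and keep it only if the model still agrees. The function below is a hypothetical illustration of that leave-one-out check (the `score` callable, `mask_id`, and `margin` are assumed names); COVER performs it within a single forward pass, which this per-position loop deliberately does not replicate.

```python
import numpy as np

def leave_one_out_verify(score, tokens, mask_id, margin=0.0):
    """Hypothetical leave-one-out check in the spirit of COVER's summary.

    score(seq) -> (len(seq), vocab) log-probs from the diffusion model.
    A drafted token is kept only if the model still ranks it on top when
    its own position is hidden from itself behind the mask id.
    """
    keep = np.zeros(len(tokens), dtype=bool)
    for i in range(len(tokens)):
        probe = tokens.copy()
        probe[i] = mask_id                     # hide position i from itself
        logp = score(probe)[i]
        keep[i] = logp[tokens[i]] >= logp.max() - margin
    return keep
```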
- Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv Detail & Related papers (2026-02-02T09:21:45Z)
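The summary maps onto a standard sequential Monte Carlo loop with one substitution: trajectory-level confidence serves as the importance weight. A generic sketch under that reading follows; the particle representation, the ESS-triggered resampling, and all names are assumptions, not the paper's algorithm.

```python
import numpy as np

def smc_decode(step, confidence, n_particles=8, n_steps=16, seed=0):
    """Generic self-rewarding SMC sketch (not the paper's exact algorithm).

    step(particle, rng) -> extended particle (one denoising step); particles
    are plain Python lists. confidence(particle) -> value in (0, 1] used as
    the self-rewarding signal behind the importance weights.
    """
    rng = np.random.default_rng(seed)
    particles = [[] for _ in range(n_particles)]
    logw = np.zeros(n_particles)                 # log importance weights
    for _ in range(n_steps):
        particles = [step(p, rng) for p in particles]
        logw += np.log([confidence(p) for p in particles])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / (w ** 2).sum() < n_particles / 2:   # effective sample size low
            idx = rng.choice(n_particles, size=n_particles, p=w)
            particles = [list(particles[j]) for j in idx]
            logw[:] = 0.0                        # weights reset after resampling
    return particles[int(np.argmax(logw))]
```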
- Reversible Diffusion Decoding for Diffusion Language Models [69.10149777322108]
Reversible Diffusion Decoding (RDD) is a decoding framework that introduces reversibility into block-wise diffusion generation. RDD detects stagnation as a state-dependent failure of the reverse process and enables efficient backtracking to earlier blocks. Experiments show that RDD improves generation robustness and quality over baselines with minimal computational overhead.
arXiv Detail & Related papers (2026-01-29T12:52:33Z)
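The mechanism described is a block-wise decode loop that can back up: detect stagnation in the current block and re-open an earlier one. A toy control loop under that reading; the stagnation test and block bookkeeping are invented for illustration.

```python
def rdd_generate(decode_block, is_stagnant, n_blocks, max_rounds=32):
    """Toy reversible block-wise loop inspired by RDD's summary.

    decode_block(blocks, i) -> new content for block i given all blocks
    is_stagnant(blocks, i)  -> True if block i failed to make progress
    """
    blocks = [None] * n_blocks
    i, rounds = 0, 0
    while i < n_blocks and rounds < max_rounds:
        rounds += 1
        blocks[i] = decode_block(blocks, i)
        if is_stagnant(blocks, i) and i > 0:
            blocks[i] = None      # discard the stuck attempt ...
            i -= 1                # ... and backtrack: re-open block i-1
            blocks[i] = None
        else:
            i += 1
    return blocks
```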
- MixAR: Mixture Autoregressive Image Generation [12.846100277592969]
We introduce MixAR, a novel framework that injects discrete tokens as prior guidance for continuous autoregressive modeling. We investigate several discrete-continuous mixture strategies, including self-attention (DC-SA), cross-attention (DC-CA), and a simple approach (DC-Mix) that replaces homogeneous mask tokens with informative discrete counterparts.
arXiv Detail & Related papers (2025-11-15T12:19:28Z)
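Of the three strategies listed, DC-Mix is concrete enough to sketch: wherever the model would feed a homogeneous mask embedding, substitute the embedding of a discrete draft token as prior guidance. A hypothetical input-construction step follows; all shapes and names are assumptions.

```python
import numpy as np

def dc_mix_inputs(cont_feats, known, draft_tokens, token_embed, mask_vec):
    """DC-Mix-style input construction (illustrative, not MixAR's code).

    cont_feats:   (L, D) continuous features for already-generated positions
    known:        (L,) bool, True where a continuous feature exists
    draft_tokens: (L,) int, discrete prior tokens for the unknown positions
    token_embed:  (V, D) embedding table; mask_vec: (D,) plain mask embedding
    """
    # informative discrete embeddings stand in for the homogeneous mask:
    x = np.where(known[:, None], cont_feats, token_embed[draft_tokens])
    # the plain-mask baseline would instead be:
    # x = np.where(known[:, None], cont_feats, mask_vec)
    return x
```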
- MARS-Sep: Multimodal-Aligned Reinforced Sound Separation [72.85468563236005]
MARS-Sep is a reinforcement learning framework for sound separation. It learns a factorized Beta mask policy that is optimized by a clipped trust-region surrogate. Experiments on multiple benchmarks demonstrate consistent gains in Text-, Audio-, and Image-Queried separation.
arXiv Detail & Related papers (2025-10-12T09:05:28Z)
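The summary names two standard ingredients, a Beta distribution over mask values and a clipped trust-region surrogate, both of which have well-known generic forms. The sketch below shows only those generic forms; the paper's factorization and reward design are not reproduced.

```python
import numpy as np
from math import lgamma

def beta_logpdf(x, a, b):
    """log Beta(x; a, b) for x in (0, 1); a, b are scalar shape parameters."""
    return ((a - 1) * np.log(x) + (b - 1) * np.log(1 - x)
            + lgamma(a + b) - lgamma(a) - lgamma(b))

def clipped_surrogate(logp_new, logp_old, adv, eps=0.2):
    """PPO-style clipped trust-region surrogate (to be maximized)."""
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * adv,
                      np.clip(ratio, 1 - eps, 1 + eps) * adv).mean()
```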
- Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling [87.34677262370924]
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an "information void" where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion, a framework that augments the discrete state space with a paired diffusion in a continuous latent space.
arXiv Detail & Related papers (2025-10-01T18:00:56Z)
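The core change is a data-structure one: each position carries a continuous latent alongside its discrete state, so a masked position still contributes inferred semantics between denoising steps. A hypothetical denoiser-input construction under that reading:

```python
import numpy as np

def cadd_inputs(tokens, latents, token_embed, mask_id):
    """Continuously augmented input sketch (illustrative, not the paper's code).

    tokens:  (L,) ints, with mask_id marking absorbed positions
    latents: (L, D) paired continuous latents diffused alongside the tokens
    """
    disc = token_embed[tokens]                 # (L, D) discrete embeddings
    masked = (tokens == mask_id)[:, None]
    # masked positions feed their latent instead of an information-void mask:
    return np.where(masked, latents, disc)
```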
- Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning [23.58934174168992]
Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. We propose convolutional decoding (Conv), a normalization-based method that narrows the decoding window without hard segmentation. We also introduce Rejecting Rule-based Fine-Tuning (R2FT), a post-hoc training scheme that better aligns tokens at positions far from the context.
arXiv Detail & Related papers (2025-09-18T17:48:21Z)
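The phrase "narrows the decoding window without hard segmentation" admits a simple reading: weight per-position confidences by a soft kernel around the already-decoded context, then normalize. The sketch below is one plausible, low-confidence interpretation, not the paper's method.

```python
import numpy as np

def conv_decode_scores(conf, decoded, width=4.0):
    """One plausible reading of convolutional decoding (hedged sketch).

    conf:    (L,) confidences of still-masked positions (>=1 must be masked)
    decoded: (L,) bool, True where a token is already fixed
    A soft kernel up-weights positions near decoded context, narrowing the
    effective decoding window without any hard segmentation boundary.
    """
    pos = np.arange(len(conf))
    if decoded.any():
        dpos = pos[decoded]
        dist = np.abs(pos[:, None] - dpos[None, :]).min(axis=1)
    else:
        dist = pos.astype(float)       # no context yet: left-to-right bias
    score = conf * np.exp(-dist / width)
    score[decoded] = 0.0               # never re-pick fixed positions
    return score / score.sum()         # the normalization step
```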
- Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding [60.06816407728172]
Discrete diffusion language models have shown strong potential for text generation, but standard supervised fine-tuning is misaligned with their semi-autoregressive inference. We propose Blockwise SFT, which partitions responses into fixed-size blocks.
arXiv Detail & Related papers (2025-08-27T02:49:33Z)
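The fix lives at the loss level: supervise one fixed-size block at a time, so training matches the block-wise order in which a semi-autoregressive decoder will actually reveal the response. A schematic mask construction follows; the block size and masking policy are assumptions.

```python
import numpy as np

def blockwise_sft_mask(resp_len, block_size, active_block):
    """Schematic Blockwise-SFT position masks (illustrative).

    Earlier blocks stay clean (visible context), the active block is
    noised and supervised, and later blocks are hidden -- mirroring what
    a block-wise diffusion decoder will see at inference time.
    """
    pos = np.arange(resp_len) // block_size
    visible   = pos < active_block     # clean conditioning context
    supervise = pos == active_block    # loss is computed only here
    hidden    = pos > active_block     # not yet generated at inference
    return visible, supervise, hidden
```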
- Continuous Speculative Decoding for Autoregressive Image Generation [27.308442169466975]
Continuous visual autoregressive (AR) models have demonstrated promising performance in image generation, and speculative decoding has effectively accelerated discrete autoregressive inference. This work addresses the challenges of a low acceptance rate, inconsistent output distributions, and a modified distribution without an analytic expression.
arXiv Detail & Related papers (2024-11-18T09:19:15Z)
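For continuous tokens, the discrete speculative-decoding acceptance rule carries over with densities in place of probability masses: accept a draft x ~ q with probability min(1, p(x)/q(x)). The sketch below shows that generic test only; handling the rejected case, whose residual distribution has no analytic expression, is precisely what the paper contributes and is not reproduced here.

```python
import numpy as np

def accept_continuous_draft(x, logp_target, logp_draft, rng):
    """Generic density-ratio acceptance test for one continuous draft token.

    x ~ q (draft model); accept with probability min(1, p(x)/q(x)).
    Log-densities may omit normalizing constants only if both omit the same one.
    """
    ratio = np.exp(min(0.0, logp_target(x) - logp_draft(x)))
    return rng.random() < ratio

# Toy usage: target N(0,1) vs draft N(0.5,1); shared Gaussian constant omitted
rng = np.random.default_rng(0)
lp = lambda x: -0.5 * x ** 2
lq = lambda x: -0.5 * (x - 0.5) ** 2
x = rng.normal(0.5, 1.0)
print(accept_continuous_draft(x, lp, lq, rng))
```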
This list is automatically generated from the titles and abstracts of the papers on this site.