KLASS: KL-Guided Fast Inference in Masked Diffusion Models
- URL: http://arxiv.org/abs/2511.05664v1
- Date: Fri, 07 Nov 2025 19:05:36 GMT
- Title: KLASS: KL-Guided Fast Inference in Masked Diffusion Models
- Authors: Seo Hyun Kim, Sunwoo Hong, Hojung Jung, Youngrok Park, Se-Young Yun,
- Abstract summary: Masked diffusion models have demonstrated competitive results on various tasks including language generation. We introduce 'KL-Adaptive Stability Sampling' (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions. On reasoning benchmarks, KLASS achieves up to $2.78\times$ wall-clock speedups while improving performance over standard greedy decoding.
- Score: 31.082562509845985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked diffusion models have demonstrated competitive results on various tasks, including language generation. However, due to their iterative refinement process, inference is often bottlenecked by a slow, static sampling schedule. To overcome this problem, we introduce 'KL-Adaptive Stability Sampling' (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions. By unmasking multiple tokens in each iteration without any additional model training, our approach speeds up generation significantly while maintaining sample quality. On reasoning benchmarks, KLASS achieves up to $2.78\times$ wall-clock speedups while improving performance over standard greedy decoding, attaining state-of-the-art results among diffusion-based samplers. We further validate KLASS across diverse domains, including text, image, and molecular generation, showing its effectiveness as a broadly applicable sampler across different models.
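Read from the abstract alone, KLASS tracks a token-level KL divergence between successive denoising steps and commits tokens whose predictive distributions have stabilized, unmasking several per iteration. The sketch below is a minimal, illustrative PyTorch rendering of that idea, not the authors' released code: `model`, `mask_id`, the threshold `kl_thresh`, the per-step cap `max_unmask`, and the confidence fallback on the first step are all assumptions.

```python
import torch
import torch.nn.functional as F

def klass_style_sample(model, x, mask_id, steps=64, kl_thresh=1e-3, max_unmask=8):
    """Illustrative KL-stability sampling for a masked diffusion LM.

    model(x) -> logits of shape (seq_len, vocab); x is a 1-D tensor of token
    ids with masked positions set to mask_id. Tokens whose predictive
    distribution barely changes between steps (small KL) are treated as
    stable and committed; several may be unmasked per iteration.
    """
    prev_logp = None
    for _ in range(steps):
        masked = (x == mask_id)
        if not masked.any():
            break
        logp = F.log_softmax(model(x), dim=-1)        # (seq_len, vocab)
        if prev_logp is None:
            # No history yet: fall back to confidence (negative max log-prob).
            score = -logp.max(dim=-1).values
        else:
            # KL(current || previous) per position; small KL = stable prediction.
            score = (logp.exp() * (logp - prev_logp)).sum(dim=-1)
        prev_logp = logp

        # Only masked positions are candidates; unmask all "stable" ones,
        # capped at max_unmask, and at least one so the loop always progresses.
        score = score.masked_fill(~masked, float("inf"))
        stable = (score < kl_thresh) & masked
        k = int(stable.sum().clamp(min=1, max=max_unmask))
        idx = score.topk(k, largest=False).indices
        x[idx] = logp[idx].argmax(dim=-1)             # greedy commit of stable tokens
    return x
```

The released method may differ in how the KL signal is accumulated, how the threshold adapts over steps, and whether committed tokens can be remasked; this sketch only conveys the stability-based multi-token unmasking described in the abstract.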
Related papers
- Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv Detail & Related papers (2026-02-02T09:21:45Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories. We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process. We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
arXiv Detail & Related papers (2025-05-29T11:08:24Z) - Quantizing Diffusion Models from a Sampling-Aware Perspective [43.95032520555463]
We propose a sampling-aware quantization strategy, wherein a Mixed-Order Trajectory Alignment technique is devised. Experiments on sparse-step fast sampling across multiple datasets demonstrate that our approach preserves the rapid convergence characteristics of high-speed samplers.
arXiv Detail & Related papers (2025-05-04T20:50:44Z) - COSEE: Consistency-Oriented Signal-Based Early Exiting via Calibrated Sample Weighting Mechanism [32.015402521706825]
Early exiting is an effective paradigm for improving the inference efficiency of pre-trained language models (PLMs). We propose a novel Consistency-Oriented Signal-based Early Exiting (COSEE) framework, which leverages a calibrated sample weighting mechanism. Experiments on the GLUE benchmark demonstrate the effectiveness of COSEE across multiple exiting signals and backbones, yielding a better trade-off between performance and efficiency.
arXiv Detail & Related papers (2024-12-17T16:24:55Z) - Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling [47.82616476928464]
Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data. We show that both training and sampling of MDMs are theoretically free from the time variable. We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision.
arXiv Detail & Related papers (2024-09-04T17:48:19Z) - Fast Sampling via Discrete Non-Markov Diffusion Models with Predetermined Transition Time [49.598085130313514]
We propose discrete non-Markov diffusion models (DNDM), which naturally induce the predetermined transition time set. This enables a training-free sampling algorithm that significantly reduces the number of function evaluations. We study the transition from finite to infinite step sampling, offering new insights into bridging the gap between discrete and continuous-time processes.
arXiv Detail & Related papers (2023-12-14T18:14:11Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - Diffusion Glancing Transformer for Parallel Sequence to Sequence
Learning [52.72369034247396]
We propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling.
DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
arXiv Detail & Related papers (2022-12-20T13:36:25Z) - Denoising Diffusion Implicit Models [117.03720513930335]
We present denoising diffusion implicit models (DDIMs), a class of iterative implicit probabilistic models with the same training procedure as DDPMs.
DDIMs can produce high-quality samples $10\times$ to $50\times$ faster in terms of wall-clock time compared to DDPMs.
arXiv Detail & Related papers (2020-10-06T06:15:51Z)
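As context for the DDIM entry above, its generalized update rule (in the notation of the original paper, where $\alpha_t$ denotes the cumulative product of the noise-schedule coefficients and $\epsilon_\theta$ the learned noise predictor) is reproduced here as a reference, not taken from this digest itself:

$$
x_{t-1} \;=\; \sqrt{\alpha_{t-1}}\left(\frac{x_t - \sqrt{1-\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\alpha_t}}\right) \;+\; \sqrt{1-\alpha_{t-1}-\sigma_t^{2}}\;\epsilon_\theta(x_t, t) \;+\; \sigma_t\,\epsilon_t
$$

Setting $\sigma_t = 0$ gives the deterministic sampler, which allows large step skips along the trajectory and underlies the reported $10\times$ to $50\times$ speedups.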
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.