Stratified Hazard Sampling: Minimal-Variance Event Scheduling for CTMC/DTMC Discrete Diffusion and Flow Models
- URL: http://arxiv.org/abs/2601.02799v1
- Date: Tue, 06 Jan 2026 08:19:02 GMT
- Title: Stratified Hazard Sampling: Minimal-Variance Event Scheduling for CTMC/DTMC Discrete Diffusion and Flow Models
- Authors: Seunghwan Jang, SooJean Han
- Abstract summary: Stratified Hazard Sampling (SHS) models per-token edits as events driven by cumulative hazard (CTMC) or cumulative jump mass (DTMC) and places events by stratifying this cumulative quantity. We also introduce a phase-allocation variant for blacklist-style lexical constraints that prioritizes early edits at high-risk positions to mitigate late-masking artifacts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: CTMC/DTMC-based discrete generative models, including uniform-noise discrete diffusion (e.g., D3PM/CTDD) and discrete flow matching, enable non-autoregressive sequence generation by repeatedly replacing tokens through a time-inhomogeneous Markov process. Inference is typically implemented with step-based simulation: each token decides to jump via independent Bernoulli (or categorical) draws at every discretization step. Under uniform-noise initialization, where self-correction requires multiple edits per position, these independent decisions induce substantial variance in both the number and timing of edits, leading to characteristic failure modes such as under-editing (residual noise) or over-editing (cascading unnecessary substitutions) and reducing reproducibility. We propose Stratified Hazard Sampling (SHS), a drop-in and hyperparameter-free inference principle for any sampler that admits a stay-vs.-replace decomposition. SHS models per-token edits as events driven by cumulative hazard (CTMC) or cumulative jump mass (DTMC) and places events by stratifying this cumulative quantity: with a single random phase per position, a token jumps whenever its accumulated hazard crosses unit-spaced thresholds. This preserves the expected number of jumps while achieving the minimum possible variance among unbiased integer estimators (bounded by 1/4), without altering per-jump destination sampling and thus retaining multimodality. We also introduce a phase-allocation variant for blacklist-style lexical constraints that prioritizes early edits at high-risk positions to mitigate late-masking artifacts.
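The threshold-crossing rule described in the abstract can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' code: the function name, the input layout (a list of per-step, per-position hazard increments), and the crossing-count formula are assumptions.

```python
import math
import random

def stratified_hazard_schedule(hazard_increments, seed=0):
    """Illustrative SHS-style scheduler: each position gets one random
    phase u in (0, 1), and a jump fires whenever its accumulated hazard
    crosses a unit-spaced threshold u, u + 1, u + 2, ...

    hazard_increments: list over time steps of per-position hazard
    increments, shape [T][L]. Returns a list of (step, position) events.
    """
    rng = random.Random(seed)
    num_pos = len(hazard_increments[0])
    phase = [rng.random() for _ in range(num_pos)]  # single phase per position
    cum = [0.0] * num_pos                           # accumulated hazard

    def crossings(x, u):
        # number of thresholds u + k (k >= 0) at or below x
        return math.floor(x - u) + 1 if x >= u else 0

    events = []
    for t, step in enumerate(hazard_increments):
        for i, dh in enumerate(step):
            before = crossings(cum[i], phase[i])
            cum[i] += dh
            after = crossings(cum[i], phase[i])
            events.extend((t, i) for _ in range(after - before))
    return events
```

Because a position with total accumulated hazard H fires floor(H - u) + 1 jumps for u in (0, 1), the count can only take the two adjacent values floor(H) or ceil(H); this is the low-variance, expectation-preserving property the abstract refers to.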
Related papers
- Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference [58.189320101488725]
DLLMs promise fast non-autoregressive inference but suffer a severe quality-speed trade-off in parallel decoding. We address this by integrating continuous representations into the discrete decoding process, as they preserve rich inter-position dependencies. We propose ReMix, a framework that introduces a novel Continuous Mixing State as an intermediate between the initial masked state and the final decoded token state.
arXiv Detail & Related papers (2026-02-26T11:08:11Z)
- A Statistical Side-Channel Risk Model for Timing Variability in Lattice-Based Post-Quantum Cryptography [0.0]
Timing side-channels are an important threat to cryptography that still needs to be addressed in implementations. Lattice-based schemes may produce secret-dependent timing variability through complex arithmetic and control flow. A scenario-based statistical risk model is proposed that treats timing leakage as a problem of distributional distinguishability under controlled execution conditions.
arXiv Detail & Related papers (2025-12-26T03:12:33Z) - Divergence-Minimization for Latent-Structure Models: Monotone Operators, Contraction Guarantees, and Robust Inference [5.373905622325275]
We develop a divergence-minimization (DM) framework for robust and efficient inference in latent-mixture models. By optimizing a residual-adjusted divergence, the DM approach recovers EM as a special case and yields robust alternatives.
arXiv Detail & Related papers (2025-11-22T08:25:29Z) - Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling [87.34677262370924]
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void' where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion, a framework that augments the discrete state space with a paired diffusion in a continuous latent space.
arXiv Detail & Related papers (2025-10-01T18:00:56Z) - Post-Hoc Split-Point Self-Consistency Verification for Efficient, Unified Quantification of Aleatoric and Epistemic Uncertainty in Deep Learning [5.996056764788456]
Uncertainty quantification (UQ) is vital for trustworthy deep learning, yet existing methods are either computationally intensive or provide only partial, task-specific estimates. We propose a post-hoc single-forward-pass framework that jointly captures aleatoric and epistemic uncertainty without modifying or retraining pretrained models. Our method applies Split-Point Analysis (SPA) to decompose predictive residuals into upper and lower subsets, computing Mean Absolute Residuals (MARs) on each side.
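For concreteness, the split-and-average step of this summary might look like the following rough sketch. The split at zero residual, the function name, and the return convention are assumptions for illustration, not details taken from the paper.

```python
def split_point_mars(residuals, split=0.0):
    """Split residuals at `split` and compute the Mean Absolute Residual
    (MAR) on each side, giving a crude asymmetric spread estimate."""
    upper = [r for r in residuals if r > split]
    lower = [r for r in residuals if r <= split]
    mar_upper = sum(abs(r - split) for r in upper) / len(upper) if upper else 0.0
    mar_lower = sum(abs(r - split) for r in lower) / len(lower) if lower else 0.0
    return mar_upper, mar_lower
```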
arXiv Detail & Related papers (2025-09-16T17:16:01Z)
- Importance Corrected Neural JKO Sampling [0.0]
We combine continuous normalizing flows (CNFs) with rejection-resampling steps based on importance weights. The resulting model can be trained iteratively, reduces the reverse Kullback-Leibler (KL) loss in each step, and allows generating i.i.d. samples. Numerical examples show that our method yields accurate results on various test distributions, including high-dimensional multimodal targets.
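A generic importance-weighted rejection-resampling step (not the paper's specific construction) can be sketched as follows; the function name, the max-weight normalization, and the log-weight interface are illustrative assumptions.

```python
import math
import random

def rejection_resample(samples, log_weights, seed=0):
    """Accept each sample with probability w / max(w), where w are the
    importance weights; rejected samples are simply dropped."""
    rng = random.Random(seed)
    m = max(log_weights)  # normalize so acceptance probabilities lie in (0, 1]
    out = []
    for x, lw in zip(samples, log_weights):
        if rng.random() < math.exp(lw - m):
            out.append(x)
    return out
```

Samples at the maximum weight are always kept, while low-weight samples survive only rarely, which reweights the retained population toward the target distribution.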
arXiv Detail & Related papers (2024-07-29T22:49:59Z)
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z)
- AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
arXiv Detail & Related papers (2023-07-20T09:06:21Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.